OK, I am trying to gather information from a website that holds information on most of my country's companies.
They offer this service to anyone: you search for a company by name, and the site first returns a nicely formatted HTML listing. It then lets you "save" the information, which is essentially a data dump from the database with the fields delimited by |.
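Assuming the dump really is one record per line with fields separated by |, turning it into PHP arrays only takes a few lines. This is an illustrative sketch, not a description of the site's actual format: the function name parse_dump is made up, and it assumes no field ever contains a | of its own.

```php
<?php
// Sketch: turn the "|"-delimited dump from the "save" page into rows.
// Assumptions (not confirmed by the site): one record per line, fields
// separated by "|", and no "|" characters inside a field.
function parse_dump(string $dump): array
{
    $rows = [];
    // Split on Windows (\r\n), Unix (\n) or old Mac (\r) line endings.
    foreach (preg_split('/\r\n|\n|\r/', $dump) as $line) {
        if (trim($line) === '') {
            continue; // skip blank lines, such as the one after a trailing newline
        }
        // Split the record into fields and trim surrounding spaces from each.
        $rows[] = array_map('trim', explode('|', $line));
    }
    return $rows;
}

// Made-up sample input, only to show the shape of the result.
print_r(parse_dump("COMPANY A|Street 1|City A\r\nCOMPANY B|Street 2|City B\r\n"));
```

If the first line turns out to hold column names, array_combine() on that header and each following row would give associative arrays instead.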
What I want to do is add a registration script to my site so that, whenever a company registers, its information is fetched from the site above to "auto fill" that person's company details.
So I started investigating what options I had for doing this in PHP. First I found out that I could "open" the URL with fopen, which worked fine at the beginning (up to the first HTML-formatted results page). However, when I tried the same method on the data dump, whose "importable" format would make the whole process extremely easy, I realized that fopen wouldn't help me, since I needed to POST some information (mainly the SQL query).
So I started working with cURL, which, after a whole day of research, several failures, and a couple of successes, seems to be the best solution so far.
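Sending the query means sending an ordinary URL-encoded form body. As a hedged sketch, the body used by the script further down in this post could be built with PHP's http_build_query(), which URL-encodes every value, so a company name containing spaces or an & cannot break the body. The field names txtsql and E and the query text are copied from that script; build_save_body is a hypothetical helper name.

```php
<?php
// Sketch: build the POST body for the "save" request with http_build_query()
// instead of hand-encoding it. Field names and query text are taken from the
// script in this post; build_save_body is a hypothetical name.
function build_save_body(string $company): string
{
    // Decoded form of the original txtsql value, with the company name
    // placed between the % wildcards.
    $sql = " * From INDICE (Index=General)  Where  (DG_Razonsoc Like '%" . $company
         . "%' OR DG_RazonComercial Like '%" . $company . "%')";

    // http_build_query() URL-encodes each value (spaces become "+").
    return http_build_query(['txtsql' => $sql, 'E' => '1']);
}

echo build_save_body('mic'), "\n";
```

Unlike the hand-written string, http_build_query() encodes * as %2A; a server that decodes the form body normally should see the same value either way.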
The problem is that, for reasons I can't figure out, whenever I run my code with a company name shorter than 5 characters, I get an error page from the server that holds the company data. With longer names I don't get this error and everything is listed fine. The funny thing is that if you use the form directly on their site, you can search with company names of any length and still get the results...
So, if you are still reading, I hope I haven't confused you too much :p (by the way, English is not my mother tongue).
The site I want to use to retrieve the company information is: http://www.siem.gob.mx/portalsiem
The direct URL for the FORM I want to use is:
http://www.siem.gob.mx/portalsiem/padron/consulta.asp?q=1&gpo=1
You can type the name of a company (for example "microsoft") and press enter, and you will see the first HTML formatted results page. The other page I want to use is the one you get when you click the "guardar" ("save") button.
The code I have come up with so far is the following:
<?php
// If a company name was submitted, request the "save" (data dump) page for it.
if (isset($_POST['first']) && strlen($_POST['first']) > 1)
{
    $name = $_POST['first'];
    // POST body for the "save" page; the name is URL-encoded so spaces or "&" don't break the body.
    $curlPost = "txtsql=+*+From+INDICE+%28Index%3DGeneral%29++Where++%28DG_Razonsoc+Like+%27%25" . urlencode($name) . "%25%27+OR+DG_RazonComercial+Like+%27%25" . urlencode($name) . "%25%27%29&E=1";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, 'http://www.siem.gob.mx/portalsiem/comunes/Guarda.asp?Id=16');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow redirects
    curl_setopt($ch, CURLOPT_HEADER, 1);          // include response headers in the output
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the response instead of printing it
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
    $data = curl_exec($ch);
    print_r($data);
    curl_close($ch);
    ask($name);
}
else
{
    ask("");
}

// Prints the search form, pre-filled with $init (escaped for HTML).
function ask($init)
{
?>
<FORM ACTION="urlnew.php" METHOD="POST">
Razon Social: <INPUT TYPE="text" NAME="first" VALUE="<?php echo htmlspecialchars($init); ?>">
<BR>
<INPUT TYPE="SUBMIT">
</FORM>
<?php
}
?>
If you try that code with "Microsoft", it behaves the same as the website's form. However, if you try "mic", my script gets this error page:
The page cannot be displayed
There is a problem with the page you are trying to reach and it cannot be displayed.
--------------------------------------------------------------------------------
Please try the following:
Click the Refresh button, or try again later.
Open the localhost home page, and then look for links to the information you want.
HTTP 500.100 - Internal Server Error - ASP error
Internet Information Services
--------------------------------------------------------------------------------
Technical Information (for support personnel)
Error Type:
Microsoft VBScript runtime (0x800A01A8)
Object required: 'Session(...)'
/portalsiem/comunes/Guardab.asp, line 94
Browser Type:
Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Page:
GET /portalsiem/comunes/Guardab.asp
Time:
Monday, September 05, 2005, 6:51:39 PM
More information:
Microsoft Support
Whereas if you try the same query in the website's form, you do get the whole list...
I have too many questions and even more tutorials, threads, and examples to follow. I must admit my experience with cURL is only one day long, but I can't find any more relevant information about my specific problem.
The error page mentions a missing "Session" object ("Object required: 'Session(...)'"). Perhaps I need a cookie? How can I handle cookies from a script on my server? (It would be my server querying the website, not my browser.)
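One common way to keep a session across requests with PHP's cURL extension is to reuse a single handle with the cookie engine turned on through CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR. The sketch below assumes the ASP site tracks state with an ordinary session cookie, which the 'Session(...)' error suggests but does not prove. The function names session_options and fetch_dump and the cookie-jar path are hypothetical. It also reads the final HTTP status with curl_getinfo(), so a 500 error can be detected instead of printed.

```php
<?php
// Sketch only. Assumptions: ext/curl is installed, and the site keeps its
// state in a normal session cookie (not confirmed by the site).

// Options shared by every request; kept in a pure function so it can be checked offline.
function session_options(string $cookieJar): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)',
        CURLOPT_COOKIEFILE     => $cookieJar, // read cookies from this file; also enables the cookie engine
        CURLOPT_COOKIEJAR      => $cookieJar, // write cookies to this file when the handle is closed
    ];
}

// Visits the search form first (as a browser would), then POSTs to the save
// page on the SAME handle so any cookies set so far are sent back.
// Returns [HTTP status of the final response after redirects, body].
function fetch_dump(string $postBody, string $cookieJar): array
{
    $ch = curl_init();
    curl_setopt_array($ch, session_options($cookieJar));

    // Step 1: GET the search form to pick up a session cookie, if the site sets one.
    curl_setopt($ch, CURLOPT_URL, 'http://www.siem.gob.mx/portalsiem/padron/consulta.asp?q=1&gpo=1');
    curl_exec($ch);

    // Step 2: POST the query to the "save" page.
    curl_setopt($ch, CURLOPT_URL, 'http://www.siem.gob.mx/portalsiem/comunes/Guarda.asp?Id=16');
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postBody);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return [$status, $body === false ? '' : $body];
}
```

It could be called as fetch_dump($curlPost, tempnam(sys_get_temp_dir(), 'siem')). If the site stores the search itself in the session (for example, when the results page is shown), the cookie alone may not be enough, and the script would also have to submit the search form before posting to Guarda.asp; the field names for that step are not known from this post.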
I hope someone can help me a bit with this.
Thanks a lot.
BTW, if you are wondering why I want my country's company list: I am part of AIESEC, a large student-run international organization, and we work with companies all the time trying to get traineeships for foreign students.