OK, I am trying to gather information from a website that holds information on most of the companies in my country.

The service is available to anyone: you search for a company by name, and it first gives you a nicely formatted HTML listing. It then lets you "save" the information, which is essentially a dump from their database with the fields delimited by |.
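Once the "save" dump is in hand, turning it into PHP arrays comes down to splitting on line breaks and then on |. This is a minimal sketch that assumes one record per line and no | inside any field value. The real layout of the site's dump, including whether the first line is a header row, is not confirmed here, and `parse_pipe_dump` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: turns a |-delimited dump into an array of records.
// Assumes one record per line and that no field value contains a | itself.
function parse_pipe_dump($text)
{
    $records = [];
    // Accept Windows (\r\n), old Mac (\r) and Unix (\n) line endings.
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        if (trim($line) === '') {
            continue; // skip blank lines, e.g. after the final line break
        }
        // Split the line into its fields and trim surrounding whitespace.
        $records[] = array_map('trim', explode('|', $line));
    }
    return $records;
}
```

If each line in the real dump ends with a trailing |, every record will get one extra empty field at the end, which can be dropped with array_pop().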

What I want to do is add a registration script to my site so that, whenever a company registers, its information is fetched from the site mentioned above to "auto-fill" the company details.

So I started to investigate which options I had for doing this in PHP. First I found out that I could "open" the URL using fopen, which worked fine at the beginning (up to the first HTML-formatted results page). However, when I tried the same method to access the data dump, whose nicely "importable" format would make the whole process extremely easy, I realized that fopen wouldn't help, since I needed to POST some information (mainly the SQL query).
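For reference, PHP's http:// stream wrapper can send a POST when it is given a stream context, so fopen() and file_get_contents() are not strictly limited to GET. A minimal sketch; `siem_post_context` is a hypothetical helper name, and note that the stream wrapper does not keep cookies between requests on its own:

```php
<?php
// Hypothetical helper: builds a stream context that makes PHP's http://
// wrapper send $body as a form-encoded POST instead of a GET.
function siem_post_context($body)
{
    return stream_context_create([
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n"
                       . "Content-Length: " . strlen($body) . "\r\n",
            'content' => $body,
        ],
    ]);
}
```

Usage would look like `file_get_contents('http://www.siem.gob.mx/portalsiem/comunes/Guarda.asp?Id=16', false, siem_post_context($body))`, where $body is the URL-encoded form data (the same txtsql=... string the cURL script sends). The same context can also be passed as the fourth argument of fopen().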

So I ended up working with cURL, which, after a whole day of research, several failures and a couple of successes, seems to be the best solution so far.

The problem is that, for some reason I can't figure out, whenever I use the code I wrote, I get an error page from the server that holds the company data IF I search for a company name shorter than 5 characters. I don't get this error, and everything is listed OK, if I use more than 5 characters. The funny thing is that if you use the form directly on their site, you can search for company names of any length and still get the results.

So, if you have read this far, I hope I haven't confused you too much :p (by the way, English is not my mother tongue).

The site I want to use to retrieve the companies information is: http://www.siem.gob.mx/portalsiem
The direct URL for the FORM I want to use is:
http://www.siem.gob.mx/portalsiem/padron/consulta.asp?q=1&gpo=1
You can type the name of a company (for example "microsoft") and press Enter, and you will see the first HTML-formatted results page. The other page I want to use is the one you get when you click the "guardar" ("save") button.

The code I have come up with so far is the following:

<?php
// $_POST replaces the deprecated $HTTP_POST_VARS (removed in PHP 5.4)
if( isset($_POST['first']) && strlen($_POST['first'])>1 )
{
	// URL-encode the user's input before splicing it into the POST body
	$term = urlencode($_POST['first']);
	// Query sent to the "guardar" (save) page; the txtsql field carries the SQL text
	$curlPost = "txtsql=+*+From+INDICE+%28Index%3DGeneral%29++Where++%28DG_Razonsoc+Like+%27%25" . $term . "%25%27+OR+DG_RazonComercial+Like+%27%25" . $term . "%25%27%29&E=1";
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, 'http://www.siem.gob.mx/portalsiem/comunes/Guarda.asp?Id=16');
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);   // follow redirects sent by the server
	curl_setopt($ch, CURLOPT_HEADER, 1);           // include the response headers in $data
	curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);   // return the response instead of printing it
	curl_setopt($ch, CURLOPT_POST, 1);
	curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
	$data = curl_exec($ch);
	print_r($data);
	curl_close($ch);
	ask($_POST['first']);
}
else
{
	ask("");
}

// Prints the search form, pre-filled with $init (escaped for HTML output)
function ask($init)
{
	?>
	<FORM ACTION="urlnew.php" METHOD="POST">
	Razon Social: <INPUT TYPE="text" NAME="first" VALUE="<?php echo htmlspecialchars($init); ?>">
	<BR>
	<INPUT TYPE="SUBMIT">
	</FORM>
	<?php
}
?>
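Because the script sets CURLOPT_HEADER, $data holds the response headers followed by the body, and with CURLOPT_FOLLOWLOCATION it can hold one header block per redirect. cURL reports the total size of all received headers through curl_getinfo($ch, CURLINFO_HEADER_SIZE), which can be used to cut the two apart. A sketch, with `split_curl_response` as a hypothetical helper name:

```php
<?php
// Hypothetical helper: separates the headers from the body of a response
// fetched with CURLOPT_HEADER enabled. $headerSize should come from
// curl_getinfo($ch, CURLINFO_HEADER_SIZE), read before curl_close($ch).
function split_curl_response($raw, $headerSize)
{
    return [
        // All header blocks (one per hop when redirects are followed).
        'headers' => substr($raw, 0, $headerSize),
        // Everything after the headers; the cast turns PHP < 7's false into "".
        'body'    => (string) substr($raw, $headerSize),
    ];
}
```

In the script, this would sit between curl_exec() and curl_close(): `$parts = split_curl_response($data, curl_getinfo($ch, CURLINFO_HEADER_SIZE));`, after which $parts['body'] is the page (or dump) without the headers.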

If you try that code with "Microsoft", it behaves the same as the web page's form; however, if you try "mic", my script gets this error page:

The page cannot be displayed 
There is a problem with the page you are trying to reach and it cannot be displayed. 

--------------------------------------------------------------------------------

Please try the following:

Click the Refresh button, or try again later.

Open the localhost home page, and then look for links to the information you want. 
HTTP 500.100 - Internal Server Error - ASP error
Internet Information Services

--------------------------------------------------------------------------------

Technical Information (for support personnel)

Error Type:
Microsoft VBScript runtime (0x800A01A8)
Object required: 'Session(...)'
/portalsiem/comunes/Guardab.asp, line 94


Browser Type:
Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0) 

Page:
GET /portalsiem/comunes/Guardab.asp 

Time:
Monday, September 05, 2005, 6:51:39 PM 


More information:
Microsoft Support 

Yet if you run the same query through the website's own form, you do get the whole list.
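One detail in the error page may help narrow this down: the failing request is GET /portalsiem/comunes/Guardab.asp, not the Guarda.asp?Id=16 URL the script POSTs to. That suggests (an inference, not confirmed) that Guarda.asp answers with a redirect, which cURL follows because of CURLOPT_FOLLOWLOCATION, and that Guardab.asp then looks for something in the ASP Session that isn't there. Running the request once with CURLOPT_FOLLOWLOCATION set to 0 and inspecting the Location and Set-Cookie headers of Guarda.asp's response would show whether a redirect and a session cookie are involved. A small sketch for pulling those headers out; `header_values` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: returns every value of header $name (case-insensitive)
// found in a block of raw response headers, e.g. all Set-Cookie lines.
function header_values($headerText, $name)
{
    $values = [];
    foreach (preg_split('/\r\n|\n/', $headerText) as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // status lines ("HTTP/1.1 302 ...") and blank lines
        }
        if (strcasecmp(trim(substr($line, 0, $pos)), $name) === 0) {
            // Keep everything after the first colon (Location URLs contain colons too).
            $values[] = trim(substr($line, $pos + 1));
        }
    }
    return $values;
}
```

Feed it only the header part of the response, e.g. `substr($data, 0, curl_getinfo($ch, CURLINFO_HEADER_SIZE))`, so that lines in the page body are not mistaken for headers.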

I have too many questions and even more tutorials, threads and examples to follow. I must admit that my experience with cURL is only one day old, but I can't find any more information relevant to MY particular problem.

The error page mentions a missing "Session" object (Object required: 'Session(...)'), so perhaps I need a cookie? How can I simulate cookies from a script running on my server? (It would be my server querying the website, not my browser.)
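On the cookie question: classic ASP tracks Session state per visitor through a session cookie, and cURL only stores and resends cookies when its cookie engine is turned on, which the script above never does. Below is a sketch of one approach, under assumptions. It enables the in-memory cookie engine by setting CURLOPT_COOKIEFILE to an empty string. It then loads the search form first so the server can issue a session cookie. Finally, it POSTs the query with the same handle, so that cookie (and any set by Guarda.asp) is also sent on the followed redirect. Whether this is what the site actually needs is not confirmed; `siem_post_body` and `siem_fetch_dump` are hypothetical names, and the POST body is the one from the script above with the search term URL-encoded:

```php
<?php
// Hypothetical helper: builds the txtsql POST body used by the script above,
// URL-encoding the search term. Single quotes in the term are NOT escaped.
function siem_post_body($name)
{
    $term = urlencode($name);
    return "txtsql=+*+From+INDICE+%28Index%3DGeneral%29++Where++%28DG_Razonsoc+Like+%27%25"
        . $term . "%25%27+OR+DG_RazonComercial+Like+%27%25"
        . $term . "%25%27%29&E=1";
}

// Hypothetical helper: fetches the "guardar" dump for $name while keeping
// cookies across requests. Returns the response body or throws on error.
function siem_fetch_dump($name)
{
    $ch = curl_init();
    // An empty string turns on libcurl's cookie engine without reading a file:
    // cookies received on this handle are sent back on its later requests,
    // including requests made while following redirects.
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    // Step 1 (assumption): visit the search form first, as a browser would,
    // so the server can start an ASP session and set its cookie.
    curl_setopt($ch, CURLOPT_URL, 'http://www.siem.gob.mx/portalsiem/padron/consulta.asp?q=1&gpo=1');
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    if (curl_exec($ch) === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException($error);
    }

    // Step 2: POST the query with the same handle so the session cookie is reused.
    curl_setopt($ch, CURLOPT_URL, 'http://www.siem.gob.mx/portalsiem/comunes/Guarda.asp?Id=16');
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, siem_post_body($name));
    $data = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    if ($data === false) {
        throw new RuntimeException($error);
    }
    return $data;
}
```

In the form script, the whole curl_* block could then shrink to `print_r(siem_fetch_dump($_POST['first']));`. If the cookies need to survive between separate runs of the script, CURLOPT_COOKIEJAR (written when the handle is closed) and CURLOPT_COOKIEFILE can both point to a writable file instead.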

I hope someone can help me a bit with this.

Thanks a lot.

BTW, if you are wondering why I want to obtain my country's list of companies: I am part of a large, student-run international organization called AIESEC, and we work with companies all the time trying to get traineeships for foreign students.
