I'm trying to create a script to grab all the threads from a forum and put them in a database so I can search through them later. Everything seems to work, except that I get a 500 Internal Server Error after 5-10 minutes.

    set_time_limit(0);

    mysql_query("Truncate table forumindex");

    // number of threads
    for ($x = 1; $x < 40000; $x++)
    {
        // multiple forums
        for ($i = 0; $i < 4; $i++)
        {
            // initial thought was a timeout issue, so I tried to solve that
            $ctx = stream_context_create(array(
                'http' => array(
                    'timeout' => 3
                )
            ));
            $handle = file_get_contents("http://*********.com/boardthread?id=" . $forum[$i] . "&thread=" . $x, false, $ctx);

            if ($handle)
            {
                // code to add to db, which works
            }
        }
    }

    Can you turn on error_reporting or check your server logs? A 500 error usually indicates a syntax error and would display as a fatal error.
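    For example, something like this at the top of the script will surface any warnings or fatal errors while it runs (the log path below is just an example):

        error_reporting(E_ALL);                    // report everything, including notices
        ini_set('display_errors', 1);              // print errors to the output
        ini_set('log_errors', 1);                  // and write them to a log as well
        ini_set('error_log', '/tmp/scraper.log');  // example path, adjust to taste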

    The first things I usually check are brackets, parens and semicolons. I didn't see a problem with those in the snippet you posted, but then again, the error could be anywhere on the page.

      I did turn on error reporting and am rerunning the script now.

      I should mention that if I limit the script to only 100 threads it executes perfectly.

      Also even if I comment out all the code except for what I've posted here I get the same problem.

        No error from error reporting.

        My server log has the following:
        [Fri Mar 12 10:18:21 2010] [error] [client ...] File does not exist: /home/**/public_html/404.shtml
        [Fri Mar 12 10:18:21 2010] [error] [client ...] File does not exist: /home/****/public_html/favicon.ico

          Here's a problem:

          file_get_contents("http://*********.com/boardthread?id="...

          I don't think you can include variables in the URL of your file. file_get_contents is looking for the location of the file, not the query that retrieves the file.

            I disagree. Like I said, it works fine if I limit it to 100 threads. It reads the content of each thread.

            If I don't limit the threads it still updates the database until it hits the 500 error. Sometimes it will successfully read as many as 2000 threads before the server error.
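            For what it's worth, file_get_contents has no problem with query strings as long as the whole thing is a URL and allow_url_fopen is on; each iteration is essentially doing this (placeholder URL and IDs):

                $ctx  = stream_context_create(array('http' => array('timeout' => 3)));
                // the query string is part of the URL, not a filesystem path
                $html = file_get_contents("http://example.com/boardthread?id=12&thread=345", false, $ctx);
                if ($html !== false) {
                    echo strlen($html) . " bytes read\n";
                }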

              Huh...that surprises me. Can you use fgets() to read the file line by line instead? You can either append to your string or read each line into an array and implode() it.
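              Something along these lines is what I have in mind (placeholder URL, same 3-second timeout as your context):

                  $ctx = stream_context_create(array('http' => array('timeout' => 3)));
                  $fp  = fopen("http://example.com/boardthread?id=12&thread=345", "r", false, $ctx);

                  if ($fp) {
                      $html = '';
                      while (($line = fgets($fp)) !== false) {
                          $html .= $line;   // or push onto an array and implode() it afterwards
                      }
                      fclose($fp);
                  }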

                A 500 error (any response code starting with '5') means that the server borked for some reason that it wasn't able to be any clearer about - not necessarily anything to do with what you did (although perhaps one cause could be the 160,000 consecutive requests you're making overwhelming its database connections).

                  I'll try it later on. It will add a lot of time to an already lengthy script though.

                    @weed

                    I was wondering about that as well. Essentially, I'm reading 160,000 webpages.

                    Even if I just make the script read the webpage and not bother adding it to the database it still spits out the 500 error.

                      Can you save the contents of the page as a file on your server when it receives the 500 error? It sounds like you're hitting some type of rate (or resource) limit on the external site, and the 5xx error is simply there to throttle your queries (either due to policy or by necessity).

                      In that case, there's no real solution other than to slow down your query rate.

                      EDIT: Just to clearly explain my thought process... the point of saving the response to disk when you get the 5xx error is so that you can examine the data returned by the remote webserver - perhaps it's telling you exactly what the problem is.
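                      Roughly what I mean, if you stick with file_get_contents (the file path, URL and delays are just illustrative):

                          // keep the body even when the server answers 4xx/5xx instead of getting false back
                          $ctx = stream_context_create(array(
                              'http' => array(
                                  'timeout'       => 3,
                                  'ignore_errors' => true
                              )
                          ));

                          $body = file_get_contents("http://example.com/boardthread?id=12&thread=345", false, $ctx);

                          // $http_response_header is filled in by the http wrapper; the first entry is the status line
                          $parts  = isset($http_response_header[0]) ? explode(' ', $http_response_header[0]) : array();
                          $status = isset($parts[1]) ? (int) $parts[1] : 0;

                          if ($status >= 500) {
                              // save the error page so you can read what the remote server actually said
                              file_put_contents('/tmp/error_response.html', $body);
                              sleep(5);        // back off before trying again
                          }

                          usleep(250000);      // quarter-second pause between requests to ease the load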
