I've been having a very frustrating problem with one of my CURL scripts lately. It's supposed to log in to one of my affiliate programs and download the stats for me. The script was working fine until DirectTrack made some changes and moved to HTTPS; now it only partially works. I can get the script to log in and navigate around some pages, but when it comes to downloading the actual stats file (which is a simple GET request), the whole process hangs and eventually returns a blank page.

Please let me know what other information I can give to help you. Here is my script:

$email_address = urlencode("xxx@xxxxxxxxx.com");
$password = "xxxxxxx";
$cookie_file_path = "cookie"; // cookie file (don't bother changing)

// 1 - Get the Cookies required to login from the welcome login page

$LOGINURL = "http://www.xxxxxx.com/";
$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL,$LOGINURL);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
$result = curl_exec ($ch);
curl_close ($ch);

// 2 - Post Login Cookies and Login Information to Page

$LOGINURL = "https://login.xxxxxx.com/login.html";
$reffer = "http://www.xxxxxx.com";

$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL,$LOGINURL);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_POST, 1); 
curl_setopt($ch, CURLOPT_POSTFIELDS, "DL_AUTH_USERNAME=$email_address&DL_AUTH_PASSWORD=$password"); // add POST fields
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_REFERER, $reffer);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
$result = curl_exec ($ch);
curl_close ($ch);

Ok, the script works fine up to this point. When I print these results, I get the page displayed just as I should. However, when I go to download the stats file using the following code, it hangs:

// 4 - go to stats page

$LOGINURL = "https://login.xxxxxx.com/publishers/monthly_affiliate_stats.html?program_id=0&affiliate_stats_start_month=08&affiliate_stats_start_day=01...";
$reffer = "https://login.xxxxxx.com/partners/";

$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL,$LOGINURL);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_REFERER, $reffer);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
$result = curl_exec ($ch);
curl_close ($ch);

print $result;

I've verified that if I log in using Firefox and paste the above URL, the stats file downloads fine. I really don't know what to do here.

I should add, curl_error gives me this: "Connect failed; Operation now in progress", after about 5 minutes of waiting for the page to load.
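
One way to narrow down a hang like this might be to cap how long CURL waits and to record the error number next to the message, so a connect failure shows up in seconds rather than minutes. The sketch below is only an illustration: the helper names build_stats_options and fetch_with_diagnostics, and the 15 and 60 second limits, are assumptions of mine and not part of the script above.

```php
<?php
// Hypothetical helper: builds the same option set as the stats request above,
// plus explicit timeouts (the 15s/60s values are arbitrary assumptions).
function build_stats_options($url, $agent, $referer, $cookieFile)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $agent,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_REFERER        => $referer,
        CURLOPT_SSL_VERIFYPEER => false, // mirrors the original script
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_CONNECTTIMEOUT => 15, // stop trying to connect after 15 seconds
        CURLOPT_TIMEOUT        => 60, // stop the whole transfer after 60 seconds
    );
}

// Hypothetical helper: runs the request and returns the body together with
// cURL's error number and message, the last HTTP status, and the last URL used.
function fetch_with_diagnostics(array $options)
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    // The handle is released when $ch goes out of scope at the end of the function.
    return array(
        'body'   => $body,
        'errno'  => curl_errno($ch),
        'error'  => curl_error($ch),
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'url'    => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
    );
}
```

Calling fetch_with_diagnostics(build_stats_options($LOGINURL, $agent, $reffer, $cookie_file_path)) and printing the errno, error, status and url entries should show whether the failure happens while connecting and which URL was tried last.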

Thanks!

    I looked at your code and it's exactly the same as the code I'm using to get an HTTPS page, so I think you are doing it right. Since you are desperate, have you tried posting instead?

    $postfields = "program_id=0&affiliate_stats_start_month=08&affiliate_stats_start_day=01...";
    curl_setopt ($ch, CURLOPT_POST, 1);
    curl_setopt ($ch, CURLOPT_POSTFIELDS, $postfields);

    Have you tried fopen for this one particular file? (Maybe you're not doing that because you need cookies, not sure).

      I tried to post the info and it didn't work. I even took another step back and tried to retrieve the stats form page: http://login.xxxxxx.com/partners/monthly_affiliate_stats.html, and I got the same error!

      When I make the request at the command line and grab the headers, I get this:

      HTTP/1.1 301 Moved Permanently
      Date: Thu, 17 Aug 2006 06:43:33 GMT
      Server:
      Location: http://login.xxxxxx.com/partners/monthly_affiliate_stats.html?pro...ownload
      Content-Type: text/html; charset=iso-8859-1

      Very odd..

        Actually, it's not odd. It's a 301 (permanent) redirect. It's a technique used by some companies that want to prevent people from using CURL to obtain data from their web site. The URL that follows "Location:" should be the new location of the page. To trip you up, they often have a script that changes the URL once an hour so that your CURL script keeps breaking.

        I'm not certain that that's what's going on here but I've seen this before with other major web sites that want to force their users to use web browsers.

        The header data you got back when you tried from the command line is designed to bump your browser over to another URL to try again. Try using that URL in your CURL script.

        Sometimes they bump you from one URL to another to make it even harder for you (or CURL) to figure out what is going on. In the worst case, you might need your script to examine the headers and follow the new URL whenever it sees a 301 (or 302) redirect.
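
        As a rough sketch of that header check: if the script requests the headers itself (for example with CURLOPT_HEADER on and CURLOPT_FOLLOWLOCATION off), it could pick out the status code and the Location value like this. The function name parse_redirect and the example.com URL are made up for illustration.

        ```php
        <?php
        // Hypothetical helper: pulls the status code and Location value out of a
        // raw response header block such as the one captured at the command line.
        function parse_redirect($rawHeaders)
        {
            $status = null;
            $location = null;
            foreach (preg_split('/\r\n|\n|\r/', trim($rawHeaders)) as $line) {
                $line = trim($line);
                if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
                    $status = (int) $m[1];
                } elseif (preg_match('/^Location:\s*(.+)$/i', $line, $m)) {
                    $location = trim($m[1]);
                }
            }
            // Only report a target for 301/302 responses that carry a Location header.
            $isRedirect = in_array($status, array(301, 302), true) && $location !== null;
            return array('status' => $status, 'location' => $isRedirect ? $location : null);
        }

        // Example with a made-up header block:
        $example = "HTTP/1.1 301 Moved Permanently\r\nLocation: http://example.com/new\r\nContent-Type: text/html\r\n";
        $redirect = parse_redirect($example);
        // $redirect['status'] is 301 and $redirect['location'] is "http://example.com/new"
        ```

        The script could then make a fresh request to the returned location, and stop after a handful of hops so it doesn't loop forever.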

          Huh, well, I just thought it was odd because I was under the impression that I had set CURL to follow redirects, so I assumed it would continue on from the 301.

          I've tried the URL it has under location, which is the same URL with http: instead of https:, and it fails again but with no headers.

          I see what you're saying, but it appears that the URL under Location never changes.

          Thanks for the help. I'm going to play around with it a bit more.
