I'll be honest: I've never once used COOKIEJAR or COOKIEFILE; I've always just set the cookie header. You could try reading the response headers and setting the cookie header manually. Here's an example of sending my session cookie back to my website.

$ch = curl_init('https://derokorian.com');
curl_setopt_array($ch, [
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_SSL_VERIFYPEER => false,

  // We need to get headers back in our response
  CURLOPT_HEADER => true,
]);
$result = curl_exec($ch);

// Check that the request was successful
if ($result === false) {
  die(sprintf("cURL failed (%d): %s\n", curl_errno($ch), curl_error($ch)));
}

// Get the session cookie out of the response
if (preg_match_all("/^Set-Cookie: DFW_SESSION=([^;]*)/mi", $result, $matches)) {
  $sessId = $matches[1][0];
} else {
  die("No session cookies found in response\n");
}

// Make a new request, this time sending in the session cookie
$ch = curl_init('https://derokorian.com');
curl_setopt_array($ch, [
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_SSL_VERIFYPEER => false,

  // Get headers again, to prove out that the cookie was successfully sent
  CURLOPT_HEADER => true,

  // You can either use CURLOPT_COOKIE like this...
  CURLOPT_COOKIE => 'DFW_SESSION='.$sessId,

  // ...or use CURLOPT_HTTPHEADER like this instead (pick one, not both)
  // CURLOPT_HTTPHEADER => [
  //   'Cookie: DFW_SESSION='.$sessId
  // ],
]);
$result = curl_exec($ch);

// Look at the result: there is no longer a Set-Cookie header in the response, because the correct session was already sent in
var_dump($result);

    Derokorian, Thank you so much for pointing those things out. To answer your questions, I'm sure "$token" really contains the token because when I end the script half way through with "print $token;" my ssh window will show the randomly generated $token number. (So at least I know that "$data" and "$token" are correctly populated).

    I'd like to try your cookie method and then come back here to let you know the results -- but I'm sorry I'm a little confused: the part in your code where you have the preg_match and "DFW_SESSION" like this:

    // Get the session cookie out of the response 
    if (preg_match_all("/^Set-Cookie: DFW_SESSION=([^;]*)/mi", $result, $matches)) { 
      $sessId = $matches[1][0]; 
    } else { 
      die("No session cookies found in response\n"); 
    }
    

    How did you know what to preg_match for? And what is "DFW-Session?" Because when I run my code and it contacts the webserver, I have no idea what the names of their session variables are...

    I'm sorry, I have the feeling I've asked a dumb question, but would you mind please clarifying this for me? If you can clarify it, I'll go ahead and re-write my code, try it out, and post the results here.

    Thank you, Derokorian.

      codinghelper;11061733 wrote:

      How did you know what to preg_match for? And what is "DFW-Session?" Because when I run my code and it contacts the webserver, I have no idea what the names of their session variables are...

      I'm pretty sure that, since it's a regexp, it's specifically written for his application, and that his cookie contained that string. Yours would be different if you chose to use this approach.

      I will say that I've read hundreds of thousands of pages from the WWW using PHP, cURL and the cURL COOKIEJAR.

      Out of curiosity, in this case, what's your target system? Is it running an ASP server by any chance?

        Dalecosp, the server being used is "PicLan-IP 2.0.0 (build 175)."

        I've confirmed that "cookies" are not being used because no "cookies.txt" file gets placed on my desktop (when I visit other sites that DO use cookies, a file called "cookies.txt" gets placed on my desktop).

        This is interesting: in the middle of my script when I've preg_matched the $token, I added "print $token;" and then at the end of my script I have var_dump($data) to see the entire sourcecode of the page I've CURLed. The "token" from the "print $token" is DIFFERENT than the $token in the var_dump!

        I cannot figure out what else this page wants! It shouldn't be rocket science. An ordinary web browser simply posts the variables a webserver needs, in this case:
        1.) 'controlNumber'=>$token,
        2.) 'inputname1'=>"hammer",
        -and-
        3.) 'inputname2'=>'nail'

        Interestingly, the URL has my username and password encoded into it which means it's a "get" request.... but since http://www.website.com/index.php?username=myself&password=password works in an ordinary web browser, it should work fine when I CURL init that same URL.

        I can't figure out what else it needs, unless:
        a.) My script is wrong
        b.) My script is in the wrong order
        c.) I have something wrongly duplicated (like, for example, I have "$data = curl_exec($ch);" listed twice.... but maybe that's okay, I don't know)

        What are your thoughts, please?

        Here's the script as I have it now:

        <?php
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL,'http://www.website.com/index.php?username=myself&password=password');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible, MSIE 10, Windows NT 6.2');

        ### THIS GETS THE RANDOM NUMBER "TOKEN":
        $data = curl_exec($ch);
        preg_match('/controlNumber\"\svalue=\"(.*)\"/',$data,$clip);
        $token = $clip[1];

        ### Now that we have the "$token", lets put it in the CURL post:
        curl_setopt($ch, CURLOPT_POSTFIELDS, array('controlNumber'=>$token,'inputname1'=>"hammer",'inputname2'=>'nail'));
        $data = curl_exec($ch);

        ### The following var_dump should have the result I am seeking.
        ### Unfortunately, it does not. It only shows the same
        ### original website as if I've simply refreshed the page
        ### as if nothing got POSTed.
        var_dump($data);
        ?>

          In your latest example, you've not set CURLOPT_POST. cURL's default method is GET, as I'm sure you're aware.

          Other thoughts I've had are "why are credentials in the GET string, and if they're supposed to be there for the initial load, should they really be there the 2nd time?"

          I suppose there could easily be reasons for that.

          As for the token changing ... does any JavaScript in the browser adjust the token before it's sent to the server for the 2nd request?

            Could it be that your useragent is malformed? You have an opening parenthesis but not a closing. Looks incomplete to me.

              codinghelper;11061733 wrote:

              How did you know what to preg_match for? And what is "DFW-Session?" Because when I run my code and it contacts the webserver, I have no idea what the names of their session variables are...

              dalecosp;11061753 wrote:

              I'm pretty sure that, since it's a regexp, it's specifically written for his application, and that his cookie contained that string. Yours would be different if you chose to use this approach.

              Yes, I was looking for my specific session name. However, you could just drop the DFW_SESSION part and pull all cookies out using the rest of the regexp. Since I know the one cookie I need from my site, that's all I looked for, but if you didn't know, you could catch them all and just concatenate them together in the new request.
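
              A minimal sketch of the "catch them all" idea, assuming the first request was made with CURLOPT_HEADER so the raw response still contains the Set-Cookie lines. The helper name build_cookie_header is made up for illustration, and it ignores cookie attributes such as Path and Expires:

              ```php
              <?php
              // Hypothetical helper: collect every cookie set in a raw response
              // (headers included) and build the value for one Cookie header.
              function build_cookie_header(string $rawResponse): string
              {
                  // Same idea as the DFW_SESSION pattern, minus the cookie name:
                  // capture "name=value" up to the first ";" on each Set-Cookie line.
                  preg_match_all('/^Set-Cookie:[ \t]*([^;\r\n]+)/mi', $rawResponse, $matches);

                  $cookies = [];
                  foreach ($matches[1] as $pair) {
                      // Key by name so a later Set-Cookie replaces an earlier one
                      [$name, $value] = array_pad(explode('=', $pair, 2), 2, '');
                      $cookies[trim($name)] = trim($value);
                  }

                  $parts = [];
                  foreach ($cookies as $name => $value) {
                      $parts[] = $name . '=' . $value;
                  }
                  return implode('; ', $parts);
              }

              // Usage on the second request, with $result from the first curl_exec():
              // curl_setopt($ch, CURLOPT_COOKIE, build_cookie_header($result));
              ```

              If the helper returns an empty string, the server set no cookies at all, which is worth knowing before going any further down this road.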

                unbelievable... I finally got it to work.... ALMOST! Here's the "almost working" code:

                <?php
                $ch = curl_init();
                curl_setopt($ch, CURLOPT_URL,'http://www.website.com/index.html?user=username&pass=password');
                curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
                curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/601.7.0 (KHTML, like Gecko) Version/9.0.1 Safari/537.81.8');
                $result = curl_exec($ch);
                preg_match('/ControlNumber\"\svalue=\"(.*)\"/',$result,$number);
                $token = $number[1];
                curl_setopt($ch, CURLOPT_POSTFIELDS, array('controlNumber'=>$token,'inputname1'=>"hammer",'inputname2'=>'nail'));
                $Desired_Result = curl_exec($ch);
                var_dump($Desired_Result);
                ?>

                PROBLEM: Using a normal web browser yields a server response with TWO pieces of data. However, using this CURL script above yields a server response with only ONE piece of data.

                Further, I discovered that in my web browser's "developer tool Network Tab," I can click "Copy as CURL" as well as "Copy Response."

                The "Copy Response" has ALL the data I want, absolutely perfect! Yet when I "Copy as CURL" and paste it in my SSH terminal, I only get a server response with just ONE bit of data...

                In fact, the "Copy as CURL" is real complicated, it has a zillion headers, gzip, deflate, user_agent, everything.... I would have thought this perfectly would mimic a web browser and thus provide me with a server response with all the data I require. Yet it is different.

                Why?

                  One possible reason for the difference may be that the site recognised the CURL submission as a duplicate and so treated it differently from the original one.

                  I'm starting to wonder if the site has been engineered to prevent automated access....

                    Weedpacket;11061823 wrote:

                    One possible reason for the difference may be that the site recognised the CURL submission as a duplicate and so treated it differently from the original one.

                    I'm starting to wonder if the site has been engineered to prevent automated access....

                    http://www.commitstrip.com/en/2015/05/19/data-wars/

                    It's kind of weird ... one job, and I've spent time in both rooms shown above 😉

                      Derokorian;11061829 wrote:

                      Ha, I've been in both - but in the first room I convinced the powers that be, that using an approved API was easier.

                      We couldn't determine that enough of the targets offered any such service; most of them are lucky to even have a website that includes the objects they want to sell, it appeared (at least in 2012). I have seen some of them with such things, and we've adapted in some cases, but fundamentally the people who have the capability to offer an API or similar feed are less than fully interested in actually having us consume it ...

                      #cache22 ... (typed that by accident ... leaving it, somehow it seems apropos .... 😃 )

                        codinghelper;11061821 wrote:

                        unbelievable... I finally got it to work.... ALMOST! Here's the "almost working" code:

                        <?php
                        $ch = curl_init();
                        curl_setopt($ch, CURLOPT_URL,'http://www.website.com/index.html?user=username&pass=password');
                        curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
                        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/601.7.0 (KHTML, like Gecko) Version/9.0.1 Safari/537.81.8');
                        $result = curl_exec($ch);
                        preg_match('/ControlNumber\"\svalue=\"(.*)\"/',$result,$number);
                        $token = $number[1];
                        curl_setopt($ch, CURLOPT_POSTFIELDS, array('controlNumber'=>$token,'inputname1'=>"hammer",'inputname2'=>'nail'));
                        $Desired_Result = curl_exec($ch);
                        var_dump($Desired_Result);
                        ?>

                        PROBLEM: Using a normal web browser yields a server response with TWO pieces of data. However, using this CURL script above yields a server response with only ONE piece of data.

                        (SNIP)

                        Why?

                        This is the point at which I'd do something like

                        print_r(curl_getinfo($ch));

                        Sometimes the headers and stuff will give you a clue....
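
                        A rough sketch of how I'd narrow that output down, assuming the usual curl_getinfo() array keys. The helper name summarize_curl_info is made up, and the request_header key only shows up if CURLINFO_HEADER_OUT was set before the request:

                        ```php
                        <?php
                        // Hypothetical helper: reduce the curl_getinfo() array to the
                        // fields that most often explain a "wrong page" response.
                        function summarize_curl_info(array $info): string
                        {
                            $keys = ['url', 'http_code', 'redirect_count', 'content_type',
                                     'size_download', 'request_header'];
                            $lines = [];
                            foreach ($keys as $key) {
                                // Skip keys this request did not produce
                                if (array_key_exists($key, $info)) {
                                    $lines[] = $key . ': ' . trim((string) $info[$key]);
                                }
                            }
                            return implode("\n", $lines);
                        }

                        // Usage: ask cURL to remember the headers it sends, then summarize.
                        // curl_setopt($ch, CURLINFO_HEADER_OUT, true);
                        // $result = curl_exec($ch);
                        // echo summarize_curl_info(curl_getinfo($ch)), "\n";
                        ```

                        Comparing request_header against what the browser's Network tab shows for the same request is usually the quickest way to spot a missing header.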

                          It is my guess that the token you are trying to read from the page on the first request is intended to prevent CSRF attempts. That being the case, the remote website probably has some way to propagate this token other than the POST operation you are attempting in your second request, so that the POST handling code can check the POSTed token against the other channel. The other channel through which the token is propagated is probably either a) cookies (remote website sends you cookies to store with your first page request) or b) session (remote website is still sending you cookies, but it stores the controlNumber token somewhere server side and uses your session id to look up your session info). In either case, you need both requests sent to the remote server to deal with cookies, using the same cookie file both times.

                          In your code here, I see you referring to COOKIE_FILE but I don't see this constant defined anywhere. I also see that your code is pretty inconsistent as though you are casting about somewhat blindly. For example, you stop setting this after post #7 above:

                          curl_setopt($ch, CURLOPT_POST, true); 

                          I could be wrong but I think there's no point in setting CURLOPT_POSTFIELDS unless you set CURLOPT_POST to true?

                          Try to be more methodical. Have your script either echo its progress every now and then or write a log file so you can see what it's up to. Some suggestions:

                          1) make the first request and echo the token to make sure you have some valid-looking token
                          2) make sure the first request uses the COOKIEJAR and COOKIEFILE settings

                          ###COOKIE STUFF HERE:
                          $my_cookie_file = "/tmp/my-cookie-file"; // BEWARE -- if you are running this script a lot, you might want to make sure each time the script runs it has a different name to avoid collisions.
                          // USE THE EXACT SAME COOKIE FILE FOR BOTH SETTINGS:
                          curl_setopt($ch, CURLOPT_COOKIEJAR, $my_cookie_file);
                          curl_setopt($ch, CURLOPT_COOKIEFILE, $my_cookie_file); 
                          

                          3) consider inspecting the contents of $my_cookie_file after each run to see if any cookie info is actually getting written in there
                          4) for the second request, make sure you set both CURLOPT_POST and CURLOPT_POSTFIELDS and also make sure you use the exact same cookiejar/cookiefile settings as the first request
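
                          Put together, the four steps might look something like the sketch below. Treat it as an outline under assumptions, not a drop-in fix: the function names extract_token and fetch_with_token are made up, the token pattern is modelled on the one posted earlier in this thread, and I'm sending the fields as a URL-encoded string because passing an array to CURLOPT_POSTFIELDS makes cURL send multipart/form-data, which is not what an ordinary HTML form normally submits.

                          ```php
                          <?php
                          // Pull the hidden controlNumber value out of the first page.
                          // The pattern is an assumption based on the thread's own regexp.
                          function extract_token(string $html): ?string
                          {
                              if (preg_match('/controlNumber"\s+value="([^"]*)"/i', $html, $m)) {
                                  return $m[1];
                              }
                              return null;
                          }

                          // Both requests share one handle and one cookie file.
                          function fetch_with_token(string $url, array $fields, string $cookieFile): string
                          {
                              $ch = curl_init($url);
                              curl_setopt_array($ch, [
                                  CURLOPT_RETURNTRANSFER => true,
                                  CURLOPT_COOKIEJAR      => $cookieFile,
                                  CURLOPT_COOKIEFILE     => $cookieFile,
                              ]);

                              // Steps 1 and 2: GET the page, then echo the token we found
                              $page = curl_exec($ch);
                              if ($page === false) {
                                  throw new RuntimeException('First request failed: ' . curl_error($ch));
                              }
                              $token = extract_token($page);
                              if ($token === null) {
                                  throw new RuntimeException('No controlNumber found in first response');
                              }
                              echo "Token: $token\n";

                              // Step 4: POST the token back along with the form fields
                              curl_setopt_array($ch, [
                                  CURLOPT_POST       => true,
                                  CURLOPT_POSTFIELDS => http_build_query(['controlNumber' => $token] + $fields),
                              ]);
                              $result = curl_exec($ch);
                              if ($result === false) {
                                  throw new RuntimeException('Second request failed: ' . curl_error($ch));
                              }
                              return $result;
                          }

                          // Usage, with the placeholder URL and fields from this thread:
                          // $html = fetch_with_token(
                          //     'http://www.website.com/index.php?username=myself&password=password',
                          //     ['inputname1' => 'hammer', 'inputname2' => 'nail'],
                          //     '/tmp/my-cookie-file'
                          // );
                          ```

                          As far as I can tell from the libcurl docs, setting CURLOPT_POSTFIELDS already implies a POST, so the explicit CURLOPT_POST here is belt and braces. Step 3 (looking inside the cookie file) is best done after the script exits, since the jar is written when the handle is released.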

                            7 days later
                            sneakyimp;11061867 wrote:

                            It is my guess that the token you are trying to read....

                            Sneakyimp, thank you for your reply. Here are some further anomalies / clarifications:

                            1.) I used the curlopt cookies stuff, verified it works by testing on another site that uses cookies. But the site I'm trying to access doesn't use cookies.

                            2.) Site doesn't need javascript either (I set my browser javascript and browser cookies to "OFF").

                            Note: the desired result from the site I'm trying to access is 1.) item description, and 2.) item quantity.

                            3.) When I run my script, I actually DO get the desired "item description" (mentioned in my note, above). However, I do NOT get the "item quantity." This demonstrates that my script correctly extracts the "$token" for 2 reasons: a.) If my script fails to extract the correct token, I get the website error page, and b.) when I write "print $token" halfway through the script, it will print a valid $token.

                            4.) Frustratingly, again, when I visit the website in Google Chrome, it produces the desired result (i.e. both "item description" and "item quantity"). But, when I use Google Chrome's Network Tool called "Copy as CURL" and then run that exact code as a script in my SSH terminal, I ONLY get the "item description" and not the "item quantity" thus indicating two different website results for the same script!

                            Please note: when I use Google Chrome's Network Tool called "Copy as CURL," I do of course modify that code to correctly extract and substitute the "$token" variable that gets created with each visit.

                            So I don't know where to go from here. Web browser displays page with Item Description and Item Quantity, but CURL code just displays Item Description.

                              codinghelper wrote:

                              3.) When I run my script, I actually DO get the desired "item description" (mentioned in my note, above). However, I do NOT get the "item quantity." This demonstrates that my script correctly extracts the "$token" for 2 reasons: a.) If my script fails to extract the correct token, I get the website error page, and b.) when I write "print $token" halfway through the script, it will print a valid $token.

                              At this point I would have compared the complete responses to see how and where they differ; then I'd know what the differences between them look like and not just that they exist.
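
                              One way to do that comparison, sketched under the assumption that the browser's "Copy Response" text has been saved to a file and the cURL response is in a variable. The helper name response_diff and the file name browser.html are made up for illustration:

                              ```php
                              <?php
                              // Hypothetical helper: list the lines found in one response but
                              // not the other, ignoring surrounding whitespace and blank lines.
                              function response_diff(string $browserHtml, string $curlHtml): array
                              {
                                  $normalize = function (string $html): array {
                                      $lines = array_map('trim', preg_split('/\R/', $html));
                                      // Drop empty lines so layout differences don't show up
                                      return array_values(array_filter($lines, fn ($l) => $l !== ''));
                                  };
                                  $browser = $normalize($browserHtml);
                                  $curl    = $normalize($curlHtml);

                                  return [
                                      'only_in_browser' => array_values(array_diff($browser, $curl)),
                                      'only_in_curl'    => array_values(array_diff($curl, $browser)),
                                  ];
                              }

                              // Usage:
                              // print_r(response_diff(file_get_contents('browser.html'), $Desired_Result));
                              ```

                              This is a set difference on lines, not a true diff, so it won't show where in the page a line went missing, only that it did.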

                                Weedpacket;11062043 wrote:

                                At this point I would have compared the complete responses to see how and where they differ; then I'd know what the differences between them look like and not just that they exist.

                                Hello, thank you for your reply. I did compare the different responses. The response from a web browser (Chrome, Firefox, and Safari) all contain a table (e.g. "<TR><TD>...desired information here</TD></TR>"), but the response from my direct CURL script does not contain that table.

                                Again, I am so frustrated, I'm going on 3 weeks of this (although I admit I'm enjoying the challenge)... I am so confused why when I use Google Chrome, the neat network tool practically HANDS me the CURL script, with all the fancy headers (the gzip deflate, the keepalive, all that stuff I'd never think to include), but when I run the provided CURL script, it still gives me the incomplete result (the result missing the aforementioned table)...

                                But thank you for answering. If you have any other ideas, I'll keep giving it my best shot until I learn.
