The following is my PHP class object that will scrape a remote URL and time its evaluation and download. I am sorry if this is long but I want to make it as clear as possible.

Here are my steps to doing the remote scrape using CURL:

1) I scrape the remote URL with CURLOPT_HEADER set to 1 to obtain the HTTP headers
2) I glean off the headers the value of PHPSESSID to get the session ID of the remote URL
3) I set that to an object property $this->PHPSESSID
4) I then build a cookie string $qs in a method: cookie key=val pairs joined by semicolons, with a space after each semicolon.
5) I set up another CURL resource object, hand it $qs, and re-scrape the remote URL (with CURLOPT_HEADER set to 0).
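
Steps 2 through 4 can be sketched roughly as follows. This is only an illustration, assuming PHP 7.1 or later and that the raw response headers are already in a string. The helper names extractSessionId and buildCookieString, the sample header text, and the 'theme' cookie are made up for the example and are not part of the Timer class.

```php
<?php
// Hypothetical helpers for illustration; they are not part of the Timer class.

/**
 * Pull the PHPSESSID value out of a raw HTTP response header block.
 * Returns null when no Set-Cookie line starts with PHPSESSID.
 */
function extractSessionId(string $rawHeaders): ?string {
    // Capture everything after "PHPSESSID=" up to the next ';' or line end.
    if (preg_match('/^Set-Cookie:\s*PHPSESSID=([^;\r\n]+)/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

/**
 * Join name => value pairs into the "a=1; b=2" form passed to CURLOPT_COOKIE.
 */
function buildCookieString(array $cookies): string {
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Sample header text (assumed, not captured from a real server).
$headers = "HTTP/1.0 200 OK\r\n"
         . "Set-Cookie: PHPSESSID=abcdef0123456789; path=/\r\n"
         . "Content-Type: text/html\r\n\r\n";

$sid = extractSessionId($headers);
$cookieString = buildCookieString(['PHPSESSID' => $sid, 'theme' => 'dark']);
```

Passing $cookieString to curl_setopt($ch, CURLOPT_COOKIE, $cookieString) would then send both cookies on the second request.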

And it fails! The remote URL is displayed completely wrong because the required cookies (including the session id) are completely gone from the remote HTTP headers; CURL never sends the cookies, in spite of the cookie string being set absolutely correctly.

Here is the code, I'm at my wit's end:

class Timer extends View {

/**
 * @access private
 * @var mixed $cookieName
 */
var $cookieName;

/**
 * @access private
 * @var mixed $PHPSESSID
 */
var $PHPSESSID;

/**
 * @access private
 * @var float $startTime
 */
var $startTime;

/**
 * @access private
 * @var mixed $url
 */
var $url;

/**
 * Constructor.  Set optional URL property.  Set optional $cookieName property either through parameter or via $_REQUEST autoglobal with name of 'cookieName'
 *
 * @access public
 * @param mixed $url (optional)
 * @param mixed $cookieName (optional)
 */
function Timer($url = '', $cookieName = '') {				// CONSTRUCTOR
	$this->url = $url;
	if ($cookieName) $this->cookieName = $cookieName;
	if ($_REQUEST['cookieName']) $this->cookieName = $_REQUEST['cookieName'];
}

//-------------------------------------------- --* GETTER/SETTER METHODS *-- ------------------------------------------

/**
 * Retrieve $PHPSESSID
 *
 * @access private
 * @return mixed $PHPSESSID
 */
function &getRemoteSessionID() {		// STATIC STRING METHOD
	return $this->PHPSESSID;
}

/**
 * Retrieve $url property
 *
 * @access private
 * @return mixed $url
 */
function &getURL() {					// STATIC STRING METHOD
 	return $this->url;
}

/**
 * Set the cookie in $_COOKIE into the curl reference
 *
 * @access private
 * @param resource $ch (reference) curl reference
 */
function &setCookieCurlSetOpt(&$ch) {		// STATIC VOID METHOD
	/*----------------------------------------------------------------------------------------------------------------------------
		Remember that unlike $_GET or $_POST requests which use '&' to "glue" all key/val pairs together,
		$_COOKIE requires a semicolon instead to use as its glue to adjoin all cookie key/val pairs together
	-----------------------------------------------------------------------------------------------------------------------------*/
	if ($this->cookieName) {
	 $qs = ';' . $this->cookieName . '=';
	 if (is_array($_COOKIE[$this->cookieName]) || is_object($_COOKIE[$this->cookieName])) {
	  $qs .= serialize($_COOKIE[$this->cookieName]);
	 } else {
	  $qs .= $_COOKIE[$this->cookieName];
	 }
	} elseif (@sizeof(array_values($_COOKIE)) > 0) {
	 foreach ($_COOKIE as $key => $val) if (is_array($val) || is_object($val)) $qs .= "; $key=" . serialize($val); else $qs .= "; $key=$val";
	}
	$PHPSESSID = $this->getRemoteSessionID();
	// APPEND THE SESSION ID EVEN WHEN NO OTHER COOKIES WERE COLLECTED; THE LEADING ';' IS STRIPPED BELOW
	if ($PHPSESSID) $qs .= "; PHPSESSID=$PHPSESSID";
	if ($qs) curl_setopt($ch, CURLOPT_COOKIE, trim(substr($qs, 1, strlen($qs))));
	print_r("qs = " . trim(substr($qs, 1, strlen($qs))) . "<P>");
}

/**
 * Set the $_POST curl set_opt options per instance of $_POST
 * 
 * @access private
 * @param resource $ch (reference) curl reference
 */
function &setPOSTCurlSetOpt(&$ch) {			// STATIC VOID METHOD
	if (@sizeof(array_values($_POST)) > 0) {
	 curl_setopt($ch, CURLOPT_POST, 1);
	 foreach ($_POST as $key => $val) $qs .= "&$key=" . urlencode(serialize($val));
	 curl_setopt($ch, CURLOPT_POSTFIELDS, substr($qs, 1, strlen($qs)));
	}
}

/**
 * Set $this->PHPSESSID with a "pop" connection to the remote site
 *
 * @access private
 */
function &setRemoteSessionID() {		// STATIC STRING METHOD
	if (is_object($this) && !$this->getURL() && $url) $this->setURL($url);
	if (is_object($this)) $url = $this->getURL();
	if ($url && ini_get('allow_url_fopen')) {
	 $url = $this->configureURL($url);
	 // grab URL and pass it to the browser
	 $ch = curl_init($url);
	 curl_setopt($ch, CURLOPT_HEADER, 1);
	 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	 curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
	 $header = curl_exec($ch);
	 curl_close($ch);
	 // ONLY ASSIGN THE SESSION ID WHEN THE HEADERS ACTUALLY CONTAIN ONE
	 if (preg_match('/PHPSESSID=([^;\s]+)/i', $header, $matchArray)) $this->PHPSESSID = $matchArray[1];
	}
}		


/**
 * Set the time
 *
 * @access private
 * @param float $timeKeeper (reference)
 * @return float $timeKeeper
 */
function &setTime(&$timeKeeper) {			// STATIC FLOAT METHOD
	$start = microtime();
	$start = explode(' ', $start);
	$start = (float)$start[1] + (float)$start[0];
	$timeKeeper = $start;
	return $timeKeeper;
}

/**
 * Set $url property
 *
 * @access private
 * @param mixed $url
 */
function &setURL($url) {				// STATIC VOID METHOD
	if (is_object($this) && !$this->url) $this->url = $url;
}

//--------------------------------------------- --* END OF GETTER/SETTER METHODS *-- -------------------------------
/**
 * Configure URL
 *
 * @access private
 * @param mixed $url (reference)
 * @return mixed $url
 */
function &configureURL(&$url) {			// STATIC STRING METHOD
	if (preg_match('/\/[a-zA-Z0-9\-_]+$/i', $url)) $url .= '/';
	return $url;
}

/**
 * Display HTML based on given URL property value
 *
 * @access public
 * @param mixed $url (optional)
 * @return mixed HTML
 */
function &displayHTML($url = '') {			// STATIC HTML STRING METHOD
	global $projectFolderName, $username, $password;
	if (is_object($this) && !$this->getURL() && $url) $this->setURL($url);
	if (is_object($this)) $url = $this->getURL();
	if ($url && ini_get('allow_url_fopen')) {
	 $url = $this->configureURL($url);
	 $this->setRemoteSessionID();		// GET REMOTE SESSION ID IF FOUND - REQUIRES RE-SCRAPING REMOTE SITE ONCE SESSION EXISTS
	 print_r("<P>this->PHPSESSID = "); print_r($this->getRemoteSessionID()); print_r("<P>");
	 // SET UP CURL RESOURCE OBJECT BASED UPON RECONFIGURED URL
	 $ch = curl_init();
	 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	 curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
	 curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
	 curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
	 curl_setopt($ch, CURLOPT_HEADER, 1);
	 curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
	 $this->setCookieCurlSetOpt($ch);
	 curl_setopt($ch, CURLOPT_SSLVERSION, 3);
	 curl_setopt($ch, CURLOPT_URL, $url);
	 $this->setTime($this->startTime);	// SET START TIME
	 $html .= curl_exec($ch);
	 $timer = $this->setTime($timer);
	 // CLOSE CURL RESOURCE OBJECT
	 curl_close($ch);
	 $html = preg_replace('/(.*<body[^>]*>)/i', '$1<p><b>' . ($timer - $this->startTime) . " seconds to run URL: \"$url\"</b>", $html);
	}
	return $html;
}

}

Phil

    Good news and VERY bad news:

    1) Good news: I was able to obtain the remote cookie information $_COOKIE['PHPSESSID'], so all of my cookie values are set and CURL can obtain the correct URL info

    2) VERY BAD news: The only way to do it is to know the exact md5 key mapped to your $_SESSION, which becomes the value of the remote $_COOKIE['PHPSESSID']. In other words, you'd have to hack into the remote server to find the correct file /tmp/sess_[md5], where [md5] is the value of $PHPSESSID.

    There has got to be an easier way than this!

    Phil

      So, what you do is make sure to add $_COOKIE['PHPSESSID'] to the cookie string (for some wacko reason it initially holds the remote PHPSESSID within your scrape.php), and then add the cookie string to the HTTP headers.

      class Timer extends View {
      
        function Timer() { 
         // DO STUFF
        }
      
      
      /**
       * Set the cookie in $_COOKIE into the curl reference
       *
       * @access private
       * @param resource $ch (reference) curl reference
       */
      function &setCookieCurlSetOpt(&$ch) {		// STATIC VOID METHOD
      	/*----------------------------------------------------------------------------------------------------------------------------
      		Remember that unlike $_GET or $_POST requests which use '&' to "glue" all key/val pairs together,
      		$_COOKIE requires a semicolon instead to use as its glue to adjoin all cookie key/val pairs together
      
      		Also note on first line to add the value of $_COOKIE['PHPSESSID'] to the cookie string if found as
      		$_COOKIE will capture the remote value of the session id onto itself, stripping ownership of it from
      		the remote script.  The cookie string will reassign it back to the remote script's HTTP headers which
      		will allow for the correct rendering of the remote script if $PHPSESSID is required for that page
      	-----------------------------------------------------------------------------------------------------------------------------*/
      	$qs = '';	// START FROM AN EMPTY COOKIE STRING
      	if (!empty($_COOKIE['PHPSESSID'])) $qs = 'PHPSESSID=' . $_COOKIE['PHPSESSID'];
      	@reset($this->cookies);
      	if (is_array($this->cookies) && @sizeof($this->cookies) > 0) {	// SET ALL ARRAY KEYS AND VALUES INTO COOKIE HTTP STRING FOR CURL
      	 foreach ($this->cookies as $cookieName => $cookieVal) {
      	   $qs .= "; $cookieName=";
      	   if (is_array($cookieVal) || is_object($cookieVal)) {
      	   $qs .= serialize($cookieVal);
      	  } else {
      	   $qs .= $cookieVal;
      	  }
      	 }
      	} elseif (@sizeof(array_values($_COOKIE)) > 0) {
      	 foreach ($_COOKIE as $key => $val) if (is_array($val) || is_object($val)) $qs .= "; $key=" . serialize($val); else $qs .= "; $key=$val";
      	}
      	if ($qs) curl_setopt($ch, CURLOPT_COOKIE, $qs);
      }
      
      
      /**
       * Display HTML based on given URL property value
       *
       * @access public
       * @param mixed $url (optional)
       * @return mixed HTML
       */
      function &displayHTML($url = '') {			// STATIC HTML STRING METHOD
      	global $projectFolderName, $username, $password;
      	if (is_object($this) && !$this->getURL() && $url) $this->setURL($url);
      	if (is_object($this)) $url = $this->getURL();
      	if ($url && ini_get('allow_url_fopen')) {
      	 $url = $this->configureURL($url);
      	 // SET UP CURL RESOURCE OBJECT BASED UPON RECONFIGURED URL
      	 $ch = curl_init();
      	 curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
      	 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
      	 curl_setopt($ch, CURLOPT_URL, $url);
      	 curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
      	 curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
      	 curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
      	 curl_setopt($ch, CURLOPT_HEADER, 0);
      	 curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
      	 $this->setCookieCurlSetOpt($ch);
      	 curl_setopt($ch, CURLOPT_SSLVERSION, 3);
      	 $this->setTime($this->startTime);	// SET START TIME
      	 $html .= curl_exec($ch);
      	 $timer = $this->setTime($timer);
      	 $html = $this->displayErrorHTML($html, $ch); // FILTER HTML TO ERROR MESSAGE IF INCORRECT
      	 // CLOSE CURL RESOURCE OBJECT
      	 @curl_close($ch);
      	 if (!$this->getHasNoHTML()) $html = preg_replace('/(.*<body[^>]*>)/i', '$1<p><b>' . ($timer - $this->startTime) . " seconds to run URL: \"$url\"</b>", $html);
      	}
      	return $html;
      }
      
      }
      

      In case anyone wanted to know what I did.

      Phil

        it may be easier and use less code if you just use a COOKIEJAR and COOKIEFILE to store cookies for each site, and delete the files after the test is done. that way curl gets and sets the cookies for you.
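
A minimal sketch of that suggestion, assuming PHP 7 or later with the curl extension loaded. The helper name fetchWithTemporaryJar and the 'jar' filename prefix are made up for illustration. tempnam() is used here so the jar gets an absolute path in a writable directory rather than depending on the current working directory.

```php
<?php
// Hypothetical helper: request a URL twice, letting curl carry cookies from
// the first response into the second request through a temporary jar file.
function fetchWithTemporaryJar(string $url) {
    // Absolute path to a fresh, writable file in the system temp directory.
    $jar = tempnam(sys_get_temp_dir(), 'jar');
    $html = false;

    for ($i = 0; $i < 2; $i++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $jar); // cookies are read from here
        curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);  // cookies are written here when the handle is released
        $html = curl_exec($ch);
        curl_close($ch);
        unset($ch); // make sure the handle is released so the jar is flushed to disk
    }

    // Delete the jar once the test is done.
    unlink($jar);

    // Body of the second request, or false if that request failed.
    return $html;
}
```

Calling fetchWithTemporaryJar() with the URL under test would return the body of the second request, the one sent with whatever cookies the first response set. It does not reuse a session that a browser started earlier, because curl only knows about cookies it received itself.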

          COOKIEJAR and COOKIEFILE both fail upon this attempt, it can't find the remote cookie file nor be able to know what it might be.

          Phil

            Originally posted by ppowell
            COOKIEJAR and COOKIEFILE both fail upon this attempt, it can't find the remote cookie file nor be able to know what it might be.

            Phil

            Now, that is strange.

            What curl does is more or less act like a browser for you. It can store all cookies sent by the remote site, and it will also send them back, thus doing for you what you are doing by hand.
            Remember that if you curl pages on your local machine as a test, it will not save cookies. Search the forum here for the reason why, since I don't remember.

            Also I had real trouble with my curl cookiefile until I tried to not specify a path but just the filename :-)

              Did just that: saved just the file as the CURL docs in the PHP manual suggest, to no avail; it never found them.

              The only way for me to obtain the remote session was to just pull up "scrape.php" which magically HAD the remote session id as $_COOKIE['PHPSESSID'] and use that within the curl resource object.

              Phil

                Originally posted by ppowell
                COOKIEJAR and COOKIEFILE both fail upon this attempt, it can't find the remote cookie file nor be able to know what it might be.

                Phil

                the cookie file isn't a remote file; it's the name you specify for a local file used to save cookie data so it can be used in another session, either 1 minute or 1 year after the original session.

                for example:

                <?php
                
                //create cookie file and save values
                $ch = curl_init("http://mail.yahoo.com");
                curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
                curl_setopt($ch, CURLOPT_HEADER, 1);
                curl_setopt($ch, CURLOPT_COOKIEJAR, "yahoocookie.txt");
                curl_setopt($ch, CURLOPT_COOKIEFILE, "yahoocookie.txt");
                $data = curl_exec($ch);
                
                curl_close($ch);
                
                //create new session and use cookies from last session
                $ch2 = curl_init("http://mail.yahoo.com");
                curl_setopt($ch2, CURLOPT_RETURNTRANSFER, 1);
                curl_setopt($ch2, CURLOPT_HEADER, 1);
                curl_setopt($ch2, CURLOPT_COOKIEJAR, "yahoocookie.txt");
                curl_setopt($ch2, CURLOPT_COOKIEFILE, "yahoocookie.txt");
                $data = curl_exec($ch2);
                
                curl_close($ch2);
                ?>
                

                the manual doesn't document CURLOPT_COOKIEJAR, but basically it designates the file to store cookies in, while CURLOPT_COOKIEFILE specifies where to read the cookies from.

                a resulting file looks like:

                # Netscape HTTP Cookie File
                # http://www.netscape.com/newsref/std/cookie_spec.html
                # This file was generated by libcurl! Edit at your own risk.
                
                .online.***.com	TRUE	/	TRUE	0	FORTUNE_COOKIE	6mQZPBYgod3)RfCQKdcU(Q== 12/07/2004 10:18:31
                .***.com	TRUE	/	TRUE	0	BACOOKIE	https://online.***/*** 2004 6 12 17 18 31
                .***.com	TRUE	/	FALSE	0	COOKIE_SID	AyHmJAGBg1liO2pa0F2AhdzllWD6emby1Dv56dVXNU2VjP05qA34
                .***.com	TRUE	/	TRUE	1246511871	wfacookie	O07072004101751-551871485
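
To check what a jar like this actually holds (for example, which PHPSESSID curl was given), the file can be read back in PHP. The following is only a sketch, assuming PHP 7 or later. The function name parseCookieJar and the sample line with .example.com are made up for illustration. Each data line has seven tab-separated fields: domain, subdomain flag, path, secure flag, expiry, name, value.

```php
<?php
// Hypothetical helper: turn Netscape-format cookie jar text into name => value pairs.
function parseCookieJar(string $contents): array {
    $cookies = [];
    foreach (preg_split('/\r?\n/', $contents) as $line) {
        // Skip blank lines and comment lines. Lines that libcurl prefixes with
        // "#HttpOnly_" also start with '#', so this sketch ignores them too.
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        // Limit to 7 pieces so a value containing a tab stays in one piece.
        $fields = explode("\t", $line, 7);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}

// Sample jar text (assumed values, same layout as the file shown above).
$sample = "# Netscape HTTP Cookie File\n"
        . "\n"
        . ".example.com\tTRUE\t/\tFALSE\t0\tPHPSESSID\tabcdef0123456789abcdef0123456789\n";

$cookies = parseCookieJar($sample);
```

With a real jar, parseCookieJar(file_get_contents('yahoocookie.txt')) would give the same kind of array, which makes it easy to compare the stored PHPSESSID with the one the browser holds.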
                

                  Dude, I did EXACTLY what you just posted, down to the name of the cookie file!!

                  //create cookie file and save values
                  $ch = curl_init("http://www.myotherphpwebsite.com");
                  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
                  curl_setopt($ch, CURLOPT_HEADER, 1);
                  curl_setopt($ch, CURLOPT_COOKIEJAR, "yahoocookie.txt");
                  curl_setopt($ch, CURLOPT_COOKIEFILE, "yahoocookie.txt");
                  $data = curl_exec($ch);
                  
                  curl_close($ch);
                  

                  Since I happen to have SSH access to www.myotherphpwebsite.com, I could verify that the PHPSESSID value in "yahoocookie.txt" was completely wrong. You are supposed to have this:

                  PHPSESSID=abcdef0123456789abcdef0123456789

                  You find this in /tmp/yahoocookie.txt:

                  PHPSESSID=0987654321ABCDEF0987654321abcdef

                  When you set the cookie the first time, even using COOKIEJAR and COOKIEFILE, it cannot find the previously-set session variable in /tmp that was created when you physically went to www.myotherphpwebsite.com. It can't find that specific session variable because you never told www.myotherphpwebsite.com which md5 key to look for, and that key is the NAME of the session variable file. So it just sets a whole new session variable (with no values for its keys), and that new md5 key becomes the value of PHPSESSID and is then written into yahoocookie.txt.

                  Thus, it failed to set the cookie correctly in the first place and then re-retrieves the wrong cookie values.

                  It never produced the contents of the remotely set cookie file; in fact, it generated a brand-new cookie for PHPSESSID every single time. Every other value in $_COOKIE was found and set that way, but $_COOKIE['PHPSESSID'] had the wrong value because it could not find the specific value that exists as /tmp/sess_[md5 value].

                  Phil
