Some Regexes

Good2CU

I'd like to parse some information from search-engine referrer URLs:

subdomain, domain (including TLD), searched keyword, original query, and page number. This is mainly intended for Google, but I will use it for other engines as well.

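The page number usually comes from "start=#", which is 0 or absent on page 1. On Google, start is the zero-based result offset, so with the default 10 results per page start=10 is page 2, start=20 is page 3, and so on.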

The original query (for Google Suggest) is oq=.

The searched keyword is q= on Google, but k= on some other engines.

I'd like to track the images. and www. subdomains separately.

For each parameter I intend to capture everything up to the next & or the end of the line.

This is what I'm thinking, but I'm having a hard time getting the patterns to work as intended:

// subdomain (this should be optional, in case they don't use one)
#https?://(..[\s]{2,4}/)#i //domain: I really have no idea how to do this while making the subdomain optional
#(?|&)(q=|k=)(.)[&\b]#i //searched keyword
#(&oq=)(.*)[&\b]#i //original query
#&start=(\d)#i //page
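For comparison, here is a minimal sketch of how those patterns could be written so they behave as described, run through PHP's preg_match. A few things trip up the attempt above: `(?|&)` is a PCRE branch-reset group, not "? or &", so `[?&]` is used instead; `\b` inside a character class means backspace, not a word boundary, so each value is captured with `[^&]*`, which stops at the next & or the end of the string; and `(\d)` grabs only one digit, so `(\d+)` is used. It assumes the subdomain is at most one label (www, images, ...) and that the TLD is letters and dots only; the function name parseReferrerWithRegex is hypothetical.

```php
<?php

// Hypothetical helper: pull the pieces out of a referrer URL using regexes only.
function parseReferrerWithRegex(string $url): array
{
    $result = [
        'subdomain' => null,
        'domain'    => null,
        'keyword'   => null,
        'oq'        => null,
        'start'     => 0,
    ];

    // Optional single-label subdomain (group 1), then domain incl. TLD (group 2).
    // Assumption: the TLD part is letters and dots only, e.g. "com" or "co.uk".
    if (preg_match('#^https?://(?:([a-z0-9-]+)\.)?([a-z0-9-]+\.[a-z.]{2,6})(?:[/:?]|$)#i', $url, $m)) {
        $result['subdomain'] = $m[1] !== '' ? $m[1] : null;
        $result['domain']    = $m[2];
    }

    // q= on Google, k= on some other engines; [^&]* stops at the next & or end of string.
    if (preg_match('#[?&](?:q|k)=([^&]*)#i', $url, $m)) {
        $result['keyword'] = urldecode($m[1]);
    }

    // Original query (Google Suggest).
    if (preg_match('#[?&]oq=([^&]*)#i', $url, $m)) {
        $result['oq'] = urldecode($m[1]);
    }

    // Result offset; absent on page 1, so the default stays 0.
    if (preg_match('#[?&]start=(\d+)#i', $url, $m)) {
        $result['start'] = (int) $m[1];
    }

    return $result;
}
```

With the example URL from this thread (its space encoded as +, as a real referrer would have it), this gives subdomain "www", domain "google.com", keyword "example keyword", oq "exam" and start 10.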

Thanks!

Good2CU

Forgot to include an example:

http://www.google.com/search?hl=en&client=safari&rls=en&q=example keyword&start=10&sa=N&oq=exam

laserlight

Perhaps you can use [man]parse_url[/man].
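A minimal sketch of that suggestion applied to the example URL (with its space encoded as +). In a real script the URL would presumably come from $_SERVER['HTTP_REFERER']; here it is a literal string. parse_url() hands back the host and the raw query string; the subdomain/domain split is a naive assumption (first label is the subdomain when the host has three or more labels), which would split a host like "google.co.uk" wrongly. splitHost is a hypothetical helper.

```php
<?php

// Hypothetical helper: treat the first label as the subdomain when the host has
// three or more labels. Naive: "google.co.uk" would be split wrongly.
function splitHost(string $host): array
{
    $labels = explode('.', strtolower($host));
    if (count($labels) >= 3) {
        $subdomain = array_shift($labels);
        return [$subdomain, implode('.', $labels)];
    }
    return [null, implode('.', $labels)];
}

$referrer = 'http://www.google.com/search?hl=en&client=safari&rls=en&q=example+keyword&start=10&sa=N&oq=exam';

$host  = parse_url($referrer, PHP_URL_HOST);   // "www.google.com"
$query = parse_url($referrer, PHP_URL_QUERY);  // everything after "?" (no fragment here)

list($subdomain, $domain) = splitHost($host);
echo $subdomain, ' | ', $domain, "\n"; // www | google.com
```

Keeping the subdomain as its own value makes it easy to count images. and www. traffic separately.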

halojoy

To get the variables from a URL's query string,
this works very nicely:

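<?php

// Parse the current request's query string into an associative array.
parse_str($_SERVER['QUERY_STRING'], $arr);

// 'start' is absent on page 1 and 'oq' may be missing too, so check
// before reading to avoid undefined-index warnings.
echo isset($arr['start']) ? (int) $arr['start'] : 0;
echo '<br />';
echo isset($arr['oq']) ? htmlspecialchars($arr['oq']) : '';

// The one-argument form, parse_str($_SERVER['QUERY_STRING']), used to create
// $start, $oq, etc. directly in the current scope. It was deprecated in
// PHP 7.2 and removed in PHP 8.0, so always pass the result array.

?>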

[man]parse_str[/man]
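For a referrer rather than the current request, the same call works on the query string taken from the referring URL. A minimal sketch, assuming Google's start is a zero-based result offset with 10 results per page (so the example URL's start=10 is page 2); pageFromStart is a hypothetical helper and the query string is hard-coded for illustration.

```php
<?php

// Hypothetical helper: convert a result offset into a 1-based page number.
// Assumes 10 results per page; start is absent or 0 on page 1.
function pageFromStart(array $params, int $resultsPerPage = 10): int
{
    $start = isset($params['start']) ? (int) $params['start'] : 0;
    return intdiv($start, $resultsPerPage) + 1;
}

$query = 'hl=en&client=safari&rls=en&q=example+keyword&start=10&sa=N&oq=exam';

// parse_str decodes "+" and %XX escapes and fills $params with the key/value pairs.
parse_str($query, $params);

// q= on Google, k= on some other engines.
$keyword = $params['q'] ?? $params['k'] ?? null;
$oq      = $params['oq'] ?? null;
$page    = pageFromStart($params);

echo $keyword, ' | ', $oq, ' | page ', $page, "\n"; // example keyword | exam | page 2
```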

laserlight

Oh yeah. You should use both together: parse_url to get the host and the query string, then parse_str to split the query string into its parameters. If, on the other hand, you are trying to extract this URL from a larger portion of text, these would not be (immediately) appropriate, and that's where regex can help.
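A minimal sketch of using both together, plus a regex for the "larger portion of text" case. The URL-finding pattern is a deliberately loose assumption (an http or https URL runs until whitespace, a quote, or an angle bracket), so a URL with a literal space, like the one typed into the example above, would be cut off at the space; real referrers encode it as + or %20. findUrls and parseSearchReferrer are hypothetical names, and the host split uses the same naive first-label rule as above.

```php
<?php

// Hypothetical helper: find http(s) URLs in free text.
// Assumption: a URL ends at whitespace, a quote, or an angle bracket.
function findUrls(string $text): array
{
    preg_match_all('#https?://[^\s"\'<>]+#i', $text, $m);
    return $m[0];
}

// Hypothetical helper: combine parse_url() and parse_str() for one referrer.
function parseSearchReferrer(string $url): ?array
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null; // not a usable absolute URL
    }

    $params = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params);
    }

    // Naive split: first label is the subdomain when there are 3+ labels
    // (wrong for hosts like "google.co.uk").
    $labels    = explode('.', strtolower($parts['host']));
    $subdomain = count($labels) >= 3 ? array_shift($labels) : null;

    return [
        'subdomain' => $subdomain,
        'domain'    => implode('.', $labels),
        'keyword'   => $params['q'] ?? $params['k'] ?? null,
        'oq'        => $params['oq'] ?? null,
        'start'     => isset($params['start']) ? (int) $params['start'] : 0,
    ];
}

$log = 'Visitor came from http://images.google.com/search?q=red+car&start=20 today.';
foreach (findUrls($log) as $url) {
    print_r(parseSearchReferrer($url));
}
```

The regex only has to locate the URL; once it is isolated, parse_url and parse_str handle the decoding, which is less fragile than matching every parameter by hand.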