I'd like to parse some information from search engine referring URLs:
subdomain, domain (including TLD), searched keyword, original query, and page number. This is intended mainly for Google, but I will use it for other search engines as well.
The page number is generally "start=#": it is 0 or absent for page 1, start=1 is page 2, etc.
The original query (for Google Suggest) is oq=
The searched keyword is q= on Google, but k= on some others.
I'd like to track the images. and www. subdomains separately.
For each parameter, I intend to capture everything up to the next & or the end of the line.
This is what I have so far, but I'm having a hard time getting the patterns to work as intended:
//subdomain (this should be optional just in case they don't use it)
#https?://(..[\s]{2,4}/)#i //domain - I really have no idea how to do this with making sub-domain optional
#(?|&)(q=|k=)(.)[&\b]#i //searched keyword
#(&oq=)(.*)[&\b]#i //original query
#&start=(\d)#i //page
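To make the intended result concrete, here is a minimal sketch of the behavior I'm after, written in Python with the standard urllib.parse module instead of regexes. The function name parse_referrer and the sample URL are made up for illustration. It assumes the domain is always the last two host labels (which would be wrong for something like google.co.uk) and follows my assumption above that start=1 means page 2.

```python
from urllib.parse import urlsplit, parse_qs


def parse_referrer(url):
    """Split a search-engine referrer URL into the pieces listed above."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")

    # Assumption: the domain (with TLD) is the last two labels, e.g. google.com.
    # Multi-part TLDs such as .co.uk would need a suffix list instead.
    domain = ".".join(labels[-2:])
    # Everything before the domain is the subdomain; None when there is none.
    subdomain = ".".join(labels[:-2]) or None

    # parse_qs splits the query string on & and decodes + and percent-escapes.
    params = parse_qs(parts.query)

    def first(*names):
        """Return the first value of the first parameter name that is present."""
        for name in names:
            if name in params:
                return params[name][0]
        return None

    start = first("start")
    # Assumption from the post: start absent or 0 is page 1, start=1 is page 2.
    page = int(start) + 1 if start and start.isdecimal() else 1

    return {
        "subdomain": subdomain,
        "domain": domain,
        "keyword": first("q", "k"),   # q= on Google, k= on some others
        "original_query": first("oq"),
        "page": page,
    }


info = parse_referrer(
    "http://images.google.com/search?q=red+shoes&oq=red+sh&start=1"
)
print(info)
```

A URL without a subdomain or without oq= and start= should still work, returning None for the missing pieces and page 1.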
Thanks!