I have a question about an Apache log analysis project I'm building with PHP and MySQL.
I have about 500 different pages, each with a unique ID (e.g. 29323.htm).
I would like to know the total number of hits each page receives (counting only hits whose referrer comes from particular locations) by analyzing the access_log. I'm tracking millions of hits per day, which is why I chose to work with the log files instead of recording each hit when the page is served.
I'll store the final result in a table (with two fields: page_id and hits) for retrieval by another script.
I need the results to be close to real time, and I plan to approximate that by running the log analysis every 5 minutes.
The problem is that I can't rotate the log every 5 minutes, because the web traffic reporting software I'm also using needs a rotation interval of at least 30 minutes for its unique-visitor tracking.
For my project, how can I tell whether the log has been rotated?
If I run my script every 5 minutes, it will end up analyzing lines that were already processed in the previous run. Is there a way to make the script skip data that has been processed before?
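To make the question concrete, here is a rough sketch of the kind of approach I imagine: remember the byte offset and inode of the log after each run, and treat a changed inode or a shrunken file as a sign of rotation. It is written in Python for brevity (the same logic should port to PHP), and the function name read_new_lines and the layout of the state dict are my own assumptions, not something from an existing tool.

```python
import os

def read_new_lines(log_path, state):
    """Return (new_lines, new_state) for lines appended since the last run.

    `state` is a dict holding 'inode' and 'offset' from the previous run
    (hypothetical layout; it could be kept in a small file or a MySQL row).
    """
    st = os.stat(log_path)
    offset = state.get("offset", 0)
    # A different inode, or a file shorter than the saved offset, suggests
    # the log was rotated or truncated, so start again from the beginning.
    if state.get("inode") != st.st_ino or st.st_size < offset:
        offset = 0
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume complete lines only; a partly written last line is left
    # in place and picked up by the next run.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").split("\n")[:-1]
    return lines, {"inode": st.st_ino, "offset": offset + end}
```

One gap I can see in this sketch: lines written to the old file between my last run and the rotation would be missed unless the rotated file is also read from the saved offset.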
The Apache log format is as follows (I can't change it because of the requirements of my web traffic reporting software):
LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""
A log line looks like this:
1.2.3.4 domain.com - [datetime] "GET /29323.htm HTTP/1.0" 200 7575 "referrer url" "browser type"
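For reference, this is roughly how I would expect such a line to be pulled apart outside of MySQL. It is a Python sketch under my own assumptions: the names LOG_RE, PAGE_RE and parse_line are hypothetical, page URLs are assumed to look like /<digits>.htm, and any query string is ignored.

```python
import re

# One group per field of the LogFormat above. Quoted fields allow
# backslash-escaped characters, which Apache may write inside them.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) (?P<vhost>\S+) (?P<user>\S+) \[(?P<time>[^\]]*)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"$'
)
# Assumed page URL shape: /29323.htm
PAGE_RE = re.compile(r'^/(\d+)\.htm$')

def parse_line(line):
    """Return a dict for a page hit, or None if the line is not one."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    parts = m.group("request").split()
    if len(parts) < 2:
        return None
    path = parts[1].split("?", 1)[0]  # drop any query string
    page = PAGE_RE.match(path)
    if page is None:
        return None
    return {
        "ip": m.group("ip"),
        "page_id": page.group(1),
        "status": int(m.group("status")),
        "referer": m.group("referer"),
    }
```

Lines that do not match the format, or that request something other than a numbered page, simply come back as None and can be skipped.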
What I'm doing right now is loading the log file into a temporary table with the following statement:
LOAD DATA LOCAL INFILE 'access_log' INTO TABLE logs FIELDS TERMINATED BY '\"';
The structure of the logs table is:
field ip varchar(15)
field pageid varchar(20)
field status varchar(4)
field referer varchar(40)
The data looks like this:
1.2.3.4
GET /29323.htm
200
http://domain.com/referrer.html
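The step I'm ultimately after would look something like the following, sketched in Python over rows shaped like the ones above (the same logic should port to PHP). The names count_hits, UPSERT_SQL and the table name page_hits are hypothetical, matching referrers by host name is my assumption about what "particular locations" means, and the upsert assumes a unique key on page_id.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Pulls "29323" out of a stored request such as "GET /29323.htm".
PAGE_ID_RE = re.compile(r"/(\d+)\.htm\b")

def count_hits(rows, allowed_hosts):
    """Count hits per page id, keeping only rows whose referrer host is allowed."""
    hits = Counter()
    for row in rows:
        try:
            host = urlparse(row["referer"]).hostname
        except ValueError:
            continue  # malformed referrer URL, skip the row
        m = PAGE_ID_RE.search(row["pageid"])
        if m and host in allowed_hosts:
            hits[m.group(1)] += 1
    return hits

# Hypothetical result table: page_hits(page_id, hits), unique key on page_id.
# Adding to the existing value lets each 5-minute batch be merged in.
UPSERT_SQL = (
    "INSERT INTO page_hits (page_id, hits) VALUES (%s, %s) "
    "ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits)"
)
```

Each (page_id, hits) pair from the counter would then be passed as parameters to the upsert statement, so only the new batch of lines has to be counted on each run.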
Can someone tell me what I should do next to accomplish this?
I'm really stuck here and can't get any further.
Thank you for your patience in reading this far.