I'm working on an Apache log analysis project with PHP & MySQL, and here's my concern.

I have about 500 different pages, each with a unique ID (e.g. 29323.htm).

I would like to know the total number of hits each page receives (with referrers from particular locations) by analyzing the access_log. Since I'm tracking millions of hits per day, I chose to work with the log files instead of recording each hit when the page is shown.

I'll store the final result in a table (with two fields: page_id and hits) for retrieval by another script.

I need this to be as close to real time as possible, so I chose to approximate it by running the log analysis every 5 minutes.

But the problem is that I can't rotate the log every 5 minutes, because the other web traffic reporting software I'm using requires the log to be rotated at intervals of no less than 30 minutes for unique visitor tracking.

  1. For my project, how am I going to know if the log has been rotated or not?

  2. If I run my script every 5 minutes, it will end up analyzing lines that were already processed in the previous run. Is there any way to make the script skip data that has been processed before?
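
One possible way to approach both questions is to remember, between runs, the log file's inode number and the byte offset reached so far. The sketch below is in Python purely for illustration (the project itself uses PHP), and the names `read_new_lines` and `state_path` are hypothetical. It assumes a Un*x filesystem, where a rotated log shows up as a new inode or a file shorter than the saved offset.

```python
import json
import os


def read_new_lines(log_path, state_path):
    """Return the complete lines appended to log_path since the last call."""
    # Load the inode and byte offset saved by the previous run, if any.
    try:
        with open(state_path) as f:
            state = json.load(f)
    except (FileNotFoundError, ValueError):
        state = {"inode": None, "offset": 0}

    stat = os.stat(log_path)
    # A different inode, or a file shorter than the saved offset, suggests
    # the log was rotated or truncated, so start again from the beginning.
    if stat.st_ino != state["inode"] or stat.st_size < state["offset"]:
        state = {"inode": stat.st_ino, "offset": 0}

    with open(log_path, "rb") as f:
        f.seek(state["offset"])
        data = f.read()

    # Keep only complete lines; a partially written last line is left
    # for the next run.
    end = data.rfind(b"\n") + 1
    state["offset"] += end
    with open(state_path, "w") as f:
        json.dump(state, f)
    return data[:end].decode("utf-8", errors="replace").splitlines()
```

Note that this sketch does not go back for any lines written to the old file between the last run and the rotation; those would have to be picked up from the rotated file separately.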

The Apache log format is as follows (I can't change it because my web traffic reporting software requires it):

LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""

A row would look like the following:

1.2.3.4 domain.com - [datetime] "GET /29323.htm HTTP/1.0" 200 7575 "referrer url" "browser type"
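
As an illustration of how a row in this format could be split into fields, here is a minimal Python sketch using a regular expression. The names `LOG_LINE` and `parse_line` are hypothetical, and the sketch assumes that no field contains an escaped double quote.

```python
import re

# One group per field of the LogFormat shown above:
# %h %v %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<vhost>\S+) (?P<user>\S+) \[(?P<time>[^\]]*)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # The request looks like "GET /29323.htm HTTP/1.0"; extract the numeric
    # page ID when the path has the form /<digits>.htm.
    page = re.match(r'^\S+ /(\d+)\.htm\b', fields["request"])
    fields["page_id"] = page.group(1) if page else None
    return fields
```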

What I'm doing right now is loading the log file into a temporary table with the following:

LOAD DATA LOCAL INFILE 'access_log' INTO TABLE logs FIELDS TERMINATED BY '\"';

The table structure of logs looks like:
field ip varchar(15)
field pageid varchar(20)
field status varchar(4)
field referer varchar(40)

With data that looks like:
1.2.3.4
GET /29323.htm
200
http://domain.com/referrer.html
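
For illustration, the aggregation step on rows shaped like the ones above might look like the following Python sketch. The function name `count_hits` and the `referer_prefix` parameter are hypothetical, and counting only status 200 responses is an assumption rather than a stated requirement.

```python
from collections import Counter


def count_hits(rows, referer_prefix):
    """Count hits per page ID for rows of (ip, request, status, referer)."""
    hits = Counter()
    for ip, request, status, referer in rows:
        # Assumption: only successful requests from the wanted referrer count.
        if status != "200" or not referer.startswith(referer_prefix):
            continue
        # "GET /29323.htm" -> "/29323.htm" -> "29323.htm"
        parts = request.split()
        if len(parts) < 2:
            continue
        name = parts[1].rsplit("/", 1)[-1]
        if name.endswith(".htm") and name[:-4].isdigit():
            hits[name[:-4]] += 1
    return hits
```

The resulting counts map directly onto the two fields of the result table (page_id and hits).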

Can someone tell me what I should do next to accomplish this?

I'm really stuck here and can't go any further.

Thank you for your patience in reading all of the above.

    • [deleted]

    Apache can log to as many files as you like, in as many formats as you like, all at the same time.

    You could set up an extra log in your own format, and rotate only that one every 5 minutes, while rotating the other log every 30 minutes.

      Hi vincent,

      It's really an awesome technique. I'm now checking the manual of mod_log_config.

      This really helps me to simplify my problem a lot.

        Regarding log rotation, can I just create a cron job to process the log file via my script, delete the log file and then restart Apache?

        Thank you.

          • [deleted]

          Not in that order.

          You should not read from the logfile while apache is still writing to it.
          That means you'd have to stop apache, process the file, and start apache again.
          That would mean an unacceptable interruption.

          So the second option is to stop apache, move the logfile to a new location, and immediately start apache again.
          This is faster, but still gives a few seconds of downtime. Unacceptable.

          Option three solves it.
          Un*x filesystems work using inodes, which means that a filename is only a directory entry pointing to an inode, and a process that has the file open writes to the inode, not to the name.
          Whenever you move/rename a file (within the same filesystem), the name changes, but the inode itself stays the same.
          That means you can move the logfiles around, even rename them, and apache will continue to write to that particular file.
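
The inode behaviour described above can be observed with a small Python sketch. The function name `demo_rename_while_open` is hypothetical, the open file handle merely stands in for the server's log handle, and a Un*x filesystem is assumed.

```python
import os
import tempfile


def demo_rename_while_open():
    """Write to a file, rename it while it is open, and keep writing."""
    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "access_log")
    renamed = os.path.join(workdir, "access_log.old")

    writer = open(original, "a")      # stands in for the server's open log
    writer.write("before rename\n")
    writer.flush()
    inode_before = os.stat(original).st_ino

    os.rename(original, renamed)      # only the directory entry changes
    writer.write("after rename\n")    # still goes to the same inode
    writer.close()

    with open(renamed) as f:
        lines = f.read().splitlines()
    same_inode = os.stat(renamed).st_ino == inode_before
    return lines, same_inode, os.path.exists(original)
```

On such a system, both lines should end up in the renamed file, and no file is left under the original name until something creates it again.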

          That means you can move the logfile while apache is still running.
          Once the file is moved, you can give apache a stop-and-go penalty.
          That will force apache to close its logfiles and re-open them.
          During that re-opening apache will check its config files to see where it should open the logfiles, and thus it will open them in the original location, creating a new file.

          Et presto, you have the logfile and the downtime was limited to a few milliseconds.
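
The order of operations in option three can be sketched in Python as follows. The function `rotate_log` and its `reload_server` callback are hypothetical; the callback stands for whatever makes the server re-open its logfiles, and the timestamp suffix is just one possible naming scheme.

```python
import os
import time


def rotate_log(log_path, reload_server):
    """Rename the live log first, then ask the server to re-open its logs."""
    # Step 1: move the file aside while the server keeps writing to it.
    rotated_path = "%s.%s" % (log_path, time.strftime("%Y%m%d-%H%M%S"))
    os.rename(log_path, rotated_path)
    # Step 2: make the server close and re-open its logfiles, which
    # creates a fresh file at the original location.
    reload_server()
    # The rotated file can now be processed without the server touching it.
    return rotated_path
```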

          PS this downtime issue is the reason why you should never use one apache server to run many virtual hosts.
          Instead, always make a new apache installation for every domain you wish to host. That way you can stop/start each domain separately. If one domain takes long to restart, the others won't notice.

            I tested it by renaming the log file while Apache was running, and it works: new entries are still being added to the file after I rename it.

            Do I just issue a /usr/sbin/httpd restart after the log file has been renamed to create a new log file?

            "Et presto, you have the logfile and the downtime was limited to a few milliseconds."

            However, when I issue this command, it tries to shut down apache and then start it again. This process sometimes takes 1-2 seconds or so. How can I achieve what you mentioned in a few milliseconds?

            Thank you.

              • [deleted]

              Did you try sending the HTTPD process a HUP signal?

                What's a HUP signal?

                I just execute the line "/usr/sbin/httpd restart" to make an Apache restart.

                Is there any other thing that I forgot to do?

                  • [deleted]

                  A restart will stop and then start the daemon, whereas a HUP will cause the daemon to re-read its config files, and consequently close and re-open its logfiles.

                  Do 'man ps' and 'man kill' to get more info.
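
As a rough illustration of sending a HUP signal from a script, the Python sketch below reads a process ID from a PID file and signals that process. The function name `send_hup` is hypothetical, and the location of the PID file is an assumption that depends on the installation (Apache's PidFile directive controls it).

```python
import os
import signal


def send_hup(pid_file):
    """Send SIGHUP to the process whose ID is stored in pid_file."""
    with open(pid_file) as f:
        pid = int(f.read().strip())
    # Equivalent to running `kill -HUP <pid>` from a shell.
    os.kill(pid, signal.SIGHUP)
    return pid
```

The script needs permission to signal the server's parent process, which usually means running it as the same user or as root.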

                    Thank you, I'll do a little research on that subject.

