How often does Apache access each file under load?

  1. #1
    Senior Member
    Join Date
    Apr 2003
    Location
    Silver Lake
    Posts
    4,886

    How often does Apache access each file under load?

    I'm in the process of configuring a WordPress site for a multi-machine (i.e., clustered) deployment. In the interest of replicating files across all of the LAMP servers in the cluster, I'm considering using an NFS share to contain the web root and have all machines access it. Assuming that's a good idea (and please tell me if you think it's not), it stands to reason that scalability would be bottlenecked by the NFS share's ability to handle requests from many machines. Imagine 100 machines slamming one NFS share.

    With that in mind and the fact that I wanted to know which files to replicate, I went hunting around and found a remarkable command that will tell you when file system changes of various types happen in a particular directory:
    Code:
    inotifywait -mr /var/www/html -e access -e create -e close_write -e modify -e move -e delete --timefmt "%Y%m%d%H%M%S" --format "%Tw:%wf:%f e:%e" 2>&-
    I am not certain, but I believe that particular command will tell me not just when files are moved in/into/out of this directory, but also when any file is accessed. This brings me to my main question.

    When Apache is serving a PHP website, how often does it access a particular file? The reason I ask is because I ran that command on my web root and accessed the site I'm working with. The *only* file that showed any access was .htaccess. I find this completely baffling. How would Apache know if a PHP source file or a JS, JPEG or CSS file had changed without accessing it? Why on earth does Apache access .htaccess -- and .htaccess only -- for a given request but never access the requested file itself? Here's the output from one page request:
    Code:
    $ inotifywait -mr /var/www/html -e access -e create -e close_write -e modify -e move -e delete --timefmt "%Y%m%d%H%M%S" --format "%Tw:%wf:%f e:%e" 2>&-
    
    20130118185821w:/var/www/html/f:.htaccess e:ACCESS
    20130118185823w:/var/www/html/f:.htaccess e:ACCESS
    20130118185823w:/var/www/html/f:.htaccess e:ACCESS
    20130118185823w:/var/www/html/f:.htaccess e:ACCESS
    20130118185823w:/var/www/html/f:.htaccess e:ACCESS
    20130118185823w:/var/www/html/f:.htaccess e:ACCESS
    20130118185823w:/var/www/html/f:.htaccess e:ACCESS
    20130118185823w:/var/www/html/f:.htaccess e:ACCESS
    20130118185823w:/var/www/html/f:.htaccess e:ACCESS
    20130118185823w:/var/www/html/f:.htaccess e:ACCESS
    20130118185823w:/var/www/html/f:.htaccess e:ACCESS
    20130118185823w:/var/www/html/f:.htaccess e:ACCESS
    20130118185823w:/var/www/html/f:.htaccess e:ACCESS
    Any information would be much appreciated.
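
    A rough way to turn output like the above into "how often is each file touched" is to tally events per file. The sketch below is only an illustration: count_events is a hypothetical helper name, it assumes the exact --format string used in the command above, and it assumes file names contain neither "f:" nor " e:".

```python
from collections import Counter


def count_events(lines):
    """Tally (path, event) pairs from inotifywait output written with
    --format "%Tw:%wf:%f e:%e" (assumed; matches the command in the post)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Split the event list off the end: "... e:ACCESS" or "... e:CLOSE_WRITE,CLOSE"
        head, sep, events = line.rpartition(" e:")
        # Skip anything without an event marker or without a leading timestamp
        if not sep or not head[:1].isdigit():
            continue
        # head looks like "<timestamp>w:<watched dir>f:<file name>"
        _, _, rest = head.partition("w:")
        directory, _, name = rest.rpartition("f:")
        for event in events.split(","):
            counts[(directory + name, event)] += 1
    return counts


# Three lines copied from the output in the post
sample = [
    "20130118185821w:/var/www/html/f:.htaccess e:ACCESS",
    "20130118185823w:/var/www/html/f:.htaccess e:ACCESS",
    "20130118185823w:/var/www/html/f:.htaccess e:ACCESS",
]

for (path, event), n in count_events(sample).most_common():
    print(f"{n:5d}  {event:12s} {path}")
```

    Feeding it a longer capture taken under load would give a per-file event count, which is closer to the original question than reading the raw stream by eye.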
    IMPORTANT: STOP using the mysql extension. Use mysqli or pdo instead.
    World War One happened 100 years ago. Visit Old Grey Horror for the agony and irony.

  2. #2
    Pedantic Curmudgeon Weedpacket's Avatar
    Join Date
    Aug 2002
    Location
    General Systems Vehicle "Thrilled To Be Here"
    Posts
    21,889
    Just as a random suggestion: Apache records the last-modified time when it does access (and cache) a file - and only feels it needs to read the file if the last-modified time changes. (In other words, inotifywait doesn't count merely querying a file's metadata (such as last-modified time, or permissions, or filename) as actually accessing the file.)
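
    To make the distinction concrete, here is a minimal sketch of the "only re-read when the last-modified time changes" idea. It is not Apache's code: MtimeCache is a hypothetical class written for illustration, and whether Apache or PHP actually behaves this way depends on the modules and caches in use. The point is only that the os.stat call is a metadata query, while the open/read is what inotify reports as ACCESS.

```python
import os
import tempfile


class MtimeCache:
    """Hypothetical helper: re-read a file only when its mtime changes."""

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, contents)
        self.reads = 0      # number of times a file was actually opened and read

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns  # metadata query only, no read()
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]
        with open(path, "rb") as fh:       # a real read of the file contents
            data = fh.read()
        self.reads += 1
        self._entries[path] = (mtime, data)
        return data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.php")
    with open(path, "wb") as fh:
        fh.write(b"<?php echo 'v1';")

    cache = MtimeCache()
    cache.get(path)
    cache.get(path)                        # served from the cache, no second read
    reads_before_touch = cache.reads

    # Bump the modification time by one second, roughly what `touch` does
    st = os.stat(path)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    cache.get(path)                        # mtime changed, so the file is read again
    reads_after_touch = cache.reads

print(reads_before_touch, reads_after_touch)
```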
    Last edited by Weedpacket; 01-19-2013 at 10:29 PM.
    THERE IS AS YET INSUFFICIENT DATA FOR A MEANINGFUL ANSWER
    FAQs! FAQs! FAQs! Most forums have them!
    Search - Debugging 101 - Collected Solutions - General Guidelines - Getting help at all

  3. #3
    Senior Member
    Join Date
    Apr 2003
    Location
    Silver Lake
    Posts
    4,886
    Thanks for your input, Weedpacket. This is really bugging me.

    Originally posted by Weedpacket:
    Just as a random suggestion: Apache records the last-modified time when it does access (and cache) a file - and only feels it needs to read the file if the last-modified time changes. (In other words, inotifywait doesn't count merely querying a file's metadata (such as last-modified time, or permissions, or filename) as actually accessing the file.)
    I have run the inotifywait command and then restarted apache and this restart does not result in Apache accessing any files in the web root. I am puzzled that checking a file's modification date does not constitute "access." I'm more puzzled that an apache restart didn't result in the web root being accessed.

    I tried rebooting my machine entirely, running the inotifywait command, and accessing the site. This also did not result in access to anything but the .htaccess file. Apache is configured to start at boot time, so I was thinking perhaps Apache cached these files at boot?

    So I started looking for an Apache cache. Googled around, did some grep searches and located the htcacheclean command. The grep searches revealed this:
    Code:
    CacheRoot /var/cache/apache2/mod_disk_cache
    And I ran this command:
    Code:
    $ sudo htcacheclean -l 4096M -rvp /var/cache/apache2/mod_disk_cache
    Statistics:
    size limit 4096.0M
    total size was 0.0K, total size now 0.0K
    total entries was 0, total entries now 0
    Which appears to have reported that it accomplished nothing. I've tried it again while running the inotifywait command to watch this cache directory and it really does seem to accomplish precisely nothing. And yet Apache does seem to cache these files -- I just don't know where.

    I did a "touch index.php" and accessed my site -- this finally resulted in file access to my web root, but there was absolutely no file system activity reported in the cache directory. I'm thinking the cache might be somewhere else. Is there some apache command to find out?

    I have not been able to determine precisely when and how often Apache accesses the files in my web root or where Apache is caching things. I'm also wondering what kind of file system load results from all these checks on a file modification time. I expect it's much easier than reading the contents of the file.

    And I'm also totally confused that I see several file accesses on .htaccess for each page load. Why apache cannot cache this file is a mystery to me.

  4. #4
    NMaOtBG bpat1434's Avatar
    Join Date
    Oct 2004
    Location
    Around 255.255.255.0
    Posts
    7,850
    Apache checks every directory on the path to the requested document for an .htaccess file. Because .htaccess files are per-directory, apply recursively to everything beneath them, and can change how the request is handled, they can't really be cached. This is the downfall of Apache. Other systems like nginx bypass or optimize this feature and gain the performance boost.
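
    As a sketch of what that per-directory lookup means for a single request, the function below lists the .htaccess paths that would be looked for. It is a simplification: htaccess_candidates is a hypothetical name, and it assumes overrides are enabled from the given root downward (where the search really starts depends on the AllowOverride settings in the server config).

```python
import posixpath


def htaccess_candidates(override_root, requested_file):
    """List .htaccess paths looked up for one request, assuming overrides
    are enabled from override_root downward (a simplification)."""
    root = posixpath.normpath(override_root)
    target_dir = posixpath.dirname(posixpath.normpath(requested_file))
    rel = posixpath.relpath(target_dir, root)
    if rel == posixpath.pardir or rel.startswith(posixpath.pardir + "/"):
        raise ValueError("requested file is outside override_root")

    candidates = [posixpath.join(root, ".htaccess")]
    current = root
    if rel != posixpath.curdir:
        # One extra lookup for every directory level below the root
        for part in rel.split("/"):
            current = posixpath.join(current, part)
            candidates.append(posixpath.join(current, ".htaccess"))
    return candidates


# Example path is illustrative, not taken from the thread
for p in htaccess_candidates("/var/www/html",
                             "/var/www/html/wp-content/themes/x/style.css"):
    print(p)
```

    Every asset on a page is its own request, so a deep directory tree multiplies these lookups quickly.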

    Apache itself won't execute the PHP file in a CGI setup. I'm also not sure a read counts as an access the way a write would. I.e., is cat-ing a file the same as vim-ing it? In the first case the file cannot change, as it is opened read-only.

    I'm not sure of your setup, but have you looked into Amazon, using S3 along with many micro or small EC2 instances to serve up your content? You can use RDS to store the DB, run many EC2 instances to house the WordPress code, and then use S3 to store user files like themes, uploads and plugins. If you use Elastic Beanstalk you can roll an upgrade out across your entire system in a snap. Plus you can grow and shrink as needed and even expand across geographic areas, giving you failover capability (this is what Netflix is missing).

  5. #5
    Senior Member
    Join Date
    Apr 2003
    Location
    Silver Lake
    Posts
    4,886
    I believe moving the mod_rewrite rules from .htaccess to the Apache conf file (loaded at Apache startup) will improve performance and eliminate the need to check all the .htaccess files. Unfortunately, WordPress has features that make it write this file -- e.g., when you change your "permalink" style.

    Using this script:
    PHP Code:
    <?php 
    print php_sapi_name();
    ?>
    I get "apache2handler" which suggests this particular machine is *not* running in CGI mode. I would love to know where the caching happens so I can check access patterns over there.

    Doing a cat operation on a particular file via command line does not appear to trigger any access in this directory either. I'm beginning to wonder WTF "access" really means. I am hoping to find some kind of file access monitoring that actually gives me some idea of the workout my disk is getting.

    I am indeed going to use EC2 and RDS and an ELB. I looked into using the W3 Total Cache plugin for WordPress, but from what I can tell it doesn't really cover everything -- namely certain core functionality of WordPress that deletes or modifies local PHP files. Unless I'm missing something, my EC2 instances shouldn't be trying to require_once PHP files from S3.

  6. #6
    Senior Member
    Join Date
    Apr 2003
    Location
    Silver Lake
    Posts
    4,886
    CORRECTION: a cat operation does result in an access event notification. Sorry. It's late and I am tired.

    Still wondering where apache is caching everything. Any thoughts on how to find out?

  7. #7
    NMaOtBG bpat1434's Avatar
    Join Date
    Oct 2004
    Location
    Around 255.255.255.0
    Posts
    7,850
    My point with AWS and Beanstalk was not to put the WP core files in S3. Rather you keep the core files on each individual server and use S3 to house your user-uploaded content (images, videos, documents, etc.). When you install a plugin, all you have to do is then tell beanstalk you have a new version and to deploy it and it will deploy to all your servers for you.

    Seems a lot easier than copying the file multiple times to different places.

    [See Elastic Beanstalk - Deploying Versions to Existing Environments]
    Last edited by bpat1434; 01-21-2013 at 07:09 AM.

  8. #8
    Pedantic Curmudgeon Weedpacket's Avatar
    Join Date
    Aug 2002
    Location
    General Systems Vehicle "Thrilled To Be Here"
    Posts
    21,889
    One thing to keep in mind is that filesystem use isn't necessarily 1:1 with physical disk usage: because reading from core is so much faster than reading from disk, the operating system (depending on what it is) may use otherwise-idle RAM to image chunks of the disk that it anticipates frequently reading from (this is why Linux, for example, reports such a low value for unused memory).
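
    On Linux, one way to see this effect is to compare MemFree with Buffers and Cached in /proc/meminfo. A minimal sketch, with hypothetical function names (parse_meminfo, cache_summary) and made-up sample numbers rather than figures from any real machine:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: kilobytes}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def cache_summary(values):
    """Return the share of RAM that is unused vs. holding buffers/page cache."""
    total = values["MemTotal"]
    free = values["MemFree"]
    cached = values.get("Cached", 0) + values.get("Buffers", 0)
    return {
        "free_pct": 100.0 * free / total,
        "cache_pct": 100.0 * cached / total,
    }


# Illustrative numbers only, not measurements
SAMPLE = """MemTotal:        8000000 kB
MemFree:          400000 kB
Buffers:          200000 kB
Cached:          5000000 kB
"""

summary = cache_summary(parse_meminfo(SAMPLE))
print(summary)
```

    On a real Linux machine the same functions can be pointed at the contents of /proc/meminfo instead of SAMPLE. A small free_pct next to a large cache_pct is the pattern described above: the memory is not gone, it is holding file data the kernel can drop when needed.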

  9. #9
    Senior Member
    Join Date
    Apr 2003
    Location
    Silver Lake
    Posts
    4,886
    Originally posted by Weedpacket:
    One thing to keep in mind is that filesystem use isn't necessarily 1:1 with physical disk usage: because reading from core is so much faster than reading from disk, the operating system (depending on what it is) may use otherwise-idle RAM to image chunks of the disk that it anticipates frequently reading from (this is why Linux, for example, reports such a low value for unused memory).
    This is really valuable input. I think this may be what is going on.

    From the apache docs:
    Operating System Caching

    Almost all modern operating systems cache file-data in memory managed directly by the kernel. This is a powerful feature, and for the most part operating systems get it right. For example, on Linux, let's look at the difference in the time it takes to read a file for the first time and the second time;

    Code:
    colm@coroebus:~$ time cat testfile > /dev/null
    real    0m0.065s
    user    0m0.000s
    sys     0m0.001s
    colm@coroebus:~$ time cat testfile > /dev/null
    real    0m0.003s
    user    0m0.003s
    sys     0m0.000s
    Even for this small file, there is a huge difference in the amount of time it takes to read the file. This is because the kernel has cached the file contents in memory.

    By ensuring there is "spare" memory on your system, you can ensure that more and more file-contents will be stored in this cache. This can be a very efficient means of in-memory caching, and involves no extra configuration of Apache at all.

    Additionally, because the operating system knows when files are deleted or modified, it can automatically remove file contents from the cache when necessary. This is a big advantage over Apache's in-memory caching which has no way of knowing when a file has changed.
