For quite a while now, I've been developing a site to give a client access to PDF files that contain banking information. In the past I've avoided putting the files online and have instead supplied them on CD/DVD, but the client is pushing me to develop a section of the website where they can find and view/print/download these PDF files.

Is there a way to secure these downloads? My first thought was to create a directory above the web root directory and put all the PDFs in there, then store the file names in a database, so when they search for a file, it will locate the filename and then I could use an anchor tag to allow them to download the file.

This method does not seem very secure to me, even with an SSL certificate and a check that the user is logged in, because the path could be picked up easily and the filenames are sequential numbers like 52727-VF.pdf, 52728-MF.pdf... The initials are employees' initials, which probably wouldn't be hard to figure out, especially for a disgruntled former employee trying to hack the site.

Any suggestions of a way to secure these downloads would be greatly appreciated.

Thanks
Brian

    I might do something like:

    1. Put the files outside of the web root directory hierarchy (but readable by the web server) so that they cannot be accessed directly by HTTP requests.

    2. Set up a password-controlled login system for users.

    3. When a given file is eligible for download by a given user, add an entry to a table in the DB which would have columns for the user ID and the file ID, the latter pointing to a "file" table with info that includes its file name (and directory info if it varies) along with any other data of interest you want to include.

    Then when a user logs in, they can be shown a list of files to which they have access via a DB query, with links to a file-server PHP page, the links including the file identifier. When the user clicks a link, the file-server script checks his login status, the file ID is checked against the DB to see if he is allowed access to it, and if so, it is output to the user (with applicable HTTP headers and such to make it a download).
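The check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it is written in Python with an in-memory SQLite database rather than as the PHP page mentioned, and the table names (`files`, `user_files`), the function name `lookup_file_for_user`, and the sample user ID are all hypothetical. The point is that the script receives only a file ID, and the file name is released only if a matching permission row exists.

```python
import sqlite3

def create_schema(conn):
    # Hypothetical schema: one table of files, one table of per-user grants.
    conn.executescript("""
        CREATE TABLE files (
            id INTEGER PRIMARY KEY,
            file_name TEXT NOT NULL
        );
        CREATE TABLE user_files (
            user_id INTEGER NOT NULL,
            file_id INTEGER NOT NULL REFERENCES files(id),
            PRIMARY KEY (user_id, file_id)
        );
    """)

def lookup_file_for_user(conn, user_id, file_id):
    """Return the stored file name if this user may fetch it, else None."""
    if user_id is None:
        # Not logged in: refuse before touching the database.
        return None
    row = conn.execute(
        "SELECT f.file_name FROM files f "
        "JOIN user_files uf ON uf.file_id = f.id "
        "WHERE uf.user_id = ? AND f.id = ?",
        (user_id, file_id),  # bound parameters, never string concatenation
    ).fetchone()
    return row[0] if row else None

# Sample data, using the file names from the question.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO files (id, file_name) VALUES (1, '52727-VF.pdf')")
conn.execute("INSERT INTO files (id, file_name) VALUES (2, '52728-MF.pdf')")
conn.execute("INSERT INTO user_files (user_id, file_id) VALUES (10, 1)")
```

With this data, user 10 gets a file name back for file 1 only. A guessed ID such as 2 returns nothing, so sequential IDs or file names give an attacker no advantage.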

      If you prefer just a slight modification to your system, you could use SSL/TLS with basic authentication for HTTP, then serve the PDF document in a compressed archive that has a randomly generated long name.

        Thank you for the quick responses.

        Laserlight - I want to allow them to view it in the browser if possible rather than downloading, so I'd prefer not to compress the files, but that does give me an idea. I may write an uploader that would randomize the filename during upload and then, using NogDog's suggestion, I can create a file table to keep track of them. When delivering the file, can I set its name so that it downloads using the numbering system?
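A rough sketch of that idea, written in Python for illustration rather than in PHP: the stored name is random, and the name the user sees is sent in the `Content-Disposition` response header. The function names `random_stored_name` and `pdf_response_headers` are made up for this example. Using `inline` asks the browser to display the PDF if it can, while `attachment` asks it to save the file; whether inline display actually happens still depends on the browser.

```python
import secrets

def random_stored_name(extension=".pdf"):
    # 16 random bytes -> 32 hex characters; not guessable from other names.
    return secrets.token_hex(16) + extension

def pdf_response_headers(display_name, size_bytes, inline=True):
    """Build the HTTP headers for sending a PDF under a friendly name."""
    disposition = "inline" if inline else "attachment"
    # Drop quotes, backslashes and control characters (such as CR/LF) so the
    # name cannot break out of the quoted header value.
    safe_name = "".join(
        ch for ch in display_name if ch.isprintable() and ch not in '"\\'
    )
    return {
        "Content-Type": "application/pdf",
        "Content-Disposition": f'{disposition}; filename="{safe_name}"',
        "Content-Length": str(size_bytes),
    }
```

The file table would then hold both names: the random one used on disk and the numbered one (for example 52727-VF.pdf) passed as `display_name`.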

        From what I've seen, it seems that PDF display in the browser depends on how the browser is configured. But I've displayed PDFs in Google Mail, and it renders them in the browser regardless of the browser configuration. Does anyone know if Google allows developers access to this functionality, and if so, do you know what it's called? I've tried looking for an API to use, but haven't found anything yet.

        Thanks again for the help.
        Brian

          woodeye wrote:

          I want to allow them to view it in the browser if possible rather than downloading, so I'd prefer not to compress the files

          In that case you could do without that part. However, basic authentication is rather coarse-grained, so it becomes an "all or nothing" permission: a user who has access to one document has access to all of them, provided he/she can figure out their file names.

          EDIT:

          woodeye wrote:

          I may write an uploader that would randomize the filename during upload and then using NogDog's suggestion

          NogDog's suggestion allows fine-grained access control without obfuscating the file names, but of course it also requires more development work. There would be no point in obfuscating the file name, since the file is not kept in the public web area.

            But wouldn't the path to the file still be accessible? So if my path was:
            /var/chroot/home/content/83/7788683/pdf_docs/

            Where the HTML folder would be:
            /var/chroot/home/content/83/7788683/httpdocs/

            Isn't there some way a hacker could use that path if they guessed the filename correctly? Or is it only accessible by the server?

            Thanks again for all the help, I really appreciate it!

              woodeye wrote:

              Isn't there some way a hacker could use that path if they guessed the filename correctly?

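              No. The web server only maps URLs to files under the document root (httpdocs in your example), so unless it has been configured otherwise, no URL reaches pdf_docs. Only code running on the server, such as your own script, can read that directory.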

              In my suggestion, the web server serves the file directly, but there's basic authentication to get past first.
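One caveat when a script reads from the private directory on the user's behalf: the script is then the only route to those files, so it should not build a path from unchecked input. Looking names up by ID in the database largely avoids this. As an extra guard, the check below is a minimal sketch, in Python for illustration, that refuses any name resolving outside the private directory. The function name `resolve_private_file` is hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import Path

def resolve_private_file(private_dir, stored_name):
    """Return the full path for stored_name, or None if it escapes private_dir."""
    base = Path(private_dir).resolve()
    # resolve() collapses ".." segments; an absolute stored_name replaces base.
    candidate = (base / stored_name).resolve()
    if base not in candidate.parents:
        # Outside the private directory (or the directory itself): refuse.
        return None
    return candidate
```

With the directory from the question, a plain name such as 52727-VF.pdf resolves to a path inside pdf_docs, while `../httpdocs/index.php` or an absolute path such as `/etc/passwd` is rejected.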
