Hey everyone, i've been playing around with a load of different ways of protecting my images from hotlinking and also from people accessing them directly by typing their address directly into the address bar (they are only viewed from within my php pages) and have come full circle and am now thinking of using the following htaccess file within my images directory:


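RewriteEngine on
# dots escaped and pattern anchored so only pages on mysite.com count as valid referers
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mysite\.com/ [NC]
# never redirect the placeholder itself, otherwise a request with no valid referer loops forever
RewriteCond %{REQUEST_URI} !/inaccessibleimage\.gif$
RewriteRule \.(jpe?g|gif|bmp|png)$ http://www.mysite.com/images/inaccessibleimage.gif [R,L]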

This seems to work fine, i have not included the standard line to allow anonymous referrers for browsers that don't set it/block it etc (eg RewriteCond %{HTTP_REFERER} !^$) as that way it prevents people from typing it into the address field and accessing the images directly. Hopefully what this results in is when people view php pages on my site that access the images they will be able to view them, otherwise they will get the problem image. What i'm not sure of is, am i right in thinking that because images are only being accessed via my php pages on my site, the referer variable will always be passed server side and so even anonymous referers will be able to view the images as long as they are viewing them from within my php pages?

I hope that makes sense, i'm not all that knowledgeable in this area, not bad with php, but only just started playing around with htaccess as i've always done it all through php. I would really really appreciate any help you guys could give, as this has been plaguing me for ages now!

Thanks very much in advance everyone, all help is appreciated,

Dave

    am i right in thinking that because images are only being accessed via my php pages on my site, the referer variable will always be passed server side

    No, you are not right in thinking that. First of all, the only thing you can do is make it difficult for people wanting to see your images outside of the HTML pages - it's impossible to make it impossible to see them. The referrer variable is ALWAYS passed in from the client side to the server side which is why, with a little effort, anyone can view the graphics outside of the HTML page. The referrer variable is never stored server side which would have been your only chance to keep it hidden from the browser.

      thanks very much for your speedy response etully, that really is a shame. i know it's only possible to make it hard to view them outside my html pages rather than impossible, that's all i want to do really. I just can't work out how to prevent people from accessing the images directly (ie by typing something like www.mysite.com/image.jpg), if indeed there is a way to do that. Or is there literally no way of preventing people who know where they're located from accessing them directly if you want to use them within web pages (ie without password protecting them), the only alternative being to hide where the images are stored by accessing the freely available ones via php scripts in the img tag or just storing them in a completely different location?

      Did that make sense or did that just sound like crazy talk? 🙂

      I really appreciate your help,

      Dave

        No, you had it right the first time. The .htaccess trick will stop the average person from going directly to the image like this: http://www.mysite.com/image.jpg

        But if someone is really determined, they can spoof the referrer and then they can type:
        http://www.mysite.com/image.jpg and go directly to the image.

        I once needed to scrape 20,000 images from a web site and I didn't want to visit 20,000 HTML pages (would have taken months) so I wrote a script that spoofed the referrer and downloaded all 20,000 images in a few hours. It's not a shame - it's just a matter of convenience.

        Since the referrer variable ALWAYS comes from the client side, it can ALWAYS be spoofed so you can NEVER trust it for any real security. But you can trust it to stop the casual surfer.
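To make the spoofing point concrete, here is a minimal PHP sketch of how a script can send any Referer it likes. The helper name `context_with_referer` and the page `gallery.php` are made up for illustration; this is not the script mentioned above, just one way it could be done with PHP's stream contexts.

```php
<?php
// Hypothetical helper: builds a stream context whose HTTP requests carry
// whatever Referer value the caller chooses. The server cannot tell this
// apart from a Referer sent by a real browser.
function context_with_referer(string $referer)
{
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => "Referer: " . $referer . "\r\n",
        ],
    ]);
}

// Pretend the request comes from a page on the site (gallery.php is assumed).
$context = context_with_referer('http://www.mysite.com/gallery.php');

// A scraper would then fetch each image with something like:
// $bytes = file_get_contents('http://www.mysite.com/images/pic1.jpg', false, $context);
```

The actual fetch is left commented out so the sketch runs without touching the network.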

          you actually can prevent people from viewing your images except through your site but the process is pretty elaborate and involves sessions.

          basic idea:
          1) put image OUTSIDE of the public html directory so they cannot be viewed directly by a browser
          2) instead of linking to the images, have <img> tags reference a file called image_feeder.php
          3) any php page that references image_feeder.php must first put a unique secret code into session.
          4) when this page references image_feeder.php, append this secret code to the query string thusly:
          myfile.php

          <?php
          session_start(); // the session must be started before $_SESSION can be used
          $code = uniqid("");
          $_SESSION['secret_code'] = $code;
          ?>
          <img src="image_feeder.php?pic=foo.jpg&code=<?=$code ?>">
          

          then image_feeder.php will check the secret code passed in via $_GET against what is in session and if they match, it will cough up the image with the appropriate header/mime type:

          image_feeder.php

          <?php
          session_start(); // needed so $_SESSION holds the code set by myfile.php

          if (empty($_GET['code'])) {
            // you might want to show some 'please visit mysite.com' image here
            die('code not provided via get');
          }
          if (empty($_SESSION['secret_code'])) {
            // you might want to show some 'please visit mysite.com' image here
            die('session code not set');
          }
          if ($_SESSION['secret_code'] !== $_GET['code']) {
            // you might want to show some 'please visit mysite.com' image here
            die('get and session codes provided but they do not match');
          }
          
          // if you reach this point, just output the right mimetype and cough up the image.
          // basename() strips directory parts so pic=../../something cannot escape the image folder
          header("Content-type: image/jpeg");
          readfile("/home/secret/file/location/" . basename($_GET['pic']));
          

          Obviously this is a really drastically simplified version of the concept but i think it would work because only your site can set the session information and the user will have no control over what value goes into the secret code. remote machines will not be able to set the secret code.

          thoughts? it's pretty elaborate but i think it would work in theory.

            So once I visit the PHP page and the session cookie is set, and I see a cool graphic on the page and I want to view ALL your graphics, I could just right click on the image and select "Copy Image Location". Now I have the URL which will be in the form:

            http://www.your-site.com/images/image_feeder.php?image=12345
            or
            http://www.your-site.com/images/image_feeder.php?image=contact_us.gif

            So now, since the cookie is set in my browser (which I could just as easily copy into an automated script), I can visit every single one of the graphics in your images directory by name or id. I didn't say it would be easy - I just said that it couldn't be made impossible. Some things can be made impossible - other things (like protecting images from being viewed outside an HTML page), you can't stop someone who is determined. Consider my 20,000 images problem. I had the option of spending 30 minutes to write a script or spending months to visit 20,000 HTML pages. I had a huge motivation to spend the time writing the script.

              thanks guys

              that's the problem i was thinking with sneakyimp's solution, it works if you're half determined and not already on the site, but for my usage i'm trying to prevent what etully said, which is if someone's on the site they can just right click and find the image location and then go to the others by entering the address and changing the id.

              i don't mind about people who are prepared to spoof the referer as (as you say) there is no absolute way to protect this, however i'm not sure i totally understand. i think i may have misunderstood the referer variable, will it always exist unless someone is actively trying to spoof/change/hide it, in which case it doesn't matter about allowing empty referer variables as that would then only really be to allow direct access? i thought there were circumstances (maybe to do with firewalls/browsers etc) that meant a normal user might end up with an empty referer variable and hence not allowing access may prevent a normal user from viewing your images? if this is not the case then that should do pretty much exactly what i want i think, as you say it will prevent all normal users from entering in the image url unless they spoof the referer. i hope i haven't misunderstood what you said and again thanks very much for your help, both of you, it's seriously seriously appreciated!

                the solution I am offering is not to prevent scraping. once the image is on your machine, there's nothing i can do about it.

                i think i see what you mean but if you view mypage.php this is what happens:

                1) i put a new secret code in session. if you refresh the page, this code gets changed. it is never the same. it also doesn't get stored in a cookie. this is not a session id.

                2) the links on mypage.php reference not just the image but also the secret code set in the previous step:
                http://mysite.com/image_feeder.php?pic=foo.jpg&code=ad345ef232

                3) each time that image_feeder.php gets loaded, it checks that secret code against what is stored in session.

                i think i see what you mean. if i point one browser at a page that has an image, the secret code gets set once and I can then just open a new browser window with some javascript and proceed to download all the other images because the secret code stored in session will not change because i'm not refreshing mypage.php.

                i still think it's possible to prevent people from referencing images that are hosted by your site. if i altered the approach above so that
                a) the filename was encrypted using the secret code stored in session
                or
                b) the secret code was different for every image - some special secret hash of the filename and the secret session code

                then things would be much harder.

                i do think there's some way to prevent people linking one's images.

                  as i understand it, the referer variable is set by the client and is sometimes not set at all. you have no control over that. that's what etully was talking about when he was spoofing a referer value.

                  the general idea behind the approach i suggested was that you keep setting a secret code on your server and forcing each request for an image to provide that secret code. as long as the secret code is continually reset in some unpredictable way you can prevent users from getting an image.
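One caveat on "unpredictable": `uniqid()` is derived from the current time, so its values can be guessed. A minimal sketch of a less guessable code, assuming PHP 7 or later for `random_bytes()`; the function name `new_secret_code` is made up for illustration.

```php
<?php
// Hypothetical helper: returns a fresh code as 32 hex characters,
// drawn from the operating system's cryptographic random source.
function new_secret_code(): string
{
    return bin2hex(random_bytes(16));
}

// In the page that emits the <img> tags, after session_start():
// $_SESSION['secret_code'] = new_secret_code();
```

Each page load would then replace the stored code, so a link copied from an earlier load stops matching.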

                    sneakyimp:
                    my concern with that is specifically how i want to use it myself. i have loads of individual galleries and each one has anywhere between 1 and a few hundred pics and people can view (in any way) the first couple of photos and not the rest and i want (amongst other things) to stop people loading the page, looking at the first couple of images, right clicking on one of them and immediately looking at following images by changing the id numbers (as the images are generally numbered pic1.jpg, pic2.jpg etc etc). Not saying your way wouldn't work, just for me would prefer to find a more simple version that would do exactly what i need.

                    etully:
                    please could you clarify what you meant earlier for me as i didn't quite understand it, you said:

                    No, you had it right the first time. The .htaccess trick will stop the average person from going directly to the image like this: http://www.mysite.com/image.jpg

                    But if someone is really determined, they can spoof the referrer and then they can type:
                    http://www.mysite.com/image.jpg and go directly to the image

                    That involves not allowing empty referers right? so then the only way of directly accessing the images is spoofing the referer to be mysite.com (which is fine for what i want, it doesn't need to be super secure)? is it not as sneakyimp said in that sometimes it's (genuinely rather than maliciously) not set at all, in which case normal users wouldn't be able to access the images even through your webpage if for some non-malicious reason their referer variable wasn't set? or have i misunderstood?

                    Thanks very much guys.

                      There is no way to stop people from scraping all the images off your site. You can make it difficult but it's impossible to make it impossible.

                      Here's why: If the HTML web page can request the graphics so that it can display the graphics as part of the web page, then I can write a script that pretends to be a browser and your server will think that I'm just requesting the graphics for the purpose of displaying them when I'm really saving them to a directory.

                      Sneakyimp: You are 100% correct about your statement that you should be able to stop people from linking to your graphics on their web site. This is easy to do - it's simple - and it's 100% effective. But that's not the problem that deshg was trying to prevent. He is concerned about someone seeing a graphic, finding the URL, pasting the URL into the browser, and viewing the graphic outside of the HTML page that normally includes that image. That problem can be stopped against casual web surfers but not against someone who is willing to invest an hour of their time to write a script that pretends to be a web browser. That's the only point I was trying to make earlier.

                      Last point: Encrypting filenames will not help at all. Think of it this way: Imagine you and I are in the living room and you yell into the kitchen, "Bring out the beer, the chicken, and the steak". Now I know the names of the items that are available in the kitchen and I can request them as much as I want. If you set up a secret signal with the kitchen and encrypt them and translate them to another language, you might yell, "Send me the cbhfvbfhvbf, the 238462387462834, and the !@#$%&()(&%$##." Now that I've heard you say the special filenames, I can request them too. So I say them in code and the kitchen knows exactly what to send out.

                      To stop the analogy... let's look at the technology. I go to your site and I see 40 images on your page. I want them and I want to pull each one up in my browser. I do a view source, copy the HTML, find all the IMG SRC tags, and find all the encrypted filenames getting passed to image_feeder. So I spoof the referrer variable and call each URL that your IMG SRC tag was calling. Your web server is completely unable to determine whether I'm Firefox trying to display the page or a clever hacker trying to view the graphics.

                      Regarding the idea of NOT showing the images if the referrer is blank: There are many times when the referrer is blank. People can turn it off, they can use browsers that don't send a referrer, they can be going through a firewall that doesn't support referrers, or they could be going through a proxy that doesn't support referrers. If you decide not to hand out the graphics to people who don't provide a referrer, then you will have real users who can't see the graphics on your web site.

                        By the way - making something difficult is not real security in the computer world. It's only difficult the first time. So many things can be automated (these are computers, after all), that difficult, frustrating, or time consuming tasks can be automated and then the automation routines can be shared with friends.

                        There is either (A) possible or (B) impossible. There is no "difficult" - that doesn't stop anyone from doing anything important.

                          Yes I admit I cannot defend against scraping. You could spoof a browser, store the cookies, etc and view page after page on my site and finally download all my images. Nothing I can do about that.

                          HOWEVER, here's what he asked originally:

                          Hey everyone, i've been playing around with a load of different ways of protecting my images from hotlinking and also from people accessing them directly by typing their address directly into the address bar (they are only viewed from within my php pages)

                          I think my approach can effectively protect against that sort of thing.

                          To pick up on your kitchen analogy, suppose the names for beer, chicken, and steak are not just encrypted, but encrypted differently depending on who is asking for them? In that case when Colonel Mustard hears Ms. Scarlet ask for "!@#$%&()(&%$##", thereby attaining a steak, if he then tries to order "!@#$%&()(&%$##" he might well end up with a lead pipe to the head instead.

                          As proof of concept, I offer this page. I'm not certain it's foolproof, but I challenge you to
                          a) hot link any image from it from any other page or email message on the entire internet
                          b) send me a link that i can open in my browser to view the image
                          or
                          c) find even 1 name on any of the last images: 006.gif, 007.gif, 008.gif, 009.gif, and 010.gif.

                          http://oddballsinvitations.net/protect/myfile.php

                          There are 10 images total whose names start with 001.gif. The image filenames are not even encrypted in the link. Instead, there is a secret hash function based on a variety of secret ingredients which should make hotlinking or direct link exceedingly difficult.

                          Let me know if I am misunderstanding something.

                            Thanks again guys, but can i clarify something with both of you:

                            etully:
                            I TOTALLY appreciate i cannot make it impossible to harvest my images and i also appreciate what you mean about there being no such thing as difficult. All i am trying to do is exactly what you said earlier:

                            He is concerned about someone seeing a graphic, finding the URL, pasting the URL into the browser, and viewing the graphic outside of the HTML page that normally includes that image. That problem can be stopped against casual web surfers but not against someone who is willing to invest an hour of their time

                            My question is (ignoring the security implications of people who really want to access them and are prepared to spend time) is how do i do this and protect them from casual surfers direct linking? You said:

                            No, you had it right the first time. The .htaccess trick will stop the average person from going directly to the image like this: http://www.mysite.com/image.jpg

                            but then you later said:

                            If you decide not to hand out the graphics to people who don't provide a referrer, then you will have real users who can't see the graphics on your web site.

                            I think i am missing something here, but how do i prevent casual users from doing exactly what you said (typing in the direct address and accessing the image) if hiding the referer is a bad idea? Think i might be being crazy!

                            sneakyimp:
                            I can't easily access any of your other images numbered 5 - 10, interestingly though in IE when i direct link to one of the first five images (by checking their direct address within properties) it displays the error image, however when i do the same in firefox i am able to view the correct images directly. For me this doesn't matter as i don't mind people directly accessing the images that they can see through my site (they would only be able to directly access them whilst they were still on that page anyway presumably), i just don't want them accessing (in this example) images 6 - 10 by changing the numbers within the filenames, which this does fine. I have been trying to work out exactly how you have done this based on your earlier post about using img codes, i'm thinking something like:

                            1) Each time a page on my site is loaded a unique code is generated which is stored as a session variable
                            2) Each image tag produces a unique image code by encrypting the image name and the current unique code (in the session variable) together, which is then passed within the img tag to image_feeder.php
                            3) image_feeder.php checks the unique image code by encrypting the current session code and image name and displays the correct image if the codes match otherwise displays an error image

                            This obviously isn't quite right as whilst what i have said would prevent hotlinking and opening say image06 without backwards engineering the hash function by using the session variable and the visible img tag codes, it does not prevent direct access to the visible images by just copying the direct link into a new window in either IE or Firefox. Which isn't necessarily a problem, but your way seems better as it at least does this in IE. Would you mind elaborating on how the above method works, as it all seems quite nice?

                            Thanks very much again guys, you are both being extremely helpful in not only getting to a solution, but helping me understand the intricacies of this much more, which is important really.

                              sneakyimp:

                              I think you and I are in agreement and that we're just misunderstanding each other.

                              I tried your page and it's clever. It will stop ANY hotlinking. It will also stop any casual attempt to view a graphic directly. So yes, it accomplishes what you wanted to do which is great.

                              Someday, someone is going to search the PHP forums looking for information on how to protect their images and I just wanted to make it clear to them that your technique solves many problems but not all of them. You've acknowledged that it doesn't stop automatic scraping so really, we are in complete agreement.

                                etully:
                                i think you were talking to sneakyimp in your last post, would you mind just clarifying the answers to what i asked in my last post, just to clear it up? with regard to the initial way of preventing casual users using htaccess?

                                thanks very much

                                  Deshg, You wrote:

                                  The .htaccess trick will stop the average person from going directly to the image like this: http://www.mysite.com/image.jpg

                                  but then you later said:

                                  If you decide not to hand out the graphics to people who don't provide a referrer, then you will have real users who can't see the graphics on your web site.

                                  I think i am missing something here, but how do i prevent casual users from doing exactly what you said (typing in the direct address and accessing the image) if hiding the referer is a bad idea?

                                  You're not missing anything.

                                  When you call an HTML page, your browser asks for the HTML first. When it gets the HTML, it reads through the HTML and determines what graphics are on the page that it needs to build the page. So then after it gets the HTML, it makes a series of additional requests to get the graphics. When it asks for those graphics, it passes a value for the referrer. Therefore, in theory, you could say "Apache should not hand out the graphic if there is no referrer - and neither should it hand out the graphic if the domain name in the referrer is anything but my domain name - and neither if the referrer ISN'T MY HTML PAGE." And this would be pretty good. If someone were to right click on an image and find the URL of the image, they could try to post that URL in a chat room but when someone pastes that URL into their browser, no referrer value will be sent so Apache refuses to hand out the image which is good.
                                  
                                  The problem is that since you are having Apache block requests when there is no referrer, there can (and will) be times when someone is using their cell phone as their web browser, or some old version of Opera, or who knows what else, that doesn't support referrer values - it can't possibly send a referrer value if it doesn't even know what they are. So this legitimate user will not be able to see your page properly because the images will not appear.
                                  
                                  So the .htaccess solution will work for you as long as you recognize that there might be a few glitches. sneakyimp's solution is pretty good and solves many problems. I think his technique would work well for you as well.
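The trade-off above can be written out as a small PHP sketch that mirrors the .htaccess logic. The function name `referer_allowed` and its `$allowEmpty` switch are made up for illustration; the point is only that a blank referer covers both the typed-in URL and the legitimate visitor whose browser or proxy strips the header, so a single setting decides the fate of both.

```php
<?php
// Hypothetical helper mirroring the .htaccess rule: decide whether a request
// with the given Referer header should be served the real image.
function referer_allowed(?string $referer, string $siteHost, bool $allowEmpty): bool
{
    if ($referer === null || $referer === '') {
        // Typed-in URLs and referer-stripping browsers/proxies both land here.
        return $allowEmpty;
    }
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed referer, treat as foreign
    }
    $host = strtolower($host);
    // Accept mysite.com and www.mysite.com, like the (www\.)? in the rule.
    return $host === $siteHost || $host === 'www.' . $siteHost;
}

// Strict mode: blocks direct access, but also blocks visitors who send no referer.
// referer_allowed('', 'mysite.com', false);
// Lenient mode: those visitors see the images, but so does anyone typing the URL.
// referer_allowed('', 'mysite.com', true);
```

Either way a spoofed referer passes the check, which is why this only deters casual visitors.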

                                    1) Each time a page on my site is loaded a unique code is generated which is stored as a session variable
                                    2) Each image tag produces a unique image code by encrypting the image name and the current unique code (in the session variable) together, which is then passed within the img tag to image_feeder.php
                                    3) image_feeder.php checks the unique image code by encrypting the current session code and image name and displays the correct image if the codes match otherwise displays an error image

                                    Yep that's pretty much it. Write one function to generate a hash from the image name and whatever secret code you have stored in session. for my example, i also hashed in the user's IP address and another constant of my own choosing that will not in any shape or form be visible to a user. i'd send you the code but i think i plan to use it elsewhere for a site i'm working on and would therefore like to keep it secret.

                                    but the basic idea is this:
                                    include.php
                                    * contains the hash function that builds a hash from the image filename, the unique id stored in session, the user's ip, and a secret hash seed that will never ever be visible to the user unless they can somehow read your source code.

                                    myfile.php:
                                    * includes 'include.php'
                                    * creates a unique id and stores it in session
                                    * generates the IMG tags by calling the aforementioned hash function to get a hash value. instead of linking the images directly, you link to image_feeder.php with the image filename and the hash value in the query string

                                    image_feeder.php
                                    * includes 'include.php'
                                    * retrieves the uid from session
                                    * makes sure the user has a non-empty uid value in session vars, a non-empty value for the image filename, and a non-empty value for the hash value. if any of these are empty, outputs the SORRY! image.
                                    * calls the aforementioned hash function to verify that the hash value supplied in the query string ($_GET['secret_code']) matches the value returned by the hash function. if they don't match, it outputs the SORRY! image.
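Since the actual code was not posted, here is only a rough sketch of what the shared hash function in include.php might look like. Everything in it is an assumption: the names `image_code`, `image_code_valid` and `IMAGE_HASH_SECRET` are made up, and using `hash_hmac()` with SHA-256 is one reasonable choice, not necessarily what the demo page uses.

```php
<?php
// Assumption: a server-side secret that is never sent to the browser.
const IMAGE_HASH_SECRET = 'change-me-to-a-long-random-string';

// Builds the per-image code from the filename, the unique id stored in
// session, and the visitor's IP, keyed with the server-side secret.
function image_code(string $filename, string $sessionUid, string $ip): string
{
    $message = $filename . '|' . $sessionUid . '|' . $ip;
    return hash_hmac('sha256', $message, IMAGE_HASH_SECRET);
}

// image_feeder.php side: recompute the code and compare in constant time.
function image_code_valid(string $supplied, string $filename, string $sessionUid, string $ip): bool
{
    if ($supplied === '' || $filename === '' || $sessionUid === '') {
        return false; // any missing ingredient means the SORRY! image
    }
    return hash_equals(image_code($filename, $sessionUid, $ip), $supplied);
}

// myfile.php side, after session_start() (sketch only):
// $src = 'image_feeder.php?pic=001.gif&secret_code='
//      . image_code('001.gif', $_SESSION['uid'], $_SERVER['REMOTE_ADDR']);
```

Because the filename is part of the hashed message, changing 001.gif to 006.gif in a copied link produces a mismatch unless the visitor can also compute the matching code, which needs the secret.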

                                    the reason you can directly link to an image with firefox after viewing the page is because all open firefox windows share all the same session/cookie data. If a session exists in one window, it exists in all the others. IE tends to compartmentalize its windows sometimes. To what degree? I don't really know.

                                    Also, because I use the user's IP address in my hash function, the link for a single image will differ between two different users--direct image links that might work for you won't work for me because my IP is different. This should pretty effectively prevent folks from hosting images on your site and spamming other people.

                                    This approach also has issues:
                                    * performance? if you store session values in little files, performance on a server can really suffer. You might need to write your own session handling routines that use a database.
                                    * user agents must support sessions. if they don't, no images! I don't think cookies are strictly required...PHP might be smart enough to propagate necessary session IDs in URLs. But maybe not.
                                    * Other?

                                      a month later

                                      Sneakyimp: Just wanted to say a BIG BIG thank you to you, my apologies i haven't written sooner but the geniuses at BT cut off my internet, however i have done pretty much what you described above and it works beautifully. Once i've gone a little further along i might ask you one or two other things(!), but right now it seems like i've got just what i wanted! Thanks very very much for all your help it really really is much appreciated.

                                      Dave
