Hi,

Does anyone know of a php script that can crawl a site and check if the links on each page (internal or external) are working?

    2 months later

    You could always set up a quick regex (even the most complex pages can be scanned for links in under a second) and then use sockets/streams to see which status code comes back in the response headers (200 is OK, 301 is moved, and anything in the 400 range, such as 404, is bad).
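
A rough sketch of the header check described above, in case it helps. It assumes PHP 7.1 or later (for the context argument to get_headers()) and that allow_url_fopen is enabled. The function names parseStatusCode, classifyStatus and checkLink are made up for illustration, and the 10 second timeout is an arbitrary choice.

```php
<?php
// Hypothetical helper: pull the numeric code out of a status line
// such as "HTTP/1.1 404 Not Found". Returns null if the line is not a status line.
function parseStatusCode(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#i', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: map a status code to a coarse label.
function classifyStatus(?int $code): string
{
    if ($code === null) {
        return 'unreachable';
    }
    if ($code >= 200 && $code < 300) {
        return 'ok';
    }
    if ($code >= 300 && $code < 400) {
        return 'moved';
    }
    if ($code >= 400) {
        return 'broken';
    }
    return 'other';
}

// Send a HEAD request without following redirects, so a 301 is reported as such.
function checkLink(string $url): string
{
    $context = stream_context_create([
        'http' => [
            'method'          => 'HEAD',
            'follow_location' => 0,
            'timeout'         => 10, // seconds; arbitrary value for this sketch
        ],
    ]);
    $headers = @get_headers($url, 0, $context);
    if ($headers === false || !isset($headers[0])) {
        return classifyStatus(null);
    }
    return classifyStatus(parseStatusCode($headers[0]));
}

// Example call (performs a real network request):
// echo checkLink('http://validator.w3.org/checklink'), "\n";
```

Some servers answer HEAD requests differently from GET, so a link reported as broken here may deserve a second look with a normal GET request.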

      WeAnswer.IT wrote:

      This is not written in PHP to my knowledge, but it will do what you want. Make sure you configure it before you use it.

      http://validator.w3.org/checklink

      I should have been more clear: the link above is a tool, not a script for you to download and install. Programming your own link checker would not be worth it since W3C has done it for you. All you have to do is go to that page, type in your URL, and then click a button.

        WeAnswer.IT wrote:

        I should have been more clear: the link above is a tool, not a script for you to download and install. Programming your own link checker would not be worth it since W3C has done it for you. All you have to do is go to that page, type in your URL, and then click a button.

        The downside, of course, is that if you use it to check your own site for errors, for example, you then have to go through the pages manually and write down all the links. A personal site crawler would let you automate the verification of, say, a list of related sites.
        I am guessing that such an application is the final purpose of the requested script.

          Does anyone know of a php script that can crawl a site and check if the links on each page (internal or external) are working?

          We should have been more clear. This is not an ordering site where you can freely get what you want handed to you on a silver platter. There are ways to do it. If you want to put forth the effort of creating a script, then we'll help. If you just want someone to do the Google searching for you, you're in the wrong place.

          I've told you how this can be achieved. Either you're willing to put in the time it takes to look up the regular expression functions PHP has to offer (there are only two sets) or you're not. If not, I suggest you go to hotscripts.com or find a freelance PHP developer to do this for you. I could probably put this together in about 2 hours' worth of work, so it may take a newcomer to PHP a day or so to get something like this working.

          The basics of it are:

          1.) Get the HTML of the page(s) you're crawling
          2.) Run a regular expression on the page looking for all valid links
          3.) Use some function to confirm that each link actually works.
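
The three steps above could be sketched roughly like this. It is only a starting point under a few assumptions: PHP 7.1 or later, allow_url_fopen enabled, and made-up function names (extractLinks, toAbsolute, linkStatus, checkPage). The regex is deliberately simple and will miss unquoted href values, and path-relative links such as "page2.html" are skipped rather than resolved.

```php
<?php
// Step 2: collect the href values of <a> tags with a regular expression.
function extractLinks(string $html): array
{
    preg_match_all('/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is', $html, $matches);
    $links = array_map('html_entity_decode', $matches[2]);
    return array_values(array_unique($links));
}

// Turn an href into an absolute http(s) URL, or null if this sketch cannot handle it.
// Assumption: only absolute, protocol-relative and root-relative links are supported,
// and any port in the page URL is ignored.
function toAbsolute(string $href, string $pageUrl): ?string
{
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    $base = parse_url($pageUrl);
    if (!isset($base['scheme'], $base['host'])) {
        return null;
    }
    if (strpos($href, '//') === 0) {
        return $base['scheme'] . ':' . $href;
    }
    if (strpos($href, '/') === 0) {
        return $base['scheme'] . '://' . $base['host'] . $href;
    }
    return null; // mailto:, javascript:, fragments, path-relative links
}

// Step 3: request the URL and return the final HTTP status code (null if unreachable).
// Redirects are followed here, so the last status line seen is the one reported.
function linkStatus(string $url): ?int
{
    $context = stream_context_create(['http' => ['method' => 'HEAD', 'timeout' => 10]]);
    $headers = @get_headers($url, 0, $context);
    if ($headers === false) {
        return null;
    }
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Steps 1 to 3 for a single page: returns [url => status code or null].
function checkPage(string $pageUrl): array
{
    $html = @file_get_contents($pageUrl); // Step 1: get the HTML
    if ($html === false) {
        return [];
    }
    $report = [];
    foreach (extractLinks($html) as $href) {
        $url = toAbsolute($href, $pageUrl);
        if ($url !== null) {
            $report[$url] = linkStatus($url);
        }
    }
    return $report;
}

// Example call (performs real network requests):
// print_r(checkPage('http://example.com/'));
```

To crawl a whole site rather than one page, you would feed the internal URLs from the report back into checkPage() and keep a list of pages already visited so the crawler does not loop.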
