If you just want to find data corruption, set up a spider that visits every page of the site (either by working from a text file list of pages or by parsing the HTML and following your links). You'll probably want to use [man]curl[/man] for this.
After you retrieve the page you can take the [man]md5[/man] of it and store that on your computer.
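Here's a minimal sketch of that fetch-and-fingerprint step, assuming you're doing this in PHP with the curl extension. The urls.txt and hashes.txt file names are just placeholders I made up, not anything standard:

[code]
<?php
// Sketch only: read a plain text list of URLs (urls.txt, one per line),
// fetch each page, and record a "url md5" pair per line in hashes.txt.

$urls   = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$hashes = array();

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $body = curl_exec($ch);
    curl_close($ch);

    if ($body === false) {
        continue; // skip pages that could not be fetched
    }

    $hashes[$url] = md5($body); // fingerprint the page content
}

$out = '';
foreach ($hashes as $url => $hash) {
    $out .= $url . ' ' . $hash . "\n";
}
file_put_contents('hashes.txt', $out);
[/code]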
Then, if on any subsequent check the md5s don't match up, you've got a change. At that point the program should prompt you with the page name and ask whether the change is expected. If it is, the program updates your locally stored md5 string.
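A rough sketch of that re-check, again assuming the hashes.txt format from above and a command-line prompt; fetch_page() is just a little wrapper I made up around the same curl calls:

[code]
<?php
// Sketch only: recompute each page's md5, prompt on a mismatch, and accept
// the new hash if the change was expected.

function fetch_page($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

$stored = array();
foreach (file('hashes.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
    list($url, $hash) = explode(' ', $line, 2);
    $stored[$url] = $hash;
}

foreach ($stored as $url => $oldHash) {
    $body = fetch_page($url);
    if ($body === false) {
        continue;
    }

    $newHash = md5($body);
    if ($newHash === $oldHash) {
        continue; // unchanged
    }

    echo "$url has changed. Expected? [y/n] ";
    $answer = trim(fgets(STDIN));
    if (strtolower($answer) === 'y') {
        $stored[$url] = $newHash; // accept the new fingerprint
    } else {
        echo "Unexpected change on $url -- investigate.\n";
    }
}

// Write the (possibly updated) hashes back out
$out = '';
foreach ($stored as $url => $hash) {
    $out .= "$url $hash\n";
}
file_put_contents('hashes.txt', $out);
[/code]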
However, if the change is unexpected you don't yet know where the problem occurred. To tell whether it's on your system or the server, you'll need to store a known good copy of your md5s on removable media. Once a bad md5 turns up, pop in your known good md5 disk and compare the hash on your computer to the known good one. If they match, the problem is on the server; if they don't match, the problem is on your hard drive.
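Something along these lines for the removable-media comparison, assuming the known good copy lives at /media/usb/hashes.txt (a placeholder mount point) and uses the same file format:

[code]
<?php
// Sketch only: compare the locally stored hashes against the known good copy
// kept on removable media.

function load_hashes($path) {
    $hashes = array();
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        list($url, $hash) = explode(' ', $line, 2);
        $hashes[$url] = $hash;
    }
    return $hashes;
}

$local     = load_hashes('hashes.txt');
$knownGood = load_hashes('/media/usb/hashes.txt'); // placeholder path

foreach ($local as $url => $hash) {
    if (!isset($knownGood[$url])) {
        continue;
    }
    if ($hash === $knownGood[$url]) {
        echo "$url: local md5 matches the known good copy -- the problem is on the server.\n";
    } else {
        echo "$url: local md5 differs from the known good copy -- the problem is on your hard drive.\n";
    }
}
[/code]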
For the really fun part, you could make it automatically upload or download a good copy when a problem is found.
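If you go that route, curl can handle the restore too. A very rough sketch of pushing a known good backup back to the server over FTP; the host, login, and file paths are all placeholders you'd swap for your own:

[code]
<?php
// Sketch only: re-upload a known good copy of a corrupted page over FTP.
// Everything here (paths, host, credentials) is a placeholder.

$localCopy  = '/backups/known_good/about.html';
$remotePath = 'ftp://ftp.example.com/public_html/about.html';

$fp = fopen($localCopy, 'r');
$ch = curl_init($remotePath);
curl_setopt($ch, CURLOPT_USERPWD, 'user:password');
curl_setopt($ch, CURLOPT_UPLOAD, true);
curl_setopt($ch, CURLOPT_INFILE, $fp);
curl_setopt($ch, CURLOPT_INFILESIZE, filesize($localCopy));
$ok = curl_exec($ch);
curl_close($ch);
fclose($fp);

echo ($ok !== false) ? "Restored good copy to the server.\n" : "Upload failed.\n";
[/code]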