The problem with those document formats is that PDF isn't stored as ASCII text; it's a binary format. The same goes for most Word documents, and RTF has a lot of formatting markup you would have to strip out. There are libraries for reading PDFs, and Microsoft Office 10 comes with a COM object you can use to parse Word files, but that requires PHP to be running on Windows. RTF could be done too; you'd just need to figure out how to parse out all of its formatting tags.
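
To give a rough idea of what stripping the RTF formatting involves, here is a minimal sketch. It is written in Python only to keep it short; the same regex should port to PHP's preg_match_all. The names rtf_to_text, TOKEN and SKIP_DESTINATIONS are made up for this example, the list of skipped groups is partial, and it assumes escaped bytes are Windows-1252. It ignores \u Unicode escapes, \bin data and most special characters, so treat it as a starting point rather than a real parser.

```python
import re

# Groups that hold metadata rather than body text.
# Assumption: this partial list is enough for simple files.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"  # control word, optional number, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"     # hex-escaped byte, e.g. \'e9
    r"|\\([^a-zA-Z])"           # control symbol, e.g. \{ \} \\ \*
    r"|([{}])"                  # group open / close
    r"|([^\\{}]+)"              # run of literal text
)


def rtf_to_text(rtf):
    """Return the plain text of a simple RTF string (hypothetical helper)."""
    out = []
    stack = []        # "skipping" state of each enclosing group
    skipping = False  # True while inside a metadata group
    for m in TOKEN.finditer(rtf):
        word, _param, hexbyte, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            skipping = stack.pop() if stack else False
        elif word:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif not skipping and word in ("par", "line"):
                out.append("\n")
            elif not skipping and word == "tab":
                out.append("\t")
            # every other control word is formatting and is dropped
        elif symbol is not None:
            if symbol == "*":
                skipping = True       # \* marks an ignorable destination
            elif not skipping and symbol in "{}\\":
                out.append(symbol)    # escaped literal brace or backslash
        elif hexbyte and not skipping:
            # Assumption: the document uses the Windows-1252 code page.
            out.append(bytes.fromhex(hexbyte).decode("cp1252", errors="replace"))
        elif text and not skipping:
            # Raw line breaks in the file are not part of the text.
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out)


sample = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0\fs24 Hello, \b world\b0 !\par Caf\'e9}"
print(rtf_to_text(sample))
# prints:
# Hello, world!
# Café
```

The group stack is there so that the contents of the font table and similar groups get dropped without losing track of where the body text resumes. Anything produced by a real word processor will have more cases than this handles.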