That ain't easy.
PDF breaks a document down into what are virtually bits and pieces: positioned fragments of text and drawing instructions rather than flowing paragraphs.
Take a look at a PDF document in a text editor and you will see what I mean. Worse, the different methods of encoding a PDF can produce different "code" inside the document for the very same visible text.
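To make that a bit more concrete, here is a rough Python sketch. The stream contents are made up for illustration (they are not taken from any real file), and a real PDF has far more going on (fonts, custom encodings, object structure), so treat this only as a hint at why the same word can look completely different on disk.

```python
import zlib

# Hypothetical page content: PDF text-drawing operators, not plain text.
# The visible text is split into separate string fragments.
raw_stream = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ( World) Tj ET"

# The same word "Hello" written as a hex string, which PDF also allows.
hex_stream = b"BT /F1 12 Tf 72 720 Td <48656C6C6F> Tj ET"

# Many PDF writers compress streams with the FlateDecode filter (zlib/deflate).
flate_stream = zlib.compress(raw_stream)

# A naive byte search only works on the uncompressed, literal-string form.
print(b"Hello" in raw_stream)
print(b"Hello" in hex_stream)
print(flate_stream != raw_stream)

# Decoding gets the original content back in both cases.
print(zlib.decompress(flate_stream) == raw_stream)
print(bytes.fromhex("48656C6C6F"))
```

So before you can search or edit anything, you first have to undo whatever encoding the producing program happened to pick, and that differs from file to file.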
Not the answer you would like to have, but an honest one.