How can I somewhat efficiently tokenize or split a string into sentences?
For example, let's take this paragraph:
This is a sentence. And here is another sentence. I am proud to bring you, the reader, another eloquent sentence. Ahh, but alas, I will throw in an abbreviation that reads M.B. and end this sentence.
I'd like to write some code that tokenizes the paragraph into sentences, not words, so that the end result looks something like this:
$paragraph[0] = "This is a sentence.";
$paragraph[1] = "And here is another sentence.";
$paragraph[2] = "I am proud to bring you, the reader, another eloquent sentence.";
$paragraph[3] = "Ahh, but alas, I will throw in an abbreviation that reads M.B. and end this sentence.";
How would you go about coding this? Notice that you can't just split the initial string on periods, because an abbreviation such as M.B. contains periods too. I suppose you could assume that a new sentence begins either after a period that is followed by a capital letter, or after a newline.
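A minimal sketch of that heuristic in PHP, assuming sentences end with '.', '!' or '?' and that the next sentence starts with an uppercase ASCII letter. `split_sentences` is a hypothetical helper name, not a built-in. It uses `preg_split` with a lookbehind/lookahead pattern so the terminating punctuation stays attached to each sentence; the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: splits $text into sentences using the question's heuristic.
// A boundary is either (a) whitespace that follows '.', '!' or '?' and precedes
// an uppercase ASCII letter, or (b) one or more line breaks.
function split_sentences(string $text): array
{
    $pattern = '/(?<=[.!?])\s+(?=[A-Z])|\R+/';

    // PREG_SPLIT_NO_EMPTY drops empty pieces produced by adjacent boundaries.
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);

    // Trim leftover spaces and discard pieces that were only whitespace.
    $parts = array_map('trim', $parts);
    return array_values(array_filter($parts, fn ($s) => $s !== ''));
}

$text = 'This is a sentence. And here is another sentence. '
      . 'I am proud to bring you, the reader, another eloquent sentence. '
      . 'Ahh, but alas, I will throw in an abbreviation that reads M.B. '
      . 'and end this sentence.';

// $paragraph holds the four sentences from the example paragraph.
$paragraph = split_sentences($text);
print_r($paragraph);
```

Because a split requires a capital letter after the whitespace, "M.B. and" stays inside one sentence. The same rule would still break an abbreviation followed by a capitalized word (for example "Mr. Smith"), so real text would need an exception list of known abbreviations or a locale-aware sentence segmenter.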