I am having a (really) hard time crafting a regular expression to achieve what seemed to be a relatively straightforward task: splitting a string into two pieces using a period as the delimiter, subject to some conditions (specifically, to NOT split URLs, email addresses or real numbers).
I want to split substrings of the form "before abc.def after" or "before a12.b3c after", extracting the components from the larger string (yielding "abc" and "def", or "a12" and "b3c", for example). The substrings I will be testing are uniform in that they contain only a single period, and the portions I want to extract are bounded by whitespace or the start/end of the string (although probably any "\W" would also be all right).
I do NOT want strings of the following types to be extracted/split: "before email123@host.com after", "before email@host123.com after", "email.more@subhost.host.com", "server.host.domain", "123.456".
My PHP code to extract these substrings is:
$string = preg_replace("/(?=([\s][a-zA-Z_]\w*)\.\w+[\s$])(?=\1)\./"," ",$string);
The idea behind this regex is to first test the string for conformance with a lookahead, then to move the pointer up to the period and use that as the split trigger point.
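For reference, here is a minimal reproduction sketch using the sample strings from above. The variable names ($pattern, $samples, $results) are placeholders for illustration only, and the pattern is copied exactly as posted. The first two samples are the ones that should come back with the period replaced by a space; as far as I can tell, every input comes back unchanged.

```php
<?php
// Pattern copied exactly as posted (double-quoted string, unchanged).
$pattern = "/(?=([\s][a-zA-Z_]\w*)\.\w+[\s$])(?=\1)\./";

// Sample inputs from the description above.
// The first two should be split; the rest should be left alone.
$samples = [
    "before abc.def after",
    "before a12.b3c after",
    "before email123@host.com after",
    "before email@host123.com after",
    "email.more@subhost.host.com",
    "server.host.domain",
    "123.456",
];

// Run the replacement on each sample and keep the result for inspection.
$results = [];
foreach ($samples as $string) {
    $results[$string] = preg_replace($pattern, " ", $string);
    echo $string, " => ", $results[$string], "\n";
}
```

The result I am hoping for on the first two lines is "before abc def after" and "before a12 b3c after", with the remaining five left untouched.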
After hours and hours and hours of tweaking, I have been unable to get this or anything close to it to work.
I would very much appreciate it if anyone could offer insight into what I might be doing wrong. Thanks!
Chris