Originally posted by jstarkey
1) a shell script to regexp the email addresses in the mail archive and replace them with <email protected> or similar.
I am very new to regular expressions, so I wrote the script the way I know best. I am posting it here so that anyone interested can just copy and paste it.
OK, here we go.
This function looks for email addresses and changes them to the format someone (at) somewhere (dot) com:
function replaceEmails($inBody, $ATsign, $DOTsign) {
    $outBody = "";
    $inBodyNextCopyIndex = 0;
    $inBodyLen = strlen($inBody);
    $iAT = -1;
    while ($iAT + 1 < $inBodyLen) {
        $iAT = strpos($inBody, "@", $iAT + 1);
        if ($iAT === false) break;
        // look for the beginning (never reaching back into text that was already copied)
        $i = $iAT - 1;
        while ($i >= $inBodyNextCopyIndex && strpos("._-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", substr($inBody, $i, 1)) !== false) $i--;
        $i++;
        if ($i == $iAT) continue; // nothing usable before the @
        if (substr($inBody, $i, 1) == ".") continue; // skip candidates that start with a dot
        $iStart = $i;
        // look for the ending
        $i = $iAT + 1;
        $iDOT = -1;
        while ($i < $inBodyLen && strpos(".-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", substr($inBody, $i, 1)) !== false) {
            if (substr($inBody, $i, 1) == ".") $iDOT = $i; // look for the final dot
            $i++;
        }
        $i--;
        if ($i == $iAT) continue; // nothing usable after the @
        if ($iDOT == -1) continue; // no dot in the domain part
        if (strpos("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", substr($inBody, $i, 1)) === false) continue; // the .com part should always end in a letter
        // success - extract+format good parts - continue loop
        $outBody .= substr($inBody, $inBodyNextCopyIndex, $iStart - $inBodyNextCopyIndex)."[B]".substr($inBody, $iStart, $iAT - $iStart)." ".$ATsign." ".substr($inBody, $iAT + 1, $iDOT - $iAT - 1)." ".$DOTsign." ".substr($inBody, $iDOT + 1, $i - $iDOT)."[/B]";
        $inBodyNextCopyIndex = $i + 1;
        $iAT = $i;
    }
    // copy whatever follows the last replaced address (the loop alone would drop it)
    $outBody .= substr($inBody, $inBodyNextCopyIndex);
    return $outBody;
}
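Since the quoted request asked for a regexp, here is a rough sketch of the same idea using preg_replace_callback, for comparison. The function name replaceEmailsRegex and the pattern are assumptions made for illustration, not part of the script above. The pattern is also not guaranteed to match exactly the same strings in every edge case (for example, it does not skip an address that is preceded by a dot).

```php
<?php
// Hypothetical helper (not from the original post): regex take on the same replacement.
function replaceEmailsRegex(string $inBody, string $ATsign, string $DOTsign): string {
    // group 1: local part (letters, digits, ".", "_", "-", not starting with a dot)
    // group 2: domain up to the final dot
    // group 3: the letters after the final dot
    $pattern = '/([A-Za-z0-9_-][A-Za-z0-9._-]*)@([A-Za-z0-9.-]+)\.([A-Za-z]+)/';
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($ATsign, $DOTsign) {
            return "[B]" . $m[1] . " " . $ATsign . " " . $m[2] . " " . $DOTsign . " " . $m[3] . "[/B]";
        },
        $inBody
    );
}

echo replaceEmailsRegex("mail _alex@some.where.gr or 34j@php.net, not progr@mming", "(at)", "(dot)"), "\n";
```

Text without a dot after the @ (like progr@mming) is left alone, because the pattern requires a final dot followed by letters.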
I also made a variant of this that supports the tags vBulletin uses. Anything inside a [quote] or [code=php] block is left untouched:
function replaceEmailsIgnoreTags($inBody, $ATsign, $DOTsign) {
    $outBody = "";
    $inBodyLower = strtolower($inBody);
    $inBodyNextCopyIndex = 0;
    $inBodyLen = strlen($inBody);
    $iAT = -1;
    while ($iAT + 1 < $inBodyLen) {
        $previous_iAT = $iAT;
        // find which occurrence comes first (PHP_INT_MAX stands for "not found")
        $iAT = strpos($inBodyLower, "@", $previous_iAT + 1); if ($iAT === false) $iAT = PHP_INT_MAX;
        $iQuote = strpos($inBodyLower, "[quote]", $previous_iAT + 1); if ($iQuote === false) $iQuote = PHP_INT_MAX;
        $iPhp = strpos($inBodyLower, "[code=php]", $previous_iAT + 1); if ($iPhp === false) $iPhp = PHP_INT_MAX;
        $iMin = min($iAT, $iQuote, $iPhp);
        if ($iMin == PHP_INT_MAX) break; // nothing left to replace or skip
        if ($iMin != $iAT) {
            // an opening tag comes first: jump to its closing tag
            $st = ($iMin == $iQuote) ? "[/quote]" : "[/code]";
            $i = strpos($inBodyLower, $st, $iMin); // search only after the opening tag
            if ($i === false) break; // did not find closing tag ... returning rest of text as is
            // continue after the closing tag; the skipped text is copied unchanged later
            $iAT = $i + strlen($st) - 1;
            continue;
        }
        // look for the beginning (never reaching back into text that was already copied)
        $i = $iAT - 1;
        while ($i >= $inBodyNextCopyIndex && strpos("._-abcdefghijklmnopqrstuvwxyz0123456789", substr($inBodyLower, $i, 1)) !== false) $i--;
        $i++;
        if ($i == $iAT) continue; // nothing usable before the @
        if (substr($inBody, $i, 1) == ".") continue; // skip candidates that start with a dot
        $iStart = $i;
        // look for the ending
        $i = $iAT + 1;
        $iDOT = -1;
        while ($i < $inBodyLen && strpos(".-abcdefghijklmnopqrstuvwxyz0123456789", substr($inBodyLower, $i, 1)) !== false) {
            if (substr($inBody, $i, 1) == ".") $iDOT = $i; // look for the final dot
            $i++;
        }
        $i--;
        if ($i == $iAT) continue; // nothing usable after the @
        if ($iDOT == -1) continue; // no dot in the domain part
        if (strpos("abcdefghijklmnopqrstuvwxyz", substr($inBodyLower, $i, 1)) === false) continue; // the .com part should always end in a letter
        // success - extract+format good parts - continue loop
        $outBody .= substr($inBody, $inBodyNextCopyIndex, $iStart - $inBodyNextCopyIndex)."[B]".substr($inBody, $iStart, $iAT - $iStart)." ".$ATsign." ".substr($inBody, $iAT + 1, $iDOT - $iAT - 1)." ".$DOTsign." ".substr($inBody, $iDOT + 1, $i - $iDOT)."[/B]";
        $inBodyNextCopyIndex = $i + 1;
        $iAT = $i;
    }
    // copy whatever is left: skipped tag blocks and the text after the last address
    $outBody .= substr($inBody, $inBodyNextCopyIndex);
    return $outBody;
}
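The idea of skipping tagged blocks can also be sketched by splitting the text into protected and unprotected pieces with preg_split and transforming only the unprotected ones. The helper name transformOutsideTags and its callback parameter are assumptions made for illustration, not part of the script above. Unlike the function above, this sketch treats an opening tag without a closing tag as ordinary text.

```php
<?php
// Hypothetical helper (not from the original post): applies $transform
// only to text outside [quote]...[/quote] and [code=php]...[/code] blocks.
function transformOutsideTags(string $inBody, callable $transform): string {
    // The capturing group makes preg_split return the protected blocks as well;
    // they end up at the odd positions of the resulting array.
    $parts = preg_split(
        '/(\[quote\].*?\[\/quote\]|\[code=php\].*?\[\/code\])/is',
        $inBody,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    foreach ($parts as $k => $part) {
        if ($k % 2 === 0) {
            $parts[$k] = $transform($part); // even positions are unprotected text
        }
    }
    return implode("", $parts);
}

// A trivial stand-in transform, just to show which parts get touched.
$hideAt = function (string $s): string {
    return str_replace("@", " (at) ", $s);
};
echo transformOutsideTags("a@b [quote]c@d[/quote] e@f", $hideAt), "\n";
```

Any replacement function that takes a string and returns a string could be passed as the callback instead of the stand-in used here.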
I haven't tested it much yet... but I usually only make typing errors :p
When I run this little test:
$st="this is just [quote] a [email]b@c.com[/email] sample\r\ntext to see [/quote]how good\r\ni have [email]_alex@some.where.gr[/email] become during the last [email]34j@php.net[/email]\r\n year in php progr@mming";
echo "<p>".nl2br(htmlspecialchars(replaceEmailsIgnoreTags($st, "(at)", "(dot)")))."</p>\n";
I get this result (shown as the forum renders it, so the quoted block and the bold tags are not visible as text):
this is just
how good
i have _alex (at) some.where (dot) gr become during the last 34j (at) php (dot) net
year in php progr@mming
It looks fine to me... if any of you test it and find any bugs please msg me!!
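A side note on the test line above: it escapes the text with htmlspecialchars first and only then calls nl2br. A minimal sketch of why that order matters (the sample string is made up for illustration):

```php
<?php
$raw = "a <b> tag\nsecond line";

// Escape first, then add line breaks: the <br /> tags stay real HTML.
$safe = nl2br(htmlspecialchars($raw));

// The other way round, the <br /> tags get escaped as well and show up as text.
$wrong = htmlspecialchars(nl2br($raw));

echo $safe, "\n";
echo $wrong, "\n";
```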