Hello,
I am a novice PHP'er, and I am trying to find all matches of a short piece of DNA in a longer piece. The catch is that the data I want returned is the query AND a given number of bases following the match.
For example, if my DNA sequence is
aaccaccttttgggtttcctttgggaaatttttt
and I want to find every "cc" AND the 6 nucleotides that follow it, I should get
ccaccttt
ccttttgg
cctttggg
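To pin down exactly which results are wanted, here is a minimal brute-force sketch that tests every start position with substr(). The function name motif_with_flank_bruteforce is hypothetical. It also assumes a hit only counts when all 6 following bases exist, which is what .{6} in a regex would require.

```php
<?php
# Hypothetical helper: brute-force reference that tries every start position.
# Returns each occurrence of $motif plus the $flank characters after it.
# Assumption: occurrences too close to the end to have $flank following
# characters are skipped, as /cc.{6}/ would also skip them.
function motif_with_flank_bruteforce(string $dna, string $motif, int $flank): array
{
    $results = [];
    $len = strlen($motif);
    $last = strlen($dna) - $len - $flank; # last start that leaves room for the flank
    for ($i = 0; $i <= $last; $i++) {
        if (substr($dna, $i, $len) === $motif) {
            $results[] = substr($dna, $i, $len + $flank);
        }
    }
    return $results;
}

echo implode(", ", motif_with_flank_bruteforce("aaccaccttttgggtttcctttgggaaatttttt", "cc", 6)), "\n";
# prints: ccaccttt, ccttttgg, cctttggg
```

It is O(n·m) and slow on long sequences, but it is handy for checking a faster approach.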
The code I'm trying is:
<?php
$input = "aaccaccttttgggtttcctttgggaaattttttt";

# First attempt: "cc" followed by any 6 characters
$search_string = "/cc.{6}/";
preg_match_all($search_string, $input, $possible_matches);
$total = count($possible_matches[0]);
echo "Total number of possible matches = " . $total . "<P>";
# This returns the 8-character string cc......
for ($i = 0; $i < $total; $i++) {
    echo "Match " . $i . "\n" . $possible_matches[0][$i] . "<P>";
}

# Second attempt: just the "cc" on its own
$search_string = "/cc/";
preg_match_all($search_string, $input, $possible_matches);
$total = count($possible_matches[0]);
# This gives the total # of cc's in the DNA
echo "There are really = " . $total . " matches for cc in this DNA<P>";
for ($i = 0; $i < $total; $i++) {
    echo "Match " . $i . "\n" . $possible_matches[0][$i] . "<P>";
}
?>
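Clearly preg_match_all finds a match and then resumes searching AFTER the end of the match it just found, so overlapping matches are skipped. In the example above, /cc.{6}/ returns only ccaccttt and cctttggg, because the second cc lies inside the first match. This is even more problematic with something like aacccttttttt,
where there are clearly 2 instances of cc, but preg_match_all("/cc/", $input, $possible_matches);
finds only the first.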
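One common way around this, sketched here rather than offered as a definitive fix, is to wrap the pattern in a zero-width lookahead, (?=(cc.{6})). The lookahead consumes no characters, so after each hit preg_match_all moves on by only one position and overlapping matches are found. The matched text then comes back in capture group 1 ($m[1]), not in $m[0]. The helper name overlapping_matches is hypothetical.

```php
<?php
# Sketch: a lookahead (?=...) matches an empty string, so preg_match_all
# advances one character after each hit instead of skipping past it.
# The wanted text is captured by group 1, so it is returned from $m[1].
function overlapping_matches(string $dna, string $motif, int $flank): array
{
    # preg_quote() escapes regex metacharacters; plain bases don't need it,
    # but it keeps the helper safe for arbitrary motifs.
    $pattern = '/(?=(' . preg_quote($motif, '/') . '.{' . $flank . '}))/';
    preg_match_all($pattern, $dna, $m);
    return $m[1];
}

echo implode(", ", overlapping_matches("aaccaccttttgggtttcctttgggaaatttttt", "cc", 6)), "\n";
# prints: ccaccttt, ccttttgg, cctttggg

# Counting overlapping "cc" in aacccttttttt the same way:
echo preg_match_all('/(?=cc)/', "aacccttttttt"), "\n";
# prints: 2
```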
Any advice on a different function or another way to approach this problem would be appreciated.
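If a regex is not required, a plain-string sketch with strpos() can also do this. It restarts each search one character after the start of the previous hit instead of after its end. The name motif_hits_strpos is hypothetical, and keying the result by position is only one convenient choice. As in the regex version, this assumes hits near the end without 6 following bases should be dropped.

```php
<?php
# Sketch of a non-regex alternative using strpos() with an offset.
# Advancing the offset by one past the START of each hit keeps overlaps.
# Results are keyed by the 0-based position of each motif occurrence.
function motif_hits_strpos(string $dna, string $motif, int $flank): array
{
    $hits = [];
    $offset = 0;
    $want = strlen($motif) + $flank; # length of motif plus following bases
    while (($pos = strpos($dna, $motif, $offset)) !== false) {
        if ($pos + $want <= strlen($dna)) { # skip hits without a full flank
            $hits[$pos] = substr($dna, $pos, $want);
        }
        $offset = $pos + 1; # restart one base later, not after the whole match
    }
    return $hits;
}

print_r(motif_hits_strpos("aaccaccttttgggtttcctttgggaaatttttt", "cc", 6));
```

Keeping the offsets may be useful for mapping hits back onto the sequence.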
Thanks,
Chris