Hi,
I have a file with a bunch of SQL statements in them along with comments, and some code. All (guaranteed) of the sql statements have

SELECT .... FROM

Therefore - I wanted to extract them out with a regular expression. I wrote 30+ lines of code using strpos and substr - but it's so messy

Can somebody help me with the correct regex that can extract out all occurances of 'SELECT ... FROM' where, of course the dots or periods can be any number of characters or words

Thanks,
Vmusic

    @SELECT([.\n]*)FROM([.\n]*)@i

    Use that with preg_match_all

    If it doesn't work, use it as a guide:
    . (period): any non-breaking character
    \n: new-line character
    []: delimits a range of characters
    *: delimits that 0 or multiple instances of that range are possible
    b[/b]: delimits match area. When matches returned to array, these will be returned.
    @: Pattern delimiters. Marks beginning and ending of pattern
    i: Defines a case insensitive search

    If you don't understand this, google for RegEx +PHP and read one of the many articles/tutorials on what to specify.

      bpat1434 wrote:
      [.\n]*

      As far as I remember the dot loses its special meaning in character classes, so I'd suggest using .* in conjunction with the s-modifier instead. It may also make sense to use ungreedy matching (.*? or modifier U).

        the dot doesn't lose it's special meaning... but it is greedy. A more specific one could be:

        @SELECT([^\bFROM\b]*)FROM([^\bLIMIT\b]*)[\bLIMIT\b]?[\d\,\s]?@i

        A little less greedy, and still not guaranteed to work, and uses Negating clauses instead of the dot.

          Hi,
          I appreciate the help... but I can't get those to work

          so I want an array of all the select strings in a file. OK, I loaded the file into a variable. I though the array was the third parameter of the regex or preg_match function

          ~HELP~~~~

          How do I get this array?

          Thanks
          Vmusic

            Not sure why you'd need 30 lines of messy code with strpos()... wouldn't this do the basic trick?

            $all = file( 'myfile.txt' );
            $selects = array();
            foreach( $all as $line )
            	if( strpos( $line, 'SELECT' ) !== false && strpos( $line, 'FROM' ) !== false )
            		$selects[] = $line;
            

            But... here is an (untested) regexp that should fetch from each SELECT to the end of the line into $selects[0]:

            $text = file_get_contents( 'myfile.txt' );
            $selects = array();
            $count = preg_match_all( "/SELECT\s+.*?FROM.*?$/", $text, $selects );
            

            This all assumes your queries do not span multiple lines... things get more complicated if they do.

              This works for me:
              php RegEx File

              <?php
              
              $contents = file_get_contents('sample.sql');
              echo '<!--'."\n".$contents."\n-->";
              
              $pattern = "@(SELECT[\w\*\.\,\n\`\s]*(FROM[\w\`\,\n\.\s=]*)?(WHERE[\w\.\,\-=\(\)\"\'\n\s\%]*)?(GROUP BY[\w\`\.\-=\(\)\n\s]*[ASC|DESC]?)?(HAVING[\w\.=\n\s]*)?(ORDER BY[\w\.\-\(\)\'\"\n\s]*[ASC|DESC]?)?(LIMIT[\w\,\s]*)?\;)@is";
              
              preg_match_all($pattern, $contents, $matches);
              array_shift($matches);
              
              echo '<pre>';
              var_dump($matches);
              echo '</pre>';
              ?>

              sql file (for testing)

              SELECT * 
              FROM `table1` 
              ORDER BY column ASC 
              LIMIT 5;
              
              SELECT a.*, b.* 
              FROM `table` a 
              INNER JOIN `tbl` b 
              	ON a.id=b.bid 
              WHERE a.title LIKE '%e%' 
              ORDER BY a.salutation DESC;
              SELECT * FROM `table1` 
              WHERE column=1 
                AND column2 LIKE '%a%' 
              ORDER BY column ASC LIMIT 5;

                Correction, the dot does lose its special meaning in a character class.

                <?php
                if (preg_match('/[.]/', 'test', $m)) {
                	print "OK [$m[0]]\n";
                } else {
                	print 0;
                }
                ?>

                Output: 0

                You wanted (.|\n) not [.\n].

                  May I suggest the s-modifier again?

                    Yes, that is the best way. I just figured I'd make a correction... 😉

                      xblue wrote:

                      May I suggest the s-modifier again?

                      If you look closely, I did use the s modifier... I also used the i modifier....

                        Write a Reply...