Is this possible to match? I am extremely green when it comes to regular expressions. Also, any suggestions of a good book?


$filenames = array('Thumb_80073.jpg', 'thumb_11_Thumb_80077.jpg', 'Thumb_abs_8999.jpg'); // The array is a list of file names; the only important information is the last part of the name, i.e. 80077.jpg

$output = preg_match(); // I have no clue how to express that I want to capture everything from the last "_"  to the last "g"

print_r ($output);



    What's the source of the data? Do you have an array of file names and you're trying to pull out a certain part of them? Is this actually a directory (and/or SQL table) we're talking about? Do you want an array with the information, or would you rather just process the data in a loop as it is extracted (e.g. get the data from $filenames[0], do something with it, continue on for rest of entries)?

    In this case, you could either use regular expressions or simple string manipulation functions. In either case, we need to know some answers to the above questions to know the best way of going about this.
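For the plain string-function route, a minimal sketch might look like the following. It assumes the wanted part is simply everything after the last underscore; `after_last_underscore()` is a hypothetical helper name, not something from the original post.

```php
<?php
// Hypothetical helper: keep only what follows the last underscore.
function after_last_underscore(string $name): string
{
    $pos = strrpos($name, '_'); // position of the last "_", or false if there is none
    return $pos === false ? $name : substr($name, $pos + 1);
}

$filenames = array('Thumb_80073.jpg', 'thumb_11_Thumb_80077.jpg', 'Thumb_abs_8999.jpg');
$short = array_map('after_last_underscore', $filenames);

print_r($short); // 80073.jpg, 80077.jpg, 8999.jpg
```

This does no validation at all, so a name without an underscore is passed through unchanged.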

    bluto374 wrote:

    I am extremely green when it comes to regular expressions. Also, any suggestions of a good book?

    I personally have never bought/read a book on PHP/SQL because I find that web trends change while printed text cannot (unless you're Harry Potter and I'm just missing out).

    As such, I prefer to find good tutorials/articles online that deal with the subject matter I'm trying to learn. As for regular expressions, I can recommend one good site that has some tutorials and a lot of examples/information: Regular-Expressions.info.

      bluto374 wrote:

      Is this possible to match?

      Probably, but what exactly are you trying to match? In my opinion, 'capture everything from the last "_" to the last "g"' is not very precise. Rather, elaborate on the expected format of the data (e.g., say things like "one or more digits"), and what exactly you want to match. Provide examples. Consider also if the data might not be of the expected format.

      Once you are able to articulate what exactly it is that you are trying to match, it may well be the case that you just need to refer to a reference on regular expressions, and you will be able to come up with a suitable pattern yourself 🙂

      bluto374 wrote:

      Also, any suggestions of a good book?

      You can find plenty of references and tutorials about regular expressions online. For example, the PHP manual has a section dedicated to the topic.

        I have a .csv file that I am reformatting; below is the code that reads the file and rearranges the columns. The problem is that $line[6] and $line[7] contain extra information that I don't need. All I need is the file name: for instance, Thumb_80073.jpg should become just 80073.jpg. I don't know why the manufacturer formats the file this way, but they do, and it is annoying. The code reformats the file so that I can import it into my shopping cart.

        $in = fopen("datafeed.csv","r");
        $out = fopen("converted.csv" , "w");
        fgets($in);
        fgets($in);
        $header_row = array('Name', 'Model', 'Product Price', 'Quantity', 'Categories', 'Description' , 'Product Image', 'Large Image', ' Weight', 'Vendors Price', 'Vendors Product Id', 'Wholesale', 'Special Products Price', 'Product Attributes', 'In Stock');
        fputcsv($out, $header_row, ',');
        while (($line = fgetcsv($in, 0, ',')) !== false)
        {
            // Flag the product as in stock when the quantity column is positive.
            $stock = ($line[4] > 0) ? 1 : 0;

            if ($line[3] == 0)
            {
                // Blank out the special price when it is zero.
                $output = str_replace('0', " ", $line[3]);
                $formatted = array($line[8], $line[1], ceil($line[2] / 0.4), $line[4], str_replace(';', ';;', $line[6]), $line[8], $line[9], $line[10], $line[5], $line[2], $line[1], ceil($line[2] / 0.7), $output, $line[7], $stock);
                fputcsv($out, $formatted, ','); // ',' is the output delimiter
            }
            if ($line[3] != 0 and $line[4] > 0)
            {
                $formatted = array($line[8], $line[1], ceil($line[2] / 0.4), $line[4], str_replace(';', ';;', $line[6]), $line[8], $line[9], $line[10], $line[5], $line[2], $line[1], ceil($line[2] / 0.7), $line[3], $line[7], $stock);
                fputcsv($out, $formatted, ','); // ',' is the output delimiter
            }
        }
        

          Well, based on your description, and assuming all of the file names will have at least an underscore before the numbered section (and that the last part is strictly numeric), you could do something like this for a pattern:

          /.*_([0-9]+\.jpe?g)$/i
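Applied to the sample file names from the first post, that pattern could be used roughly like this. It is only a sketch; the `$pattern` variable name is mine, and names that don't match are simply skipped.

```php
<?php
$pattern   = '/.*_([0-9]+\.jpe?g)$/i';
$filenames = array('Thumb_80073.jpg', 'thumb_11_Thumb_80077.jpg', 'Thumb_abs_8999.jpg');

$output = array();
foreach ($filenames as $name) {
    // preg_match() returns 1 on a match and fills $matches:
    // $matches[0] is the whole match, $matches[1] is the parenthesised part.
    if (preg_match($pattern, $name, $matches)) {
        $output[] = $matches[1];
    }
}

print_r($output); // 80073.jpg, 80077.jpg, 8999.jpg
```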

          The webpage I linked to above does a great job explaining the various uses of regular expressions, but I'll break down the pattern I suggested so you can get an idea of how regexps work:

          • /

            The first character in a PCRE-style regexp (this is the style that all preg_*() functions use, unlike the older ereg*() functions, which were deprecated and then removed in PHP 7) is the pattern delimiter; this is how the regexp parser knows where your pattern will start and stop. The forward slash is a common choice, but you can pick any character that is not alphanumeric, a backslash, or whitespace.

            If a forward slash appears in the pattern you write, you can either a) escape the forward slash with a backslash, so that the parser doesn't mistakenly think you're trying to end the pattern, or b) pick a different delimiter altogether! Some other common choices would be '@', '#', etc.

          • .*

            The '.' character matches anything (except for new line characters... unless the 's' modifier - a.k.a. the dot-all modifier - has been added) and the '*' means "zero or more of (whatever came before the asterisk)."

          • _

            Nothing exciting here... just a literal underscore.

          • (

            This next part is enclosed in parentheses, because it actually contains the information you want. If you surround part of the pattern with parentheses, it will be extracted as a "subpattern" as the PHP manual calls it. In other words, this data will be stored as its own element (index 1 here) of the $matches array that preg_match() fills in.

          • [0-9]+

            This is a character class. Basically, it's a list of characters that you want to match, and a hyphen between any two characters will create a range. The '+' is similar to the asterisk, except that it means "match one or more of (whatever came before the plus sign)."

          • \.

            Remember how I said the '.' will match any character? Since we don't want that, it must be escaped with a backslash so that the regexp parser knows that you want a literal period there.

          • jpe?g

            I included 'e?' in this pattern because ".jpg" and ".jpeg" are both valid file extensions for JPEG images; if you aren't concerned with the latter of the two, you could remove these two characters from the pattern. The '?' will match "one or zero of (whatever comes before the question mark)."

          • )$

            The ')' simply closes the '(' and marks the end of the "subpattern" that we want to extract.

            The '$' indicates an end-of-string marker. Basically, we're saying that ".jpg" (or ".jpeg") should be the last thing it sees in the string in order for the string to match this pattern.

          • /i

            The '/' marks the end of the regexp pattern, since it was used as the first character of the pattern (and thus became the delimiter). Anything after the closing delimiter is a "modifier" - characters that change the way the pattern behaves. In this case, I used the 'i' modifier - or "case-insensitive" modifier - since regexp patterns are by default case-sensitive.
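To tie the pieces together, here is one way this could be wrapped up for the image columns you mentioned ($line[6] and $line[7]). Treat it as a sketch under my assumptions: `strip_image_prefix()` is a hypothetical helper name, I've used '#' as the delimiter just to show that alternative, and values that don't fit the expected format are returned untouched.

```php
<?php
// Hypothetical helper: reduce "Thumb_80073.jpg" to "80073.jpg".
function strip_image_prefix(string $value): string
{
    // Same pattern as above, written with '#' as the delimiter instead of '/'.
    if (preg_match('#.*_([0-9]+\.jpe?g)$#i', $value, $matches)) {
        return $matches[1];
    }
    return $value; // not in the expected format: leave it as it is
}

echo strip_image_prefix('Thumb_80073.jpg'), "\n";  // 80073.jpg
echo strip_image_prefix('THUMB_80073.JPEG'), "\n"; // 80073.JPEG (the 'i' modifier at work)
echo strip_image_prefix('logo.png'), "\n";         // logo.png (no match, unchanged)
```

Inside your while loop you could then call it on each image column before building $formatted, e.g. `$line[6] = strip_image_prefix($line[6]);`.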

          Sorry this post ended up being so long... it's a rainy Sunday morning here and I have nothing better to do than ignore boring chores and write up long-winded explanations on PHPBuilder. :p
