PHP Preg_Match Question

jakeruston · Oct 29, 2009

Hi,

I'm trying to make a preg_match which can find the URL in a string. For example, the string is:

This is a test. A URL is http://www.google.co.uk/hello/hello

And it returns http://www.google.co.uk/hello/hello

The URL will be the same every time. This is what I've got so far:

"(?:http://)blahblahblah.com/show/([0-9]+)$"

There will be numbers at the end of the URL. Unfortunately, this doesn't work.

Any ideas?

theelectricwiz · Nov 29, 2009

preg_match ( "(http:\/\/blahblahblah.com\/show\/[0-9]+)", $blah, $matches);
echo $matches[0];

johanafm · Nov 29, 2009

Both of you are lacking pattern delimiters, even though electricwiz's pattern escapes as if using / as delimiter. I'd remove all \ from his pattern, and put a # as first and last character.

But since you say the address will be the same every time, just check that the string starts with it:

if (strpos($url, 'http://blahblahblah.com/show/') === 0)
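
A minimal sketch of how that prefix check could be extended to also pull out the trailing number. The function name show_id_from_url is hypothetical, and it assumes $url holds only the URL itself (not a whole sentence containing it):

```php
<?php
// Hypothetical helper: returns the numeric ID as a string if $url starts
// with the fixed prefix and is followed only by digits, otherwise null.
function show_id_from_url($url)
{
    $prefix = 'http://blahblahblah.com/show/';

    // strpos() returns 0 only when the prefix sits at the very start.
    if (strpos($url, $prefix) !== 0) {
        return null;
    }

    // Everything after the prefix; the cast guards against older PHP
    // versions returning false for an empty remainder.
    $rest = (string) substr($url, strlen($prefix));

    // ctype_digit() is true only for a non-empty string of digits.
    return ctype_digit($rest) ? $rest : null;
}
```

Note that this only works when the URL is the entire string; for a URL embedded in surrounding text, a regex is still the simpler route.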

nrg_alpha · Nov 30, 2009

johanafm;10935140 wrote:
Both of you are lacking pattern delimiters...

Actually, electricwiz's pattern is using perfectly acceptable and valid delimiters. Delimiters can include matching opening and closing punctuation such as (...) or <...> or [...] (granted, I strongly advise against using such characters as delimiters).

As for your solution, it will find the base url, but I suspect jakeruston wants to match the whole url (numbers at the end as well) and do something with it? At least, that's how I understand it.

@
One possible solution could be as such:

$blah = 'This is a test. A URL is http://blahblahblah.com/show/7895';
preg_match('#http://blahblahblah\.com/show/\d+#', $blah, $matches);
echo $matches[0];

That said, in your initial example I notice you used a non-capturing group for http:// via (?:http://) and a capturing group for the numbers. What are you trying to do? Simply get the URL without http://, plus the numbers at the end? If so, the pattern could simply be:

'#http://\Kblahblahblah\.com/show/(\d+)#'

In this case, $matches[0] = blahblahblah.com/show/7895 while $matches[1] = 7895.
If the goal is to have blahblahblah.com/show/ as one match, and 7895 as the other, then you could go with:

'#http://(blahblahblah\.com/show/)(\d+)#'

This results in $matches[1] = blahblahblah.com/show/ and $matches[2] = 7895.

Otherwise, perhaps explaining what you are trying to attempt will clarify things.
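
For comparison, here is a small sketch that runs the three patterns above against the same sample string. The variable names $whole, $noScheme and $parts are just illustrative, and the dots are escaped so they match a literal dot:

```php
<?php
$blah = 'This is a test. A URL is http://blahblahblah.com/show/7895';

// Whole URL as the match.
preg_match('#http://blahblahblah\.com/show/\d+#', $blah, $whole);

// \K drops "http://" from the reported match; group 1 holds the digits.
preg_match('#http://\Kblahblahblah\.com/show/(\d+)#', $blah, $noScheme);

// Two capture groups: host/path and digits.
preg_match('#http://(blahblahblah\.com/show/)(\d+)#', $blah, $parts);

echo $whole[0], "\n";    // http://blahblahblah.com/show/7895
echo $noScheme[0], "\n"; // blahblahblah.com/show/7895
echo $noScheme[1], "\n"; // 7895
echo $parts[1], "\n";    // blahblahblah.com/show/
echo $parts[2], "\n";    // 7895
```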

johanafm · Nov 30, 2009

nrg_alpha;10935207 wrote:
Actually, electricwiz's pattern is using perfectly acceptable and valid delimiters. Delimiters can include matching opening and closing punctuation such as (...) or <...> or [...] (granted, I strongly advise against using such characters as delimiters).

Thanks. I had no idea that was acceptable. I would rather have expected "(pattern(" to be correct, and misinterpreted it as the whole pattern being a capturing subpattern.

nrg_alpha · Nov 30, 2009

johanafm;10935217 wrote:
Thanks. I had no idea that was acceptable. I would rather have expected "(pattern(" to be correct, and misinterpreted it as the whole pattern being a capturing subpattern.

Yeah, it's admittedly an oddball. But the PCRE introduction section of the manual does confirm this:

excerpt:

The expression must be enclosed in the delimiters, a forward slash (/), for example. Delimiters can be any non-alphanumeric, non-whitespace ASCII character except the backslash (\) and the null byte. If the delimiter character has to be used in the expression itself, it needs to be escaped by backslash. Since PHP 4.0.4, you can also use Perl-style (), {}, [], and <> matching delimiters.
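
As a rough illustration of the excerpt, the sketch below writes the same expression with three delimiter styles and collects the matches. The sample string and the $patterns and $results names are assumptions made for the example:

```php
<?php
$blah = 'A URL is http://blahblahblah.com/show/7895';

$patterns = array(
    '/http:\/\/blahblahblah\.com\/show\/[0-9]+/', // "/" delimiter: inner slashes escaped
    '#http://blahblahblah\.com/show/[0-9]+#',     // "#" delimiter: nothing to escape
    '(http://blahblahblah\.com/show/[0-9]+)',     // Perl-style matching ( ) delimiters
);

$results = array();
foreach ($patterns as $pattern) {
    preg_match($pattern, $blah, $m);
    $results[] = $m[0];
}

// With ( ) as delimiters there is no capture group, so $m holds only index 0.
$lastCount = count($m);
```

All three entries in $results should hold the same URL; only the amount of escaping needed inside the pattern differs.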

However, there is a catch in the manual's explanation. It states that if a delimiter character is used within the pattern, it must be escaped. Generally this is true, but not in absolute terms (it's contextual, more on this below).

I found out that if you choose a delimiter pair such as < and >, and you want to match a literal < and > within the pattern, then so long as the opening and closing characters inside the pattern are balanced, they don't require escaping (oddly enough):

Example 1:

$str = 'Some text <35723a7c4b> more text!';
preg_match('<Some text <[a-c0-9]+>>', $str, $match); // no need to escape inner < and > characters
echo $match[0]; // Output: Some text <35723a7c4b>

Example 2:

$str = 'Some text <35723a7c4b> more text <4234654656>!';
preg_match('<Some text <[a-c0-9]+> more text <[0-9]+>>', $str, $match); // still don't need to escape multiple inner < and > matching characters
echo $match[0]; // Output: Some text <35723a7c4b> more text <4234654656>

Finally, for using characters like (..) (groups) or [..] (character classes) within the pattern (when the delimiters are also those characters), we don't escape those if we want the regex engine to treat those as actual groups or classes:

Example:

$str = 'Some text 35723a7c4b more text!';
preg_match('[Some text ([a-c0-9]+)]', $str, $match); // chose [ and ] as delimiters, yet character class still parses correctly
echo $match[1]; // Output: 35723a7c4b

But yeah, on the whole, I certainly wouldn't recommend using these oddball delimiters; they will mostly confuse the uninitiated. I would stick to delimiters like !...!, ~...~, #...#, etc. In those cases (when the delimiters are not matching opening / closing characters), they do need to be escaped within the pattern.
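
To illustrate that last point, here is a minimal sketch using # as the delimiter while also matching a literal #. The sample string and variable names are made up for the example; preg_quote() with the delimiter as its second argument is one way to get the escaping done automatically:

```php
<?php
$str = 'Tags: #php #regex';

// "#" is both the delimiter and a literal in the pattern, so it is escaped.
preg_match('#\#(\w+)#', $str, $manual);

// preg_quote() also escapes the delimiter when it is passed as the second argument.
$literal = '#regex';
$pattern = '#' . preg_quote($literal, '#') . '#';
preg_match($pattern, $str, $quoted);

echo $manual[1], "\n"; // php
echo $quoted[0], "\n"; // #regex
```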
