preg_match() help needed from the regex buddies

bernard_hinault

hi friends,

so, I am taking some lessons in PHP: I want to use regexes to parse the HTML source of pages like this one:
http://www.phpbb.com/phpBB/viewtopic.php?p=2393978#2393978

But to begin at the beginning: I have some heavy work to do on phpBB, and I need to work with Perl-compatible regular expressions. Regexes are a good way to filter out all the unnecessary parts of a page source.

So, let's start. I need regexes for the following parts:

[ I have taken this from the list at http://www.phpbbdoctor.com/doc_tables.php ]

phpbb_categories Categories
phpbb_forums Forums for your board.
phpbb_groups A group of users
phpbb_posts Posts for your board.
phpbb_posts_text The text for the post
phpbb_topics Topics for your board.
phpbb_users Base user information, preference settings, and so on.

And the timestamp; I need the timestamp too. So we have a nice little set of data that we are after.

I have created a regex that should fit the needs. What do you think? Apart from the translation, I guess we can solve the problem with this regex!?

Here is the little regex:

Code: /<td width="\d+" align="\w+" valign="\w+" class=".*"><span class="name"><a name="\d+"></a><b>(.*)</b></span><br /><span class="postdetails">Mitglied<br /><br /><br />Anmeldedatum: (.*)<br />Beiträge: (.*)<br />/

Subpattern 1 gives the username, subpattern 2 the registration timestamp of the user,
and subpattern 3 gives back the number of posts.
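
A minimal sketch of how the first pattern could be run with preg_match(). The sample HTML is made up to fit the pattern and is not real phpBB output. Two changes are assumptions of mine rather than part of the original pattern: "~" is used as the delimiter, because the unescaped "/" in closing tags such as </a> would otherwise end a "/"-delimited pattern early, and the greedy .* captures are replaced by [^"]* and .*? so that each group stops at its own field.

```php
<?php
// Hypothetical sample row, modelled on the pattern above (not real phpBB output).
$html = '<td width="150" align="left" valign="top" class="row1">'
      . '<span class="name"><a name="2393978"></a><b>bernard</b></span><br />'
      . '<span class="postdetails">Mitglied<br /><br /><br />'
      . 'Anmeldedatum: 12.03.2005<br />Beiträge: 42<br />';

// "~" as delimiter, so the "/" in closing tags needs no escaping.
// [^"]* and .*? keep each capture from running past its own field.
$pattern = '~<td width="\d+" align="\w+" valign="\w+" class="[^"]*">'
         . '<span class="name"><a name="\d+"></a><b>(.*?)</b></span><br />'
         . '<span class="postdetails">Mitglied<br /><br /><br />'
         . 'Anmeldedatum: (.*?)<br />Beiträge: (\d+)<br />~';

$user = null;
if (preg_match($pattern, $html, $m)) {
    // $m[0] is the whole match; $m[1]..$m[3] are the three subpatterns.
    $user = [
        'name'       => $m[1],
        'registered' => $m[2],
        'posts'      => (int) $m[3],
    ];
}

print_r($user);
```

If the real page differs from the sample even slightly (extra attributes, different whitespace), the pattern will not match, so it is worth testing against a saved copy of the page source first.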

Code: /\s+<td width="100%"><a href="viewtopic.php?p=\d+#\d+"><img src=".*" width="\d+" height="\d+" alt="Beitrag" title="Beitrag" border="\d+" /></a><span class="postdetails">Verfasst am: (.*)<span class="gen">&nbsp;</span>&nbsp; &nbsp;Titel: (.*)</span></td>/

Subpattern 1 gives the time the post was written, subpattern 2 gives the title.
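
A sketch of the second pattern used with preg_match_all(), so that every post on a page is collected. Again the sample HTML, including the image path, is hypothetical. Compared with the pattern above, the "." and "?" in viewtopic.php?p= are escaped so they match literally, "~" is the delimiter, and the captures are non-greedy; these are my assumptions about what is intended.

```php
<?php
// Hypothetical sample line, modelled on the pattern above (image path is made up).
$html = '   <td width="100%"><a href="viewtopic.php?p=2393978#2393978">'
      . '<img src="templates/example/icon_minipost.gif" width="12" height="9" '
      . 'alt="Beitrag" title="Beitrag" border="0" /></a>'
      . '<span class="postdetails">Verfasst am: 10.01.2006, 14:30'
      . '<span class="gen">&nbsp;</span>&nbsp; &nbsp;Titel: Regex question</span></td>';

// "\." and "\?" match a literal dot and question mark in the link.
$pattern = '~\s+<td width="100%"><a href="viewtopic\.php\?p=\d+#\d+"><img src="[^"]*" '
         . 'width="\d+" height="\d+" alt="Beitrag" title="Beitrag" border="\d+" /></a>'
         . '<span class="postdetails">Verfasst am: (.*?)<span class="gen">&nbsp;</span>'
         . '&nbsp; &nbsp;Titel: (.*?)</span></td>~';

$posts = [];
// PREG_SET_ORDER gives one array per match, each with its own subpatterns.
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $posts[] = ['posted' => $m[1], 'title' => $m[2]];
    }
}

print_r($posts);
```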

What about metacharacters and the like? They have to be escaped with a backslash:

^ $ + ? . * ( ) [ ] { } / \ |
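
Rather than escaping these by hand, PHP's preg_quote() can do it for a literal piece of text. The sketch below shows the difference on the link fragment from the second pattern; the variable names and test strings are only illustrative.

```php
<?php
$literal = 'viewtopic.php?p=';

// Unescaped: "." matches any character and "?" makes the preceding "p" optional.
$loose = '/' . $literal . '\d+/';

// Escaped: preg_quote() backslashes the metacharacters and, because of the
// second argument, the "/" delimiter as well.
$strict = '/' . preg_quote($literal, '/') . '\d+/';

$wrong = 'viewtopicXphp=123';     // not a real link
$right = 'viewtopic.php?p=123';   // the link we actually want

$looseMatchesWrong  = preg_match($loose, $wrong);   // 1: matches the wrong text
$looseMatchesRight  = preg_match($loose, $right);   // 0: misses the real link
$strictMatchesWrong = preg_match($strict, $wrong);  // 0
$strictMatchesRight = preg_match($strict, $right);  // 1

var_dump($looseMatchesWrong, $looseMatchesRight, $strictMatchesWrong, $strictMatchesRight);
```

So the unescaped "?" does not just make the pattern too loose; it stops the pattern from matching the real link at all.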

BTW: HTML/PHP code has to be stripped out. I look forward to hearing from you.

And now I have found this little bit of help and code, taken from the PHP developer site. What do you think about it? Can this help us to get what we need?

http://de2.php.net/manual/en/function.preg-match-all.php

 Example 2. Find matching HTML tags (greedy)
<?php
// The \\2 is an example of backreferencing. This tells pcre that
// it must match the second set of parentheses in the regular expression
// itself, which would be the ([\w]+) in this case. The extra backslash is
// required because the string is in double quotes.
$html = "<b>bold text</b><a href=howdy.html>click me</a>";

preg_match_all("/(<([\w]+)[^>]*>)(.*)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
   echo "matched: " . $val[0] . "\n";
   echo "part 1: " . $val[1] . "\n";
   echo "part 2: " . $val[3] . "\n";
   echo "part 3: " . $val[4] . "\n\n";
}
?>

This example will produce:

matched: <b>bold text</b>
part 1: <b>
part 2: bold text
part 3: </b>

matched: <a href=howdy.html>click me</a>
part 1: <a href=howdy.html>
part 2: click me
part 3: </a>

What I am aiming for: I want to use regexes to parse the HTML source of pages like this one: http://www.phpbb.com/phpBB/viewtopic.php?p=2393978#2393978

See also preg_match(), preg_replace(), and preg_split().
http://de2.php.net/manual/en/function.preg-match-all.php

Well, afterwards the extracted data should go into a database.

What do you think? Please let me know!
Thanks

bernard