hi friends,
I'm working through some lessons in PHP: I am about to write some regexes to parse the HTML source code of pages like this one:
http://www.phpbb.com/phpBB/viewtopic.php?p=2393978#2393978
But let me begin at the beginning: I have some hard work to do on phpBB, and I need to work with Perl-compatible regular expressions (the preg_* functions in PHP). Regexes are a good way to filter out all the unnecessary parts of a page source.
So, let's start. I need regexes for the following parts
[ I took this from the list at http://www.phpbbdoctor.com/doc_tables.php ]
phpbb_categories Categories
phpbb_forums Forums for your board.
phpbb_groups A group of users
phpbb_posts Posts for your board.
phpbb_posts_text The text for the post
phpbb_topics Topics for your board.
phpbb_users Base user information, preference settings, and so on.
And the timestamp; I need the timestamp too. So we have a nice little set of data that we are after.
I have created a regex that should fit the needs. What do you think? Apart from the translated (German) labels, I guess we can solve the problem with this regex?
Here is the little regex:
Code: /<td width="\d+" align="\w+" valign="\w+" class="[^"]*"><span class="name"><a name="\d+"><\/a><b>(.*?)<\/b><\/span><br \/><span class="postdetails">Mitglied<br \/><br \/><br \/>Anmeldedatum: (.*?)<br \/>Beiträge: (.*?)<br \/>/
Subpattern 1 gives the username, subpattern 2 the user's registration date,
and subpattern 3 the number of posts.
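Here is a minimal sketch of how that pattern could be applied with preg_match_all(). The HTML snippet, the user name and the date are made up for illustration; the real phpBB markup may differ, so the pattern will probably need adjusting. Slashes are escaped because "/" is the delimiter, and the quantifiers are lazy (.*?) so that one match cannot run across several posts.

```php
<?php
// Hypothetical sample of one author cell (not copied from a real board).
$html = '<td width="150" align="left" valign="top" class="row1">'
      . '<span class="name"><a name="2393978"></a><b>bernard</b></span><br />'
      . '<span class="postdetails">Mitglied<br /><br /><br />'
      . 'Anmeldedatum: 12.03.2005<br />Beiträge: 42<br /></span></td>';

// Single quotes keep the backslashes intact for PCRE.
$pattern = '/<td width="\d+" align="\w+" valign="\w+" class="[^"]*">'
         . '<span class="name"><a name="\d+"><\/a><b>(.*?)<\/b><\/span><br \/>'
         . '<span class="postdetails">Mitglied<br \/><br \/><br \/>'
         . 'Anmeldedatum: (.*?)<br \/>Beiträge: (.*?)<br \/>/';

$users = [];
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // $m[1] = username, $m[2] = registration date, $m[3] = post count
        $users[] = ['name' => $m[1], 'registered' => $m[2], 'posts' => $m[3]];
    }
}
print_r($users);
```

PREG_SET_ORDER groups the subpatterns per match, which is convenient when each match later becomes one database row.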
Code: /\s+<td width="100%"><a href="viewtopic\.php\?p=\d+#\d+"><img src="[^"]*" width="\d+" height="\d+" alt="Beitrag" title="Beitrag" border="\d+" \/><\/a><span class="postdetails">Verfasst am: (.*?)<span class="gen"> <\/span> Titel: (.*?)<\/span><\/td>/
Subpattern 1 gives the time the post was written; subpattern 2 gives the title.
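A small sketch for the post header, again against an invented snippet (the image path, date format and title are assumptions). It also shows why the "." and "?" in viewtopic.php?p= have to be escaped: unescaped, "php?" means "ph plus an optional p", so the literal question mark in the URL is never matched.

```php
<?php
// Hypothetical sample of one post header row; note the leading whitespace for \s+.
$html = '  <td width="100%"><a href="viewtopic.php?p=2393978#2393978">'
      . '<img src="templates/subSilver/images/icon_minipost.gif" width="12" height="9" alt="Beitrag" title="Beitrag" border="0" /></a>'
      . '<span class="postdetails">Verfasst am: 01.02.2007, 10:15<span class="gen"> </span> Titel: Regex question</span></td>';

$pattern = '/\s+<td width="100%"><a href="viewtopic\.php\?p=\d+#\d+">'
         . '<img src="[^"]*" width="\d+" height="\d+" alt="Beitrag" title="Beitrag" border="\d+" \/><\/a>'
         . '<span class="postdetails">Verfasst am: (.*?)<span class="gen"> <\/span> Titel: (.*?)<\/span><\/td>/';

$found = preg_match($pattern, $html, $m);
$posted = $found ? $m[1] : null; // time of posting
$title  = $found ? $m[2] : null; // post title

// The same URL fragment with "." and "?" left unescaped does not match.
$unescaped = preg_match('/viewtopic.php?p=\d+/', $html);

var_dump($posted, $title, $unescaped);
```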
What about metacharacters? These have to be escaped with a backslash when they are meant literally (the "/" only when it is also the pattern delimiter):
^ $ + ? . * ( ) [ ] { } / \ |
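Instead of escaping every metacharacter by hand, PHP's preg_quote() can do it for a literal piece of text. A short sketch, with the link text taken from the URL above as an example:

```php
<?php
// Literal text that contains several metacharacters (. and ?).
$literal = 'viewtopic.php?p=2393978#2393978';

// The second argument makes preg_quote() escape the delimiter as well.
$quoted = preg_quote($literal, '/');
echo $quoted, "\n";

// Build a pattern around the quoted literal and test it.
$pattern = '/href="' . $quoted . '"/';
$result  = preg_match($pattern, '<a href="viewtopic.php?p=2393978#2393978">');
var_dump($result);
```

This only helps for the fixed parts of a pattern; parts such as \d+ or (.*?) still have to be written by hand.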
BTW: HTML/PHP code has to be stripped out of the text. I look forward to hearing from you.
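For stripping the markup, a regex may not even be needed: strip_tags() removes HTML and PHP tags from a string. A minimal sketch, assuming the post body has already been cut out of the page (the sample string is invented):

```php
<?php
// Hypothetical post body as it might appear in the page source.
$raw = '<span class="postbody">Hello <b>world</b> &amp; friends<br />bye</span>';

// Turn <br> into real line breaks first, otherwise the words run together.
$withBreaks = preg_replace('/<br\s*\/?>/i', "\n", $raw);

// Drop the remaining tags, then decode entities such as &amp;.
$text = html_entity_decode(strip_tags($withBreaks), ENT_QUOTES, 'UTF-8');

echo $text, "\n";
```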
I also found this little helper code, taken from the PHP manual. What do you think, can it help us get what we need?
http://de2.php.net/manual/en/function.preg-match-all.php
Example 2. Find matching HTML tags (greedy)
<?php
// The \\2 is an example of backreferencing. This tells pcre that
// it must match the second set of parentheses in the regular expression
// itself, which would be the ([\w]+) in this case. The extra backslash is
// required because the string is in double quotes.
$html = "<b>bold text</b><a href=howdy.html>click me</a>";
preg_match_all("/(<([\w]+)[^>]*>)(.*)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);
foreach ($matches as $val) {
echo "matched: " . $val[0] . "\n";
echo "part 1: " . $val[1] . "\n";
echo "part 2: " . $val[3] . "\n";
echo "part 3: " . $val[4] . "\n\n";
}
?>
This example will produce:
matched: <b>bold text</b>
part 1: <b>
part 2: bold text
part 3: </b>
matched: <a href=howdy.html>click me</a>
part 1: <a href=howdy.html>
part 2: click me
part 3: </a>
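One thing to watch with the greedy (.*) in that example: as soon as the same tag appears twice in the input, a greedy group swallows everything up to the last closing tag. A small sketch of the difference, on a made-up string:

```php
<?php
$html = '<b>one</b> and <b>two</b>';

// Greedy: .* runs to the LAST </b> in the string.
preg_match('/<b>(.*)<\/b>/', $html, $greedy);

// Lazy: .*? stops at the FIRST </b>.
preg_match('/<b>(.*?)<\/b>/', $html, $lazy);

echo $greedy[1], "\n"; // one</b> and <b>two
echo $lazy[1], "\n";   // one
```

On a viewtopic page with many posts, the lazy form is most likely what is wanted.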
To repeat the aim: I want to use regexes to parse the HTML source code of pages like this one: http://www.phpbb.com/phpBB/viewtopic.php?p=2393978#2393978
See also preg_match(), preg_replace(), and preg_split().
Afterwards the extracted data should go into a database.
What do you think? Please let me know!
thx
bernard