limiting word output

vaska

i can think of a bunch of different ways that one could limit the amount of words that are output using mysql/php...but how do people generally do this?

do they use COUNT and LIMIT? or do they SLICE the ARRAYS? or something else?

example...i have an article with 1510 words...but i only want 500 words per webpage so i would split this up and then have the results dynamically create the additional pages...

thanks for any thoughts about this...jv

Weedpacket

One method is to count stretches of contiguous whitespace (preg_match_all('/\s+/', $text)) - that plus 1 is a reasonable estimate of the number of words in the text. Don't forget to trim() the text first.
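As a rough sketch of that estimate (the helper name estimate_word_count() and the sample string are made up for illustration, not anything from the manual):

```php
<?php
// Hypothetical helper: estimate the word count by counting whitespace runs.
function estimate_word_count($text)
{
    $text = trim($text);
    if ($text === '') {
        return 0; // no words at all, so skip the "+ 1"
    }
    // preg_match_all() returns the number of matches it found.
    return preg_match_all('/\s+/', $text, $matches) + 1;
}

$article = "  The quick brown\n\nfox jumps.  ";
echo estimate_word_count($article), "\n"; // 4 whitespace runs + 1 = 5
```

Without the trim(), the leading and trailing whitespace would each count as an extra run and inflate the estimate.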

If it's fewer than 500 words, then you don't need to do any more, so it's reasonable to check first and save yourself unnecessary work 🙂 Only if there are more than 500 words does it make sense to work at limiting it. So with that check out of the way, we can happily go ahead and assume that what we're looking at does have more than 500 words.

You can preg_split() on the whitespace (same expression as above), array_slice() out the first 500 words from the array that results, and then join() them again with spaces. The downside of course is that you lose any niceties of whitespace formatting in the string (much as HTML pages don't respect such things either unless you use <pre> tags). If there are paragraph marks they will be lost.
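Something along these lines, as a minimal sketch (first_words() is a made-up name, and the default limit of 500 is just the figure from the question):

```php
<?php
// Hypothetical helper: keep only the first $limit words of $text.
function first_words($text, $limit = 500)
{
    // Split on runs of whitespace; trim() first so there are no empty
    // pieces at either end of the array.
    $words = preg_split('/\s+/', trim($text));
    if (count($words) <= $limit) {
        return $text; // already short enough, leave it untouched
    }
    // Take the first $limit words and glue them back with single spaces.
    return join(' ', array_slice($words, 0, $limit));
}

echo first_words("one  two\n\nthree four", 3), "\n"; // one two three
```

Note how the double space and the blank line in the input both come back as a single space. If you want every page rather than just the first, array_chunk() on the same $words array will cut it into 500-word pieces in one go.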

I haven't tried it, but I suspect anything involving preg_match('/^((\S+\s+){500})/', $text, $matches) would be a bad idea. On the other hand, you could have

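$short_text = '';
for ($i = 0; $i < 500; ++$i) {
    // Peel one word plus its trailing whitespace off the front of $text.
    // The /s modifier lets .* run across newlines.
    if (!preg_match('/^(\S+\s+)(.*)/s', $text, $matches)) {
        break; // nothing left that looks like "word + whitespace"
    }
    $short_text .= $matches[1];
    $text = $matches[2];
}
$short_text = trim($short_text);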

This would preserve the whitespace.

Something else I haven't tried at all:

$text_bits = preg_split('/\b/', $text);

and split the text on word boundaries. The result would be an array of alternating words and chunks of whitespace (at least, that would in my opinion be the Right Thing for it to do). Join the first 999 or 1000 (I'm not sure if \b would match the very start of the string or not) with "" and whitespace would be preserved, including paragraph marks.
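Here is a sketch of a close cousin of that idea, not the \b version itself: it splits on runs of whitespace and asks preg_split() to keep the delimiters with PREG_SPLIT_DELIM_CAPTURE. I've swapped \b out because it also matches next to punctuation (a word like "don't" comes out as three pieces), which throws the count off. The function name is made up.

```php
<?php
// Hypothetical helper: first $limit words with the original whitespace kept.
function first_words_keep_whitespace($text, $limit = 500)
{
    $text = trim($text);
    // The parentheses plus PREG_SPLIT_DELIM_CAPTURE make preg_split()
    // return the whitespace runs as array elements too.
    $tokens = preg_split('/(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    // Tokens alternate word, whitespace, word, ... so N words occupy
    // the first 2N - 1 tokens.
    $length = max(0, 2 * $limit - 1);
    return join('', array_slice($tokens, 0, $length));
}

echo first_words_keep_whitespace("one  two\n\nthree four", 3);
// prints "one  two", a blank line, then "three"
```

Because the text is trimmed first, the array always starts with a word, so for 500 words it is the first 999 elements you want.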

That's just a couple of ideas that sprang to mind when I read the question. Usually my very first ideas aren't the best 🙂

cgraz

I usually use substr() for this. After you query the database and create a variable, apply substr to it to only return part of the string.

This is my favorite way to do it because it's easy, although I haven't really tried any other methods, since this one seems to work great. 🙂

Cgraz

Weedpacket

How do you determine the parameters to pass to substr()? It's trivial to truncate something at 500 characters this way, but what about 500 words?
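One possible way to get a length for substr(), as a sketch only (truncate_at_word() is a made-up name): let preg_match_all() report where each word starts with PREG_OFFSET_CAPTURE, then cut just after the last word you want to keep.

```php
<?php
// Hypothetical helper: cut $text right after its $limit-th word using substr().
function truncate_at_word($text, $limit = 500)
{
    if ($limit < 1) {
        return '';
    }
    // With PREG_OFFSET_CAPTURE each match is array(word, byte offset).
    preg_match_all('/\S+/', $text, $matches, PREG_OFFSET_CAPTURE);
    if (count($matches[0]) <= $limit) {
        return $text; // not enough words to need truncating
    }
    list($word, $offset) = $matches[0][$limit - 1];
    // Offsets and strlen() are both in bytes, which is what substr() expects.
    return substr($text, 0, $offset + strlen($word));
}

echo truncate_at_word("one  two\n\nthree four", 2), "\n"; // one  two
```

The whitespace between the kept words comes through untouched, since the string is only cut, never rebuilt.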

vaska

interesting...this is just amazing...

http://www.iht.com/articles/81971.html

they are using javascript to somehow count the number of words and it AUTOMATICALLY determines how many words should go in each column and also how many pages you need...

try this link...check the page number (in the lower right) when it loads...THEN...make your window smaller and you'll see that both the columns adjust AND the page numbers adjust...

so...using php for this seems useless...except to store the information and to dynamically drive a website...jv

cgraz

Ahh I just skimmed over the post. Must've mistaken words for characters.

Anyways, I searched PHP.net and found a few examples for counting words: http://www.php.net/manual/en/function.str-word-count.php (there are 3 in the user comments section)

Cgraz