How to get the string if the position is known

starbbs · Jan 19, 2005

Originally posted by Weedpacket
I think what is being asked for is "I have a string of words with spaces between them. What is the word that contains the 18th (or whatever) character?" The answer to that could be found with strpos() (using 18 as the offset parameter), strrpos() (using 18 as the offset parameter since PHP5) and substr().

The question I have is "what if the 18th character is a space? Or punctuation?"

This is what I meant. I want the word returned when I know the position. If the 18th char is a space or something else, it should count back until it finds a word.

laserlight · Jan 19, 2005

I really shouldn't be writing the code for you, but I wanted to give it a try (itchy fingers), so here's my take on it:

function parseForWord($str, $pos, $delim = ' ') {
	$len = strlen($str);
	//perform bounds checking
	if ($pos >= 0 && $pos < $len) {
		//is the character at $pos a delimiter?
		if ($str[$pos] != $delim) {
			//$pos marks a character within a word
			$word_len = 1;
			//scan left for start of word
			for ($start = $pos - 1; $start >= 0; $start--) {
				if ($str[$start] != $delim) {
					$word_len++;
				} else {
					break;
				}
			}
			//scan right to end of word
			for ($i = $pos + 1; $i < $len; $i++) {
				if ($str[$i] != $delim) {
					$word_len++;
				} else {
					break;
				}
			}
			//$start + 1 here because of $start-- or an out-of-bounds $start
			return substr($str, $start + 1, $word_len);
		} else {
			//$pos marks position of a delimiter
			$word_len = 0;
			//scan left only
			for ($i = $pos - 1; $i >= 0; $i--) {
				if ($str[$i] != $delim) {
					$word_len = 1;
					break;
				}
			}
			//$word_len = 1 > 0 if (last character of) word found
			if ($word_len > 0) {
				//scan left for start of word
				for ($start = $i - 1; $start >= 0; $start--) {
					if ($str[$start] != $delim) {
						$word_len++;
					} else {
						break;
					}
				}
				return substr($str, $start + 1, $word_len);
			} else {
				return '';
			}
		}
	} else {
		return false;
	}
}

Note that the position of a character within the string here is numbered from 0.

starbbs · Jan 19, 2005

Whoooooaaaaa, what a function.... I am really, really impressed. I will try to study and learn from it! Is there really no simple solution to my request? Like a backwards strpos?

I did find this... but sometimes it fails. Your function also returns the word if the position is in the middle of a word!! Great!

Take a look at this one:

print array_pop(explode(' ',substr($text,0,$positie)));

Weedpacket · Jan 20, 2005

Originally posted by starbbs
Like a backwards strpos?

strpos()
See also: strrpos()

laserlight · Jan 20, 2005

is there really no simple solution to my request ?

Maybe, e.g. with regex, but I'm not sure how to go about constructing one that fits the bill.
The complexity lies in the way you define words, and deal with them when the position is actually a delimiter's position.

Like a backwards strpos?

I think you mean the reverse function of strpos() rather than reversed strpos().

starbbs · Jan 20, 2005

Well laserlight.... I am really grateful for your help. I did not realise that this request isn't all that simple.

But did you take a look at my example?
print array_pop(explode(' ',substr($text,0,$positie)));

laserlight · Jan 20, 2005

print array_pop(explode(' ',substr($text,0,$positie)));

At first glance, this might be ok, though you'll have problems with multiple delimiters.

The code example I gave you was originally coded by defining words as alphanumeric strings, so ctype_alnum() was used to test instead of a direct comparison with the delimiter.
This allows you to work with more than one delimiter with just a small modification of the function.
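To make the multiple-delimiter problem concrete, here is a minimal sketch. The function names lastWordExplode() and lastWordSplit() and the sample text are made up for illustration; the second function is only one possible workaround, not necessarily what anyone in this thread uses.

```php
<?php
// Hypothetical helper: the explode() one-liner, wrapped in a function.
// The explode() result is stored in a variable first because array_pop()
// takes its argument by reference.
function lastWordExplode($text, $positie) {
    $parts = explode(' ', substr($text, 0, $positie));
    return array_pop($parts);
}

// Hypothetical workaround: split on runs of whitespace and drop empty pieces.
function lastWordSplit($text, $positie) {
    $parts = preg_split('/\s+/', substr($text, 0, $positie), -1, PREG_SPLIT_NO_EMPTY);
    return array_pop($parts);
}

$text = 'foo  bar'; // two spaces between the words

// substr() gives 'foo  ', explode() gives ['foo', '', ''], so the result is empty.
var_dump(lastWordExplode($text, 5)); // string(0) ""

// preg_split() with PREG_SPLIT_NO_EMPTY gives ['foo'].
var_dump(lastWordSplit($text, 5));   // string(3) "foo"
```

Both versions still cut the text at the position, so a position in the middle of a word only gives back the part of the word to its left.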

Weedpacket · Jan 20, 2005

Obviously, the real code is the content of the second loop. The loops are only there as test harnesses to make sure that the right words are returned for every possible value of $pos.

$sentence = "Dit is een simpele text die nergens op slaat. I'd write this sentence in Dutch if I knew any Dutch."; 

// Okay, let's test this.
for($pos = 0; $pos<strlen($sentence); ++$pos)
	echo $pos,' ',array_pop(explode(' ',substr($sentence,0,$pos))),"\n";

// Not what is desired, I take it.

// Regexps? Okay, let's have a crack with them.

for($pos=0; $pos<strlen($sentence); ++$pos)
{
	// Break the sentence into two pieces at $pos. The word we want will be the 
	// last one on the left. If $pos was inside the word at the time, a fragment
	// of the word will end up on the right.
	$left = substr($sentence, 0, $pos);
	$right = substr($sentence, $pos);
	// Isolate the last word to the left of $pos and the first word to the right of $pos.
	// There may be some whitespace after the last word, and there may be some before the first.
	preg_match('/(\\S*)(\\s*)$/', $left, $last_word_and_space);
	preg_match('/^(\\s*)(\\S*)/', $right, $space_and_first_word);
	// We don't need $junk - it's the entire string matched - word and space and all.
	list($junk, $last_word, $space_on_left) = $last_word_and_space;
	list($junk, $space_on_right, $first_word) = $space_and_first_word;
	if($space_on_left=='' && $space_on_right=='')
	{
		// $pos lay inside a word, which got bro
		// ken into two pieces.
		$word = $last_word.$first_word;
	}
	elseif($space_on_right!='')
	{
		// $pos was in whitespace
		//  at the end of a word
		$word = $last_word;
	}
	elseif($space_on_left!='') // Don't really need to test this
	{
		// $pos was positioned 
		// at the start of the word
		$word = $first_word;
	}
	echo $pos,' ',$word,"\n";
}

(Edit: just noticed vBulletin had turned my \s's and \S's into s's and S's. That's not right.)

starbbs · Jan 21, 2005

Thanks for this one... but I still have to get this last bit of code working, because it does not return the word found to the left of the position I know/want.

So what does this one do?

starbbs · Jan 21, 2005

function parseForWord($str, $pos, $delim = ' ')

I also saw that this one returns words like:

name,

You see the , at the end? I thought it tests for this to filter it out?

BlackenedSky · Jan 22, 2005

What about doing a search for the previous delimiter, then using the mid function to return the word using the two positions?
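A rough sketch of that idea in PHP, where substr() plays the role of a mid function. The name wordAt() is hypothetical, and the return values ('' when only delimiters lie to the left, false when the position is out of bounds) are an assumption chosen to mirror parseForWord() above. Positions are numbered from 0.

```php
<?php
// Hypothetical helper: returns the $delim-separated word containing $pos.
// If $pos is on a delimiter, it steps back to the nearest word on the left.
function wordAt($str, $pos, $delim = ' ') {
    $len = strlen($str);
    if ($pos < 0 || $pos >= $len) {
        return false; // out of bounds
    }
    // Step back over delimiters so $pos lands on the last character of a word.
    while ($pos >= 0 && $str[$pos] === $delim) {
        $pos--;
    }
    if ($pos < 0) {
        return ''; // only delimiters to the left
    }
    // Previous delimiter: last occurrence within the first $pos characters.
    $before = strrpos(substr($str, 0, $pos), $delim);
    $start = ($before === false) ? 0 : $before + 1;
    // Next delimiter: first occurrence at or after $pos.
    $after = strpos($str, $delim, $pos);
    $end = ($after === false) ? $len : $after;
    return substr($str, $start, $end - $start);
}

var_dump(wordAt('Dit is een simpele text', 5)); // string(2) "is"
var_dump(wordAt('Dit is een simpele text', 6)); // string(2) "is" (position 6 is a space)
var_dump(wordAt('name, here', 2));              // string(5) "name," (punctuation is kept)
```

Because only a single delimiter character is considered, punctuation stays attached to the word, just as with the space-delimited parseForWord().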

laserlight · Jan 22, 2005

You see the , at the end? I thought it tests for this to filter it out?

It doesn't, because you define the delimiter as a space.

It does for my own version, because I defined words as alphanumeric strings, rather than as non-delimiters.

starbbs · Jan 22, 2005

So you wrote a similar function for yourself? Can you post it?

laserlight · Jan 22, 2005

function parseForWord($str, $pos) {
    $len = strlen($str);
    //perform bounds checking
    if ($pos >= 0 && $pos < $len) {
        //is the character at $pos part of a word (alphanumeric)?
        if (ctype_alnum($str[$pos])) {
            //$pos marks a character within a word
            $word_len = 1;
            //scan left for start of word
            for ($start = $pos - 1; $start >= 0; $start--) {
                if (ctype_alnum($str[$start])) {
                    $word_len++;
                } else {
                    break;
                }
            }
            //scan right to end of word
            for ($i = $pos + 1; $i < $len; $i++) {
                if (ctype_alnum($str[$i])) {
                    $word_len++;
                } else {
                    break;
                }
            }
            //$start + 1 here because of $start-- or an out-of-bounds $start
            return substr($str, $start + 1, $word_len);
        } else {
            //$pos marks position of a delimiter
            $word_len = 0;
            //scan left only
            for ($i = $pos - 1; $i >= 0; $i--) {
                if (ctype_alnum($str[$i])) {
                    $word_len = 1;
                    break;
                }
            }
            //$word_len = 1 > 0 if (last character of) word found
            if ($word_len > 0) {
                //scan left for start of word
                for ($start = $i - 1; $start >= 0; $start--) {
                    if (ctype_alnum($str[$start])) {
                        $word_len++;
                    } else {
                        break;
                    }
                }
                return substr($str, $start + 1, $word_len);
            } else {
                return '';
            }
        }
    } else {
        return false;
    }
}

mtmosier · Jan 22, 2005

Ooh! Ooh! Can I play too?

function getWord($text, $pos) {
    if ($pos > strlen($text)) {
        return '';
    }
    if ($pos < 1)  $pos = 1;

    /*  Get a copy of the string up until $pos  */
    $tmp = substr($text, 0, $pos);
    /*  Match the last word of the temporary string keeping any trailing non-word characters  */
    $tmp = preg_replace('/^.*?([\w\'\-]+[^\w\'\-]*)$/s', '\1', $tmp);
    /*  Append the second part of the original string back onto our temporary string  */
    $tmp .= substr($text, $pos);
    /*  Match the word at the beginning of the string and return it  */
    return preg_replace('/^([\w\'\-]*).*$/s', '\1', $tmp);
}

This one counts the first character in the string as position 1. It uses regex's "word" character to find word breaks, which should vary by locale.

Edited to include apostrophes and hyphens in words. Also changed PHP tags to CODE since it's nearly impossible to get backslashes right in PHP blocks. grrr

ripat · Jan 23, 2005

Just my two (euro) cent:

function getWord2($text, $pos) {
  if ($pos > strlen($text)) { 
    return 'Tekst is veel te kort!'; 
  } 
  preg_match_all('#\b\w+#', substr($text, 0, $pos), $out);
  return array_pop($out[0]);
}

Using the assertion \b (which consumes no characters) does the magic trick.

Edit: This BBcode doesn't like the regex pattern. Hence the CODE tag!

laserlight · Jan 23, 2005

This BBcode doesn't like the regex pattern. Hence the CODE tag !

Yeah, I think it's a bug with the php bbcode tag.

There's a bug with your solution though, in that if the position is in the middle of the word, not the whole word is returned (due to the substr()).
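A small sketch of the truncation, plus one possible way around it. lastWordBefore() just repeats the pattern from getWord2() so the snippet is self-contained; wordAtOffset() is a hypothetical alternative that matches against the whole text and uses the offsets reported by PREG_OFFSET_CAPTURE instead of cutting the string first.

```php
<?php
// Same pattern as getWord2(), repeated here so the sketch runs on its own.
function lastWordBefore($text, $pos) {
    preg_match_all('#\b\w+#', substr($text, 0, $pos), $out);
    return array_pop($out[0]);
}

// Hypothetical alternative: keep the text whole and compare match offsets.
function wordAtOffset($text, $pos) {
    preg_match_all('#\w+#', $text, $out, PREG_OFFSET_CAPTURE);
    $found = '';
    foreach ($out[0] as list($word, $offset)) {
        if ($offset > $pos) {
            break; // this word starts after $pos
        }
        $found = $word; // last word starting at or before $pos
    }
    return $found;
}

$text = 'een simpele text';
var_dump(lastWordBefore($text, 7)); // string(3) "sim" (cut off by substr())
var_dump(wordAtOffset($text, 7));   // string(7) "simpele"
var_dump(wordAtOffset($text, 3));   // string(3) "een" (position 3 is the space)
```

This still treats anything outside \w as a word break, so the question of what counts as a word remains open.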

Weedpacket · Jan 24, 2005

Originally posted by mtmosier
This one counts the first character in the string as position 1. It uses regex's "word" character to find word breaks, which should vary by locale.

It also breaks on "I'd", a word I deliberately used in my test string because of this. It's also why I didn't do anything about punctuation, since nothing was specified about them in the original problem (even though I asked).

mtmosier · Jan 24, 2005

It also breaks on "I'd", a word I deliberately used in my test string because of this.

Quite true. Easily fixed, but then the question becomes what else is a valid part of a word? A hyphen I suppose, but there must be more. Alternatively, one could simply define what constitutes a character on which to break.

I think I need more detailed specs.

ripat · Jan 24, 2005

There's a bug with your solution though, in that if the position is in the middle of the word, not the whole word is returned (due to the substr())

True. If that's what he needs, it is easy to correct it by adding just one line:

function getWord2($text, $pos) {
  if ($pos > strlen($text)) { 
    return 'Tekst is veel te kort!'; 
  }
  while ($pos < strlen($text) && $text[$pos] != ' ') $pos++;        //  <--------   added line
  preg_match_all('#\b\w+#', substr($text, 0, $pos), $out);
  return array_pop($out[0]);
}

As for the "I'd" problem, PCRE syntax considers it as two words. Which is grammatically correct I guess (forgive me if I am wrong, English is not my mother language!).

If not, change the regex pattern with #\b[\w']+#.

Et voilà. That's all there is to it!
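For what it's worth, a quick sketch of the difference between the two patterns. The helper name words() is made up for this example.

```php
<?php
// Hypothetical helper: returns every match of $pattern in $text.
function words($text, $pattern) {
    preg_match_all($pattern, $text, $out);
    return $out[0];
}

$text = "I'd write this";

// \w does not include the apostrophe, so "I'd" is split in two.
print_r(words($text, '#\b\w+#'));       // I, d, write, this

// Adding the apostrophe to the character class keeps "I'd" together.
print_r(words($text, "#\\b[\\w']+#"));  // I'd, write, this
```

Hyphens or other characters could be added to the class in the same way, depending on what should count as part of a word.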
