You see the comma at the end? I thought it tested for this and filtered it out?

It doesn't, because you defined the delimiter as a space.

It does for my own version, because I defined words as alphanumeric strings, rather than as non-delimiters.
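To make the difference between the two definitions concrete, here is a minimal sketch. The sample text is made up for illustration and is not from the original problem.

```php
<?php
// Hypothetical sample text, not taken from the thread.
$text = 'Hello, world';

// Definition 1: a word is anything between spaces (space is the delimiter).
$bySpace = explode(' ', $text);

// Definition 2: a word is a run of alphanumeric characters.
preg_match_all('/[A-Za-z0-9]+/', $text, $m);
$byAlnum = $m[0];

echo $bySpace[0], "\n"; // Hello,  (the comma stays attached)
echo $byAlnum[0], "\n"; // Hello
```

With a space delimiter the comma is simply part of the word; with the alphanumeric definition it is not.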

    So you wrote a similar function for yourself? Can you post it?

      function parseForWord($str, $pos) {
          $len = strlen($str);
          //perform bounds checking
          if ($pos >= 0 && $pos < $len) {
              //is the character at $pos part of a word (alphanumeric)?
              if (ctype_alnum($str[$pos])) {
                  //$pos marks a character within a word
                  $word_len = 1;
                  //scan left for start of word
                  for ($start = $pos - 1; $start >= 0; $start--) {
                      if (ctype_alnum($str[$start])) {
                          $word_len++;
                      } else {
                          break;
                      }
                  }
                  //scan right to end of word
                  for ($i = $pos + 1; $i < $len; $i++) {
                      if (ctype_alnum($str[$i])) {
                          $word_len++;
                      } else {
                          break;
                      }
                  }
                  //$start + 1 here because of $start-- or an out-of-bounds $start
                  return substr($str, $start + 1, $word_len);
              } else {
                  //$pos marks position of a delimiter
                  $word_len = 0;
                  //scan left only
                  for ($i = $pos - 1; $i >= 0; $i--) {
                      if (ctype_alnum($str[$i])) {
                          $word_len = 1;
                          break;
                      }
                  }
                  //$word_len = 1 > 0 if (last character of) word found
                  if ($word_len > 0) {
                      //scan left for start of word
                      for ($start = $i - 1; $start >= 0; $start--) {
                          if (ctype_alnum($str[$start])) {
                              $word_len++;
                          } else {
                              break;
                          }
                      }
                      return substr($str, $start + 1, $word_len);
                  } else {
                      return '';
                  }
              }
          } else {
              return false;
          }
      }

        Ooh! Ooh! Can I play too?

        function getWord($text, $pos) {
            if ($pos > strlen($text)) {
                return '';
            }
            if ($pos < 1)  $pos = 1;

            /*  Get a copy of the string up until $pos  */
            $tmp = substr($text, 0, $pos);
            /*  Match the last word of the temporary string keeping any trailing non-word characters  */
            $tmp = preg_replace('/^.*?([\w\'\-]+[^\w\'\-]*)$/s', '\1', $tmp);
            /*  Append the second part of the original string back onto our temporary string  */
            $tmp .= substr($text, $pos);
            /*  Match the word at the beginning of the string and return it  */
            return preg_replace('/^([\w\'\-]*).*$/s', '\1', $tmp);
        }
        

        This one counts the first character in the string as position 1. It uses regex's "word" character to find word breaks, which should vary by locale.

        Edited to include apostrophes and hyphens in words. Also changed PHP tags to CODE since it's nearly impossible to get backslashes right in PHP blocks. grrr

          Just my two (euro) cents:

          function getWord2($text, $pos) {
            if ($pos > strlen($text)) { 
              return 'Text is far too short!';
            } 
            preg_match_all('#\b\w+#', substr($text, 0, $pos), $out);
            return array_pop($out[0]);
          } 

          The \b assertion, which is zero-width (it consumes no characters), does the trick.
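A small sketch of what that pattern returns, using a made-up sample string. The last match is the word that ends at or before $pos.

```php
<?php
// Hypothetical sample text.
$text = 'one, two three';
$pos  = 8; // the first 8 characters are "one, two"

// \b matches the empty position between a non-word and a word character,
// so every match starts at the beginning of a word.
preg_match_all('#\b\w+#', substr($text, 0, $pos), $out);

echo implode('|', $out[0]), "\n"; // one|two
echo end($out[0]), "\n";          // two
```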

          Edit: The BBcode doesn't like the regex pattern, hence the CODE tag!

            This BBcode doesn't like the regex pattern. Hence the CODE tag !

            Yeah, I think it's a bug with the php bbcode tag.

            There's a bug with your solution though, in that if the position is in the middle of the word, not the whole word is returned (due to the substr())

              Originally posted by mtmosier
              This one counts the first character in the string as position 1. It uses regex's "word" character to find word breaks, which should vary by locale.

              It also breaks on "I'd", a word I deliberately used in my test string because of this. It's also why I didn't do anything about punctuation, since nothing was specified about them in the original problem (even though I asked🙂).

                It also breaks on "I'd", a word I deliberately used in my test string because of this.

                Quite true. Easily fixed, but then the question becomes: what else is a valid part of a word? A hyphen, I suppose, but there must be more. Alternatively, one could simply define which characters to break on.

                I think I need more detailed specs.

                  There's a bug with your solution though, in that if the position is in the middle of the word, not the whole word is returned (due to the substr())

                  True. If that's what he needs, it is easy to correct it by adding just one line:

                  function getWord2($text, $pos) {
                    if ($pos > strlen($text)) { 
                      return 'Text is far too short!';
                    }
                    while ($text[$pos] != ' ') $pos++;        //  <--------   added line
                    preg_match_all('#\b\w+#', substr($text, 0, $pos), $out);
                    return array_pop($out[0]);
                  } 

                  As for the "I'd" problem, PCRE syntax treats it as two words, which is grammatically correct, I guess (forgive me if I'm wrong, English is not my mother tongue!).

                  If not, change the regex pattern to #\b[\w']+#.

                  Et voilà. That's all it takes!

                    Originally posted by ripat
                    If not, change the regex pattern to #\b[\w']+#.

                    Which would then include the closing (but not opening) quotes in 'single quoted strings'.
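A quick sketch of that effect, with a made-up sample string. The opening quote is skipped because \b cannot match between a space and a quote, but the closing quote is swallowed by the character class.

```php
<?php
// Hypothetical sample text with a single-quoted phrase.
$text = "say 'hello world' now";

preg_match_all("#\b[\w']+#", $text, $out);

echo implode('|', $out[0]), "\n"; // say|hello|world'|now
```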

                    I could change mine so that, after matching on whitespace, it trims non-word characters from either end. I'd use trim() if it supported negated character classes, but it doesn't. So preg_replace('/^\W+|\W+$/', '', $word) it is, then.

                    But there will still be failures on some strings' apostrophes.

                    But maybe Dutch has different conventions.
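A minimal sketch of the trimming idea, assuming the pattern is anchored at both ends so that interior apostrophes survive. The helper name trimNonWord is hypothetical.

```php
<?php
// Hypothetical helper: strips non-word characters from both ends only.
function trimNonWord($word) {
    return preg_replace('/^\W+|\W+$/', '', $word);
}

echo trimNonWord("'quoted',"), "\n"; // quoted
echo trimNonWord("I'd"), "\n";       // I'd
echo trimNonWord("strings'"), "\n";  // strings (the possessive apostrophe is lost)
```

The last line is the kind of failure mentioned above: a trailing apostrophe that belongs to the word is indistinguishable from a closing quote.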

                      Yes, there will always be 'special cases'.

                      Just for the sake of efficiency I tried to optimize my code above to make it run even faster.

                      function getWord4($text, $pos) {
                        $pos = strpos( $text, ' ', $pos);
                        if ($pos === FALSE) $pos = strlen($text);
                        $out = preg_split("#[^\w'-]#", substr($text, 0, $pos), -1, PREG_SPLIT_NO_EMPTY);
                        return array_pop($out);
                      }

                      preg_split does the job faster, and strpos() is more efficient than my while() loop (which, by the way, had a bug when $pos was in the last word: the loop never stopped).
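For reference, the non-terminating loop can be avoided by bounding the scan with the string length. This is only a sketch: getWord2Bounded is a hypothetical name, and returning an empty string for out-of-range positions is an assumption.

```php
<?php
// Hypothetical variant of getWord2 with a bounded scan.
function getWord2Bounded($text, $pos) {
    $len = strlen($text);
    if ($pos < 0 || $pos > $len) {
        return ''; // assumption: out-of-range positions yield an empty string
    }
    // Stop at the next space OR at the end of the string.
    while ($pos < $len && $text[$pos] != ' ') {
        $pos++;
    }
    preg_match_all('#\b\w+#', substr($text, 0, $pos), $out);
    // end() returns false on an empty array, so fall back to ''.
    return $out[0] ? end($out[0]) : '';
}

echo getWord2Bounded('alpha beta gamma', 12), "\n"; // gamma
echo getWord2Bounded('alpha beta gamma', 7), "\n";  // beta
```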

                      We all had fun in this thread, but what does the original poster think of all this?

                        Haven't tested your new version yet, but wouldn't end($out) be better than array_pop($out) in this case?

                          Originally posted by laserlight
                          Haven't tested your new version yet, but wouldn't end($out) be better than array_pop($out) in this case?

                          It is indeed better, as there is no need to shorten the $out array.
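A short sketch of the difference, using a made-up array. Both return the last element, but only array_pop() removes it; they also differ on an empty array.

```php
<?php
$words = ['alpha', 'beta', 'gamma'];

$last = end($words);                     // reads the last element, array unchanged
echo $last, ' ', count($words), "\n";    // gamma 3

$popped = array_pop($words);             // removes and returns the last element
echo $popped, ' ', count($words), "\n";  // gamma 2

$empty = [];
var_dump(end($empty));       // bool(false)
var_dump(array_pop($empty)); // NULL
```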

                          I guess we all have our (bad) habits.

                          Thanks.

                            I just read a post above that asked if strpos could work backwards. Well, I don't think so, but by using strrev() one could emulate it.

                            function strpos_backwards($text, $pos) {
                              // find the position of the next space at or after the given $pos
                              $pos = strpos($text, ' ', $pos);
                              // if $pos is inside the last word there is no such space: use the end of the string
                              if ($pos === FALSE) $pos = strlen($text);
                              // keep the text before the new $pos and reverse it
                              $reversed_text = strrev(substr($text, 0, $pos));
                              // this is where strpos works "backwards": it searches forwards, but on a reversed string
                              $space = strpos($reversed_text, ' ');
                              // no space found means the word is the first one in the string, so keep everything
                              $out = ($space === FALSE) ? $reversed_text : substr($reversed_text, 0, $space);
                              // reverse the found word back and return it
                              return strrev($out);
                            }
                            

                            I benchmarked it against the solutions above and it is the fastest: twice as fast as my preg_split version and Laserlight's string-function version, which are about equally fast.
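As an aside, PHP also has strrpos(), which returns the position of the last occurrence of a needle, so the same word can be found without reversing anything. The sketch below has not been benchmarked against the versions above; wordAtWithStrrpos is a hypothetical name, and returning an empty string for out-of-range positions is an assumption.

```php
<?php
// Hypothetical alternative using strrpos() instead of strrev().
function wordAtWithStrrpos($text, $pos) {
    $len = strlen($text);
    if ($pos < 0 || $pos > $len) {
        return ''; // assumption: out-of-range positions yield an empty string
    }
    // End of the word: the next space at or after $pos, or the end of the string.
    $end = strpos($text, ' ', $pos);
    if ($end === false) {
        $end = $len;
    }
    $head = substr($text, 0, $end);
    // strrpos() finds the LAST space in $head, which is the "backwards" search.
    $space = strrpos($head, ' ');
    return ($space === false) ? $head : substr($head, $space + 1);
}

echo wordAtWithStrrpos('alpha beta gamma', 7), "\n";  // beta
echo wordAtWithStrrpos('alpha beta gamma', 2), "\n";  // alpha
echo wordAtWithStrrpos('alpha beta gamma', 12), "\n"; // gamma
```

Like the versions above, it only treats spaces as delimiters, so punctuation stays attached to the word.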
