Hi guys,

Not sure if anyone has done this before, but I couldn't find a post about it.

I have created a website for which I want each page to automatically generate its own meta description and keywords.

The description is easy. For the keywords, though, I want to write a function that reads the whole article and counts, for example, all instances of words over 5 characters long. I then want to use the top 20 words as the keywords for that article.

Is there a function best suited to doing this or has someone already done it?

    A few things come to mind. One is to read each word and check its string length; if it is over 5 characters, add it to an array. Then you can pull the top 5 (or however many you want) from the array.

    <?php
    $myDoc = "this is my document example codes here bla bla bla right?";
    
    $step1 = explode(" ", $myDoc);
    $keyWord = array();
    
    // loop over each word and keep the ones longer than 5 characters
    foreach ($step1 as $eachWord) {
    	if (strlen($eachWord) > 5) {
    		$keyWord[] = $eachWord;
    	}
    }
    echo $keyWord[0]; // first word longer than 5 characters ("document")
    

    Many others can help you with this, and there are many other ways of doing it too. This is just one I can think of off the top of my head; not the best, but it can work.

      Or another way

      <?php
      $text = file_get_contents('PATH_TO.txt');
      
      $text = explode(' ', $text);
      
      // keep only words longer than 4 characters
      $words = array();
      foreach ($text as $word) {
      	if (strlen($word) > 4) {
      		$words[] = $word;
      	}
      }
      
      // word => number of occurrences, most frequent first (arsort keeps the word keys)
      $count = array_count_values($words);
      arsort($count);
      
      $i = 1;
      foreach ($count as $key => $value) {
      	if ($i > 20) {
      		break; // stop after the top 20; exit() would end the whole script
      	}
      	echo 'Number of occurrences:<b> '.$value.' </b>of word: <b>'.$key."</b><br>\n";
      	$i++;
      }
      
      ?>
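The same steps can be packaged as a reusable function that returns the results instead of echoing them. This is a minimal sketch: the name `top_keywords()` and its parameters are hypothetical, and lower-casing plus splitting on runs of non-letters (rather than on single spaces) is an assumption that also keeps commas and full stops out of the words. `arsort()` orders the counts from highest to lowest, and `array_slice()` with `preserve_keys` set to `true` takes the top entries without a counter.

```php
<?php
// Hypothetical helper: return the $limit most frequent words longer than
// $minExclusive characters, as word => count, most frequent first.
function top_keywords(string $text, int $minExclusive = 4, int $limit = 20): array
{
    // Assumption: lower-case the text and split on anything that is not a letter,
    // so "Elit," and "elit" are counted as the same word.
    $tokens = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $words = [];
    foreach ($tokens as $word) {
        if (strlen($word) > $minExclusive) {
            $words[] = $word;
        }
    }

    $count = array_count_values($words);  // word => number of occurrences
    arsort($count);                        // highest count first, word keys kept
    return array_slice($count, 0, $limit, true);
}

$keywords = top_keywords('Apple banana apple cherry banana apple kiwi', 4, 2);
print_r($keywords);  // contains apple => 3, banana => 2
```

"kiwi" is left out because it has only four letters, and "cherry" is cut by the limit of 2.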

        Rincewind, that worked a treat, thank you very much. By the way, are you the Rincewind I know from EG?

        Anyway, following on from that, I have another problem now which is puzzling me:

        In the loop you created above, I want to assign each word in the array to its own variable, for example $keyword1, $keyword2, $keyword3 and so on up to 20. I want this to happen in the loop so that, once it has finished, I can echo the 20 keywords one after the other and use them as the meta keywords for this article.

        So how can I do this?

        I have tried generating the 20 keyword names using "keyword$i" in the loop, but how do I then assign $key to each of these? 😐

          <?php
          $text = file_get_contents('kinsmen.txt');
          
          $text = explode(' ', $text);
          
          $words = array();
          foreach ($text as $word) {
          	if (strlen($word) > 4) {
          		$words[] = $word;
          	}
          }
          
          $count = array_count_values($words);
          arsort($count);
          
          $keyword = array();
          $i = 1;
          foreach ($count as $key => $value) {
          	if ($i > 20) {
          		break;
          	}
          	$keyword[] = $key;	# will give an array of the top 20 words
          	echo 'Number of occurrences:<b> '.$value.' </b>of word: <b>'.$key."</b><br>\n";
          	$i++;
          }
          
          // printed after the loop, so it also runs when there are fewer than 21 distinct words
          print_r($keyword);
          
          ?>

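          This builds an array called $keyword holding the top twenty keywords: $keyword[0] is the most frequent word, $keyword[1] the next, and so on, so there is no need for separate $keyword1 ... $keyword20 variables.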
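On the `keyword1` ... `keyword20` question: PHP does allow that with variable variables (`${'keyword' . $i} = $key;`), but an array is easier to work with, because it can be joined into one string with `implode()`. A minimal sketch, assuming the page writes a standard `<meta name="keywords">` tag; the three sample words stand in for the real `$keyword` array, and `htmlspecialchars()` escapes the value for use inside the attribute.

```php
<?php
// Stand-in for the $keyword array built in the loop (sample data only).
$keyword = ['consectetur', 'adipisicing', 'tempor'];

// Join the words with commas and escape them for an HTML attribute.
$content = htmlspecialchars(implode(', ', $keyword), ENT_QUOTES);
$metaTag = '<meta name="keywords" content="' . $content . '">';

echo $metaTag, "\n";

// Individual words are still reachable by index; $keyword[0] is the most frequent.
echo $keyword[0], "\n";
```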

            3 years later

            I have taken the last code and am trying to use it. In fact it works fine, apart from missing one of the words in the text.

            If you look at the text, you will see "elit" appears three times, but when you run the script it does not appear in the result.

            Any ideas?

            PS: the data in my page will be appended to other text before processing, hence the two assignments to $text.

            <?php
            echo "<strong>Original text</strong><br />";
            $text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.";
            $text .= "<br />Ut enim ad minim consectetur adipisicing elit, quis nostrud exercitation ullamco laboris elit ut aliquip ex ea commodo consequat.";
            echo "<pre>".$text."</pre><br>";
            // replace line breaks, full stops, commas and double spaces with a single space
            $Line_Breaks_Find = array("<br>", "<br />", "<br/>", "\r", "\n", ".", ",", "  ");
            $Line_Breaks_Replace = " ";
            $text = str_ireplace($Line_Breaks_Find, $Line_Breaks_Replace, $text);
            echo "<strong>Modified text (strip out unwanted tags, punctuation etc.)</strong><br />";
            echo "<pre>".$text."</pre><br>";
            
            $text = explode(' ', $text);
            $words = array();
            foreach ($text as $word) {
                if (strlen($word) > 4) {
                    $words[] = $word;
                }
            }
            $count = array_count_values($words);
            arsort($count);
            $i = 1;
            $keyword = array();
            $VarsString = "Keywords";
            foreach ($count as $key => $value) {
                if ($i <= 20) {
                    $keyword[] = $key;    # will give an array of the top 20 words
                    echo 'Number of occurrences:<b> '.$value.' </b>of word: <b>'.$key."</b><br>\n";
                    $VarsString .= ", ".$key;
                    $i++;
                }
            }
            echo "<br>".$VarsString;
            ?> 
            
              Pigmaster wrote:

              If you look at the text, you will see "elit" appears three times, but when you run the script it does not appear in the result.

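              Worked out why: it was due to the length check below. "elit" is only four characters long, so strlen($word) > 4 leaves it out; use >= 4 if four-letter words should count.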

              <?php
                  if (strlen($word) >4) {
                      $words[] = $word;
                  }
              ?> 
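A sketch that makes the cut-off a parameter and compares with `>=`, so four-letter words like "elit" are counted. The function name `count_words()` and the default minimum of 4 are assumptions. The clean-up is also swapped in: instead of the `str_ireplace()` list, it replaces anything that looks like a tag with a space (a crude regex that assumes simple tags such as `<br />`), lower-cases the text and splits on runs of non-letters, which drops the punctuation.

```php
<?php
// Hypothetical helper: count words with at least $minLength letters,
// most frequent first (word => count).
function count_words(string $html, int $minLength = 4): array
{
    // Crude tag removal: replace anything like <br /> with a space so the words
    // on either side stay separate. Assumes simple tags with no ">" in attributes.
    $text = preg_replace('/<[^>]*>/', ' ', $html);

    // Lower-case and split on runs of non-letters, which also drops "," and ".".
    $tokens = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $count = [];
    foreach ($tokens as $word) {
        // ">=" keeps four-letter words such as "elit"; "> 4" would drop them.
        if (strlen($word) >= $minLength) {
            $count[$word] = ($count[$word] ?? 0) + 1;
        }
    }
    arsort($count);  // highest count first; word keys are kept
    return $count;
}

$text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.";
$text .= "<br />Ut enim ad minim consectetur adipisicing elit, quis nostrud exercitation ullamco laboris elit ut aliquip ex ea commodo consequat.";

$count = count_words($text);
echo $count['elit'], "\n";  // 3
```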
              


                I thought I'd have a go, too.

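                <?php
                // Sort [count, word] pairs in descending order of frequency (element 0);
                // ties are broken alphabetically by the word (element 1).
                function sort_words($a, $b)
                {
                	if ($t = $b[0] - $a[0]) return $t;
                	return strcmp($a[1], $b[1]);
                }
                
                // Keep a [count, word] pair only if the word is longer than 4 letters.
                function longer_than_4_letters($pair)
                {
                	return strlen($pair[1]) > 4;
                }
                
                // $article_text holds the plain text of the article.
                // Lower-case it, split on anything that is not a-z, and count each word.
                $words = array_count_values(preg_split('/[^a-z]+/', strtolower($article_text), -1, PREG_SPLIT_NO_EMPTY));
                // Zip counts and words into [count, word] pairs, then drop the short words.
                $words = array_filter(array_map(null, array_values($words), array_keys($words)), 'longer_than_4_letters');
                usort($words, 'sort_words');
                $top_20_words = array_slice($words, 0, 20);
                print_r($top_20_words);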
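`$top_20_words` holds `[count, word]` pairs. If only the words are needed, for example for the meta keywords, `array_column()` can pull out element 1 of each pair. A short sketch; the sample pairs below are made-up data in the same shape, not output from a real article.

```php
<?php
// Sample [count, word] pairs in the shape produced by usort() (made-up data).
$top_20_words = [[3, 'apple'], [2, 'banana'], [1, 'cherry']];

// Keep only the word from each pair (element 1) and join them with commas.
$just_words = array_column($top_20_words, 1);
echo implode(', ', $just_words), "\n";  // apple, banana, cherry
```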
                