shivom wrote: I need to count the number of words appearing on a website. For example, on google.com I need to find out which words repeat and how many times. We also have to take care of special words like John and John's.
Thanks
I searched Google for: php word counter
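<?php
// http://www.phpit.net/code/word-count/
// Naive version: splits on single spaces, so runs of spaces count as extra "words".
function wordcount($str) {
    return count(explode(' ', $str));
}
// Example usage:
$test = "This has four words.";
echo "There are " . wordcount($test) . " words in that sentence";
/* Jian Wu says:
Make sure you also use the array_filter function to filter out empty elements when your string has double, triple or multiple spaces between words. So the function should become this: */
// Renamed here because PHP does not allow declaring wordcount() twice in one file.
// 'strlen' is passed as the callback so that a word like "0" is not filtered out.
function wordcount_filtered($str) {
    return count(array_filter(explode(' ', $str), 'strlen'));
}
?>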
1. Read the website (or rather the web page) into a string.
2. Run the function on the string.
It makes an array of words by splitting the string at every space.
3. Count the array. There is your result!
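To also find out which words repeat and how many times, the steps above could be sketched roughly like this. It is only a minimal sketch: word_frequencies() is a made-up helper name, and it swaps explode() for the built-in str_word_count(), which keeps apostrophes inside a word, so John and John's are counted separately. Note that strip_tags() only removes the tags themselves, so script and style text on a real page would still be counted.

```php
<?php
// Hypothetical helper (not from the thread): counts how often each word occurs in an HTML string.
function word_frequencies($html) {
    $text  = strtolower(strip_tags($html));  // drop HTML tags, ignore case
    $words = str_word_count($text, 1);       // format 1 = return the words as an array
    $freq  = array_count_values($words);     // word => number of occurrences
    arsort($freq);                           // most frequent words first
    return $freq;
}

// For a real page the string could come from file_get_contents('http://www.google.com/'),
// which assumes allow_url_fopen is enabled on your server.
$html = "<p>John met John's brother. John left.</p>";
print_r(word_frequencies($html));
```

Words that appear more than once are the ones with a count above 1 in the returned array.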
PHP has functions that can compare the words in such an array.
You have to loop over the array and compare each word against the others.
strncasecmp( $str1, $str2, $n )
Binary safe case-insensitive string comparison of the first n characters
<?php
$str1 = "john";
$str2 = "JOHN'S";
// Compare 2 words over the length of the shorter of the 2 words.
// To find the minimum length use: min( $a, $b ),
// where $a and $b are the string lengths of the words.
// strncasecmp() = case-insensitive string comparison of the first n chars
if( strncasecmp( $str1, $str2, min(strlen($str1),strlen($str2))) ==0 ){
echo 'they are similar!';
}
else{
echo 'not very much same words';
}
?>
A small problem:
"for" will be treated as similar to "forbidden",
and "go" to "government".
This can largely be avoided by comparing 2 words only if their lengths differ by 2 characters or fewer.
If the length difference is greater, then they ARE 2 different words.
🙂
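Here is a rough sketch of that length check combined with strncasecmp(). The function name similar_words() and the $maxDiff parameter are my own inventions for illustration, so adjust them as you like.

```php
<?php
// Hypothetical helper: the name and the $maxDiff default of 2 are assumptions, not a standard API.
function similar_words($a, $b, $maxDiff = 2) {
    $lenA = strlen($a);
    $lenB = strlen($b);
    // Words whose lengths differ by more than $maxDiff are treated as different words.
    if (abs($lenA - $lenB) > $maxDiff) {
        return false;
    }
    // Otherwise compare the first n characters, ignoring case,
    // where n is the length of the shorter word.
    return strncasecmp($a, $b, min($lenA, $lenB)) === 0;
}

var_dump(similar_words("john", "JOHN'S"));    // bool(true)
var_dump(similar_words("for", "forbidden"));  // bool(false)
var_dump(similar_words("go", "government"));  // bool(false)
```

It is still only a heuristic: short pairs such as "for" and "fort" are within the limit and will be reported as similar.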