I was trying to follow Weedpacket's thinking and couldn't see how a fulltext index would have any language-specific aspects. I do recall that MySQL has an IN NATURAL LANGUAGE MODE modifier for fulltext searches, but that hasn't clarified much for me yet. I'm still trying to absorb it and haven't had much time.
I stumbled across myisam_ftdump, a command-line utility that lets you look inside a MyISAM fulltext index. I did a quick experiment, creating this SQL table:
CREATE TABLE `foo` (
`id` int NOT NULL,
`col1` varchar(255) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `foo` (`id`, `col1`) VALUES
(1, ' Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.'),
(2, 'アメリカ合衆国の著作権法 (英語: Copyright law of the United States) は、文芸・映像・音楽・美術・ソフトウェアなどの著作物と、その著作者などの権利を保護するアメリカ合衆国の法律である。米国民の創作した著作物だけでなく、米国内に流通す');
ALTER TABLE `foo`
ADD PRIMARY KEY (`id`);
ALTER TABLE `foo` ADD FULLTEXT KEY `col1` (`col1`);
ALTER TABLE `foo`
MODIFY `id` int NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=3;
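The output is fairly informative. The index appears to work by splitting each field value on whitespace and punctuation characters and indexing the runs of characters in between. Short words (ago, our, law) and some common ones (four, seven, this, that) don't show up at all, which is consistent with MySQL's default minimum word length of 4 (ft_min_word_len) and its built-in stopword list: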
# myisam_ftdump -d foo 1
0 0.8787346 brought
0 0.8787346 conceived
0 0.8787346 continent
bc 0.9157509 copyright
0 0.8787346 created
0 0.8787346 dedicated
0 0.8787346 equal
0 0.8787346 fathers
0 0.8787346 liberty
0 0.8787346 nation
0 0.8787346 proposition
0 0.8787346 score
bc 0.9157509 states
bc 0.9157509 united
0 0.8787346 years
bc 0.9157509 その著作者などの権利を保護するアメリカ合衆国の法律である
bc 0.9157509 アメリカ合衆国の著作権法
bc 0.9157509 ソフトウェアなどの著作物と
bc 0.9157509 米国内に流通す
bc 0.9157509 米国民の創作した著作物だけでなく
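For what it's worth, here is a rough Python sketch of the behaviour the dump seems to show. It is only my approximation, not MySQL's actual parser: the function name approx_ft_tokens is made up, the minimum length of 4 assumes the default ft_min_word_len, and the stopword set is a tiny hand-picked subset inferred from the words missing in the dump rather than MySQL's real list.

```python
import re

# Assumption: the server uses the default ft_min_word_len of 4.
MIN_WORD_LEN = 4

# Hypothetical, hand-picked stopwords inferred from the dump above.
# This is NOT MySQL's real built-in list, which is much longer.
STOPWORDS = {"four", "seven", "forth", "this", "that"}


def approx_ft_tokens(text):
    """Roughly imitate what the dump suggests the index stores."""
    # Treat every run of letters, digits, or underscores as one word,
    # so whitespace and punctuation (including 、 ・ 。) act as separators.
    words = re.findall(r"\w+", text.lower())
    # Drop short words and stopwords, then de-duplicate and sort.
    kept = {w for w in words if len(w) >= MIN_WORD_LEN and w not in STOPWORDS}
    return sorted(kept)


english = "Four score and seven years ago our fathers brought forth on this continent"
japanese = "文芸・映像・ソフトウェアなどの著作物と、その著作者"

print(approx_ft_tokens(english))
# ['brought', 'continent', 'fathers', 'score', 'years']
print(approx_ft_tokens(japanese))
# ['その著作者', 'ソフトウェアなどの著作物と']
```

Under this model a Japanese run between punctuation marks is stored as one long "word", so a search for a term buried inside it (著作物, say) would have no token of its own to match.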
I don't speak Japanese, so I don't know whether it uses spaces the way we do, whether those clusters are meaningful the way words are, or whether the fulltext search would be useless for a typical Japanese search. Still, I thought this might provide meaningful detail here.
dalecosp: Is this for a specific application?
I'm migrating a site we built 16 years ago to new tech. A lot of the coding and database decisions were less than masterful, and I'm wondering whether we need all those indexes on certain tables or whether they are useless. There are about 100 tables, so it's a fairly involved process.