Originally posted by dalecosp
Could you use fseek to .... randomly grab a section of the file, then explode and pick a random line from the random section?
Rather than read the whole section of the file, you could fseek() to a random point in the file, fgets() to skip ahead to the next line break, and then fgets() the single line after that.
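A minimal sketch of that seek-then-skip idea, assuming PHP 7+ and a file with one word per line. The function name randomLineBySeek() is made up for illustration, and wrapping around to the first line when the seek lands in the last line is my own assumption about how to handle the end of the file.

```php
<?php
// Hypothetical helper: pick a line by seeking to a random byte offset.
function randomLineBySeek(string $path): ?string
{
    clearstatcache(true, $path);
    $size = filesize($path);
    if ($size === false || $size === 0) {
        return null;
    }
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return null;
    }
    // Jump to a random byte, then discard the rest of the line we landed in.
    fseek($fh, random_int(0, $size - 1));
    fgets($fh);
    // Read the next complete line. If we were already in the last line,
    // wrap around and take the first line instead (an assumption).
    $line = fgets($fh);
    if ($line === false) {
        rewind($fh);
        $line = fgets($fh);
    }
    fclose($fh);
    return $line === false ? null : rtrim($line, "\r\n");
}
```

Only two short reads happen per call, no matter how big the file is.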
The big drawback here is that some words would be more common than other words - specifically, if "somereallylongword" is followed by "aword", and "short" is followed by "anotherword", then "aword" would be more likely to be picked than "anotherword", because the fseek() is roughly three times as likely to land in the middle of "somereallylongword" as in the middle of "short". This could be avoided by padding each line out until they're all the same length, but that (a) bulks out the size of the file, and (b) means examining the file to find the longest line.
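To put numbers on that bias, here is a rough sketch that works out how likely each line is to be returned. The function name seekPickProbabilities() is hypothetical, and it assumes every line ends in a single "\n" and that landing in the last line wraps around to the first.

```php
<?php
// Hypothetical helper: chance of each line being returned by the
// seek-then-skip approach. Assumes one "\n" after every line and
// wrap-around from the last line to the first.
function seekPickProbabilities(array $lines): array
{
    $lines = array_values($lines);
    $lengths = array_map(function ($l) {
        return strlen($l) + 1; // +1 for the newline byte
    }, $lines);
    $total = array_sum($lengths);
    $n = count($lines);
    $probs = [];
    foreach ($lines as $i => $line) {
        // Line $i is returned when the seek lands anywhere in the line before it.
        $prev = $lengths[($i - 1 + $n) % $n];
        $probs[$line] = $prev / $total;
    }
    return $probs;
}

$p = seekPickProbabilities(['somereallylongword', 'aword', 'short', 'anotherword']);
printf("aword: %.3f, anotherword: %.3f\n", $p['aword'], $p['anotherword']);
```

With those four words as the whole file, "aword" comes out at 19/43 and "anotherword" at 6/43, so "aword" is picked a bit over three times as often.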
I thought of keeping and maintaining an auxiliary index file to track the file offsets where each line begins - but making the idea worthwhile was a mission. Since the index file itself would be large, it would need its own index file, and so on - eventually the index turned into a B-tree and the whole idea turned into "implement a database".
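For what it's worth, here is a sketch of just the simplest read-only form of the index idea: one fixed-width offset per line, so picking entry N is a single fseek() into the index. It makes no attempt at the maintenance problem (adding or removing words means rebuilding the index). The function names are made up, and it assumes 64-bit PHP 7+ so that pack('J') is available.

```php
<?php
// Hypothetical helper: write one 8-byte offset per line of $dataPath
// into $indexPath and return the number of lines indexed.
function buildLineIndex(string $dataPath, string $indexPath): int
{
    $data = fopen($dataPath, 'rb');
    $index = fopen($indexPath, 'wb');
    $count = 0;
    while (true) {
        $offset = ftell($data);           // where the next line starts
        if (fgets($data) === false) {     // no more lines
            break;
        }
        fwrite($index, pack('J', $offset)); // 'J' = unsigned 64-bit, big endian
        $count++;
    }
    fclose($data);
    fclose($index);
    return $count;
}

// Hypothetical helper: pick an index entry uniformly, then read that line.
function randomLineByIndex(string $dataPath, string $indexPath): ?string
{
    clearstatcache(true, $indexPath);
    $entries = intdiv((int) filesize($indexPath), 8);
    if ($entries === 0) {
        return null;
    }
    $index = fopen($indexPath, 'rb');
    fseek($index, random_int(0, $entries - 1) * 8);
    $offset = unpack('J', fread($index, 8))[1];
    fclose($index);

    $data = fopen($dataPath, 'rb');
    fseek($data, $offset);
    $line = fgets($data);
    fclose($data);
    return $line === false ? null : rtrim($line, "\r\n");
}
```

Because every index entry is the same size, each line is equally likely to be picked regardless of how long the words are, at the cost of 8 extra bytes per line.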