I have a piece of code that goes through a list of words and places the word in a specific file, depending on the first letter of the word. For example, if the first letter were "A," then the word would be placed within the A.txt file.

The problem is that the output in the files is just a bunch of boxes, as if the character encoding were somehow screwed up (though I don't understand how that could be possible). It's very strange. I can output the EXACT same info in an ECHO statement to the browser, and it appears just fine. I get no real errors.

Here's the PHP code (every bit of it):

$letters = array("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z");
foreach ($letters as $thisletter) {
	$words = "";
	foreach (explode("\r\n",file_get_contents("1.txt")) as $thisline) {
		foreach (explode(" ",$thisline) as $thisword) {
			if (trim($thisword) != "") {
				if (substr($thisword,0,1) == $thisletter) {
					$words = $words . "-" . $thisword;
				}
			}
		}
	}
	file_put_contents($thisletter.".txt",$words);
}

Here's a small input sample file:

abaca abaci aback abaft abase abash abate abbas abbes abbey abbot abeam
abele abets abhor abide abler ables abode aboil abort about above abuse
boded bodes boffs bogan bogey boggy bogie bogle bogus bohea boils boing
bolas bolds boles bolls bolos bolts bolus bombe bombs bonds boned boner

    Hmmm.... I'd've written it more like this (mainly to save reading in the entire dictionary file for every letter):

    $letters = "abcdefghijklmnopqrstuvwxyz";
    $files=array();
    for($i=0; $i<26; ++$i)
    	$files[$letters[$i]]=array();
    
    $words = preg_split('/\s+/', strtolower(file_get_contents('1.txt')), -1, PREG_SPLIT_NO_EMPTY);
    foreach($words as $word)
    	$files[$word[0]][]=$word;
    
    foreach($files as $key=>$words)
    {
    	file_put_contents($key.'.txt', join('-',$words));
    }
    

    Not that that will fix your problem, but I don't see anything in your code to cause it. Maybe there's some sort of weird locale issue I'm not familiar with, or maybe whatever it is you're viewing the file with can't cope with really really long lines (since each file would be a single line with all those words on it).

      Yeah, it's not written well. It's just for my local machine right now. Eventually, I'm going to put it on the remote server long enough just to transfer some data from files to a database (it won't be there long enough for anybody to find it).

      But yeah, it's strange. At first, I thought it was because it was so long, but even when I try it using the small sample, it outputs a bunch of boxes. And like I said, it only does that when I open the file. If I output that same exact data to the browser, it appears correctly. And in fact, I can output anything NOT from the sample file (that is, strings hard-coded into the PHP script itself), and they output fine to either the browser or the file.

      I just can't understand it. I've never had this happen before.

      By the way, I'm using Notepad (Windows XP Home) to view the outputted files.

      The sample file was copied from a Web site, not typed by me. Is there any way possible that the encoding of the Web site was different and was somehow copied over when I copied and pasted the data from the Web site into the sample file? That's the best idea I can come up with.

        my_name wrote:

        The sample file was copied from a Web site, not typed by me.

        Quite likely, if you haven't considered that different operating systems have different conventions about line endings (Notepad is the only text editor I know of that still doesn't realise this itself); have you looked at the original file in Notepad? The code I wrote doesn't assume that lines end with \r\n. I don't have any hard stats, but I suspect the majority of text files in the world use \n for their line endings.
        (There's also the possibility that the text file is in Unicode, not ASCII, but Notepad should be able to display that without trouble. Without seeing the actual binary data it's necessary to guess.)
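        One way to stop guessing is to look at the bytes directly. The following is only a rough sketch: describe_text_bytes() is a made-up helper name, not anything from the code above, and it assumes the whole file fits comfortably in memory. It counts each style of line ending and checks for a byte-order mark, which would hint at a Unicode encoding.

```php
<?php
// Hypothetical helper (not part of the thread's code): summarize the line
// endings and any byte-order mark found in a string of raw file contents.
function describe_text_bytes($data)
{
    $crlf = substr_count($data, "\r\n");
    return array(
        'crlf'            => $crlf,                             // Windows-style \r\n
        'lf_only'         => substr_count($data, "\n") - $crlf, // Unix-style bare \n
        'cr_only'         => substr_count($data, "\r") - $crlf, // old Mac-style bare \r
        'utf8_bom'        => substr($data, 0, 3) === "\xEF\xBB\xBF",
        'utf16_bom'       => in_array(substr($data, 0, 2), array("\xFF\xFE", "\xFE\xFF"), true),
        'first_bytes_hex' => bin2hex(substr($data, 0, 16)),     // raw peek at the start
    );
}

// Small in-memory sample with mixed endings; for a real file, pass
// file_get_contents('1.txt') instead.
$sample = "abaca abaci\nabele abets\r\nboded bodes\n";
print_r(describe_text_bytes($sample));
```

        If 'lf_only' is non-zero, splitting on "\r\n" alone will leave some lines glued together with a bare \n between them, and Notepad would most likely draw that as a box.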

          Well, that's certainly a possibility.

          The only thing that gets me about that, though, is that when I view the original input file in Notepad, it displays just fine, with the correct endings and everything. Then, when it's put through PHP and outputted, it's all jumbled.

          When I saved it in Notepad, shouldn't that have saved it in ASCII since it was set to save it that way?

            Okay, I just tried it with simply the newline character (\n), and that worked, although there are still a few boxes appearing near the end, which I would think are the \r characters...but I guess that can't be, seeing as how it didn't work correctly at all when I used \r\n.
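            A possible explanation for the leftover boxes, offered as a guess rather than a diagnosis: if some lines in the file end in \r\n and others in plain \n, splitting on "\n" leaves a stray \r stuck to the last word of every \r\n line. The sketch below shows that effect and one way around it; split_words() is a hypothetical helper name, and it simply treats any run of whitespace as a separator.

```php
<?php
// Splitting on "\n" alone leaves the "\r" attached when a line ends in "\r\n".
$parts = explode("\n", "abeam\r\nabele");
echo bin2hex($parts[0]), "\n"; // 616265616d0d  (the trailing 0d is the "\r")

// Hypothetical helper: split text into words regardless of line-ending style.
function split_words($text)
{
    // \s+ matches runs of spaces, tabs, \r and \n alike, so CRLF, LF and
    // mixed files all split the same way. The third argument is the limit
    // (-1 means no limit); the PREG_SPLIT_NO_EMPTY flag goes in the fourth.
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$mixed = "abaca abaci\r\nabele abets\nboded bodes\r\n";
echo implode('-', split_words($mixed)), "\n"; // abaca-abaci-abele-abets-boded-bodes
```

            Calling trim() on each word after an explode("\n", ...) would also strip the stray \r, at the cost of one extra call per word.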

            Is there a good replacement for Notepad for Windows? Something that's completely freeware and is basically the same but can recognize this crap?

              my_name wrote:

              Is there a good replacement for Notepad for Windows?

              Short answer is: any text editor (there are quite a few cited and compared in this thread, which also takes into account how good they are for PHP coding). The one I'm using at the moment is seven years old and working fine, thank you. 🙂
