Extracting info from text file

mpar612 · Mar 6, 2008

Hi everyone,

I searched the forums on this and cannot seem to get this figured out. Although I'm sure it's out there somewhere, because this doesn't seem too complex. I am being sent a text file on a nightly basis. I would like to import this text file into a MySQL database.

The first line in the text file is a list of the database fields I would like to use. However, there are several instances of duplicated field names. For example, below "EmailAddress" appears twice. How do I rename these to make each of these unique?

I am sort of new to working with PHP and text files. Any guidance would be greatly appreciated. Even suggestions on the proper terminology that I should be searching for would be great.

Thanks in advance!

jkurrle · Mar 6, 2008

I would use fopen() to open the file handle, then fgets() to get the first line. I would use explode() to parse out the line into an array of variables, then fix the duplicate variable into a different name. Then, I'd write the first part of the insert statement, based on the variables in the array.

After that, I'd use a "while(!feof)" to loop through the rest of the lines, exploding the data out and appending it into the insert statement. Close the file handle with fclose() and insert the data into your database.

Does this help?

laserlight · Mar 6, 2008

After that, I'd use a "while(!feof)" to loop through the rest of the lines, exploding the data out and appending it into the insert statement.

Do not use feof() to control your loops like that (despite there being such an example in the PHP manual). Use the function that reads from the file to control the loop.

jkurrle · Mar 6, 2008

Laser, can you explain why not use "while (!fof)" and point me toward a code example of what would be the more proper way to do it?

laserlight · Mar 6, 2008

Laser, can you explain why not use "while (!fof)" and point me toward a code example of what would be the more proper way to do it?

Consider this attempt to count the number of characters in a file:

<?php
$filename = "test.txt";
$count = 0;
if ($fp = fopen($filename)) {
    while (!feof($fp)) {
        fgetc($fp);
        ++count;
    }
    fclose($fp);
}
echo $count;
?>

If the file contains n characters, $count will be n+1. The reason is that feof() only returns true when EOF is set (or there is an error). But EOF is only set when there is an attempt to read from the file, and it is discovered that the end of the file has been reached. So on the nth iteration, feof() returns false, and fgetc() happily takes the last char. But then instead of ending the loop there and then, on the next iteration, feof() still returns false, and finally fgetc() finds it cannot take another char, and thus sets EOF. However, ++count is still executed, so the final count is off by one.

The solution is to use fgetc() itself to test for EOF:

<?php
$filename = "test.txt";
$count = 0;
if ($fp = fopen($filename)) {
    while (fgetc($fp) !== false) {
        ++count;
    }
    fclose($fp);
}
echo $count;
?>

Of course, this is a rather naive example since filesize() might be better.

jkurrle · Mar 6, 2008

But how are you to determine what is the last line, if you're using fgets? If you get a line that doesn't exist, does it not throw out an errorlevel? example:

while (!feof($fp))
  {
  if($line=fgets($fp,4096))
    {
    // do something
    }
  }

Would something like this not work?

laserlight · Mar 6, 2008

But how are you to determine what is the last line, if you're using fgets? If you get a line that doesn't exist, does it not throw out an errorlevel?

No, it will return false on an error or EOF.

That said, with fgets() a loop controlled with feof() might work: EOF may be set without needing a failed read. (But it is not guaranteed.)

EDIT:
That reminds me. [man]file/man may be the most appropriate function here as it returns an array of lines, and I am guessing that that's how the input file is structured.

mpar612 · Mar 20, 2008

Hi everyone,

Thanks for your help. I'm sorry for the delayed response. The logic is still a little fuzzy to me here. Below I open the file using fopeon(), get the first line from the text file using fgets() and explode the first line into an array.

Now to replace duplicate names I think I need to run some sort of if or while statement. Can anyone provide any assistance with taking this one step further? I have never used explode() before.

I could add a counter and append a number at the end of each string to make it unique. Would that be the best thing to do?

Thanks!

<?php
	$handle = fopen("filename.txt", "r");
	$firstLineOfFile = fgets($handle);
	explode('|', $firstLineOfFile);

?>

jkurrle · Mar 20, 2008

First of all, you can't use explode like that, you need to set a array variable to the exploded results. Something like this:

$array1=explode('|',$line);

Using Laser's file() reference, we could do something like this:

//First, create the beginning of the insert statement:
$insert="insert into `sometable` 
  (Transaction_Type,
  MLSNumber,
  ListOfficeId,
  Address1,
  Address2,
  City,
  Company,
  EmailAddress,
  PhoneMain,
  State,
  WebPage,
  Zip5,
  ListAgentId,
  Agent_Email) values ";

$lines = file('http://www.example.com/');  // Get a file into an array.  

$count=count($lines); // count how many lines in the file

for ($i=1,$i<=($count-1);$i++)  // for each line in the file, except the last one...
  {
  $line=array_pop($lines);  //take a line from the array of lines
  $array_of_values=explode('|',$line);  //explode it into values
  $insert .= "(";  // append opening parenthesis
  $count2=count(array_of_values); // get the number of values to insert
  for ($j=0;$j<=($count2-2);$j++) // Process each value except the last
    {
    $insert .= "\"$array_of_values[$j]\",";  //append each value on the end
    }
  $insert .= "\"".$array_of_values[($count2-1)]."\"),"; // add last value on the end
  }
$line=array_pop($lines);  // getting the last line
$array_of_values=explode('|',$line);  // last batch of values
$count2=count($array_of_values);
$insert .= "(";
for ($j=0;$j,<=($count2-2);$j++)
  {
  $insert .= "\"".$array_of_values[$j]."\",";
  }
$insert .= "\"".$array_of_values[($count2-1)]."\")";

echo "Your insert statement is $insert";

mpar612 · Mar 20, 2008

Thanks! What you provided will significantly help me with my next task. However, I am looking to create the db table on the fly. The issue I am facing is that the text file has a list of headings and some of them are duplicates. I am looking for a way to make the duplicates unique.

So I want to open the file, find the first line of the file and rename the duplicate values that are separated by the pipe character.

Thanks!

Extracting info from text file

Mmpar612

Jjkurrle

laserlight

Jjkurrle

laserlight

Jjkurrle

laserlight

Mmpar612

Jjkurrle

Mmpar612