Parsing Text Files

jeffgman · Sep 6, 2005

I need to parse some saved e-mail messages. Here is the code I have so far:

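#!/usr/bin/php
<?php
$path = '/home/jeff/survey/';
$dir = opendir($path);
while (($file = readdir($dir)) !== false) {
        // readdir() returns bare file names, so prefix the directory.
        if (is_file($path . $file)) {
                if (!($fileArray = file($path . $file))) {
                        printf("could not open %s\n", $file);
                        continue;
                }
                for ($i = 0; $i < count($fileArray); $i++) {
                        // **** NEW CODE ****
                }
        }
}
closedir($dir);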

The lines of the text file look like this:

Merchandise in stock? Average

The left side information is not always the same, but the answer is always in position 56 of that line. So, I need the script to read in the left side of the information to see what line it is on, and then read the answer into a variable. I am going to write out the answers to a MySQL database. But, I have no problem writing the data out. I am just not sure how to parse the lines to see where I am in the file, and to get the answer into a variable.

Any help would be appreciated. Even if it is just pointing me in the right direction.

Thanks.

devinemke · Sep 6, 2005

the [man]substr[/man] function allows you to specify a character offset to extract parts of a string.
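
As a rough illustration of the substr suggestion, the sketch below pulls a fixed-column answer out of one line. The helper name `answer_at_column` is made up for this example, and it assumes that "position 56" is counted from 1, which corresponds to offset 55 in substr (offsets start at 0). If the saved files use a different column, the default offset would need adjusting.

```php
<?php
// Hypothetical helper: return the answer that starts at a fixed column.
// Assumption: the answer begins at 1-based position 56, i.e. offset 55.
function answer_at_column($line, $offset = 55)
{
    // substr() can return false for short lines on older PHP versions;
    // the string cast normalizes that to '' before trimming.
    return trim((string) substr($line, $offset));
}

// Build a sample line padded the same way as the survey e-mail.
$line = str_pad('Merchandise in stock?', 55) . 'Average';
echo answer_at_column($line), "\n";
```

Lines that are shorter than the offset, such as blank lines or section headings, simply yield an empty string.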

cyberlew15 · Sep 6, 2005

What you need to do is store the file in CSV format, or with a separator like |, so that each line is a new question:

Question|Answer|Wrong1|Wrong2 etc. (you could add a subject or an ID for random questions; you get the idea).

This way you can explode the file line by line and then explode each question to get its data and insert it into MySQL.

By the way, MySQL can import a CSV file through phpMyAdmin.

Is this a testing/revision app or a help/troubleshooting knowledge base?

jeffgman · Sep 6, 2005

devinemke wrote:
the [man]substr[/man] function allows you to specify a character offset to extract parts of a string.

Thanks for the answer. I think this might do exactly what I want. But, I need some help with how to parse out the beginning of the line so I know which line I am in. I am going to go through each line of the e-mail message, and based on the first part of the line, I will then use the substr function to get the answer and move it into my variable.

Jeff
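
One possible way to tell which line you are on, as described in the post above, is to compare the start of each line with the known question text and only then read the fixed-column answer. This is a minimal sketch: `field_for_line` and the field names `stock1` and `stock2` are placeholders, and offset 55 assumes the answer starts at 1-based position 56.

```php
<?php
// Hypothetical helper: return the field name whose question text
// appears at the very start of the line, or null if none matches.
function field_for_line($line, array $prefixes)
{
    foreach ($prefixes as $prefix => $field) {
        if (strncmp($line, $prefix, strlen($prefix)) === 0) {
            return $field;
        }
    }
    return null;
}

// Placeholder field names keyed by the question text.
$prefixes = [
    'Merchandise in stock?'            => 'stock1',
    'Advertised merchandise in stock?' => 'stock2',
];

$line   = str_pad('Advertised merchandise in stock?', 55) . 'Good';
$field  = field_for_line($line, $prefixes);
$answer = trim(substr($line, 55));
echo "$field = $answer\n";
```

Because the comparison is anchored at the start of the line, "Advertised merchandise in stock?" is not confused with "Merchandise in stock?".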

jeffgman · Sep 6, 2005

cyberlew15 wrote:
What you need to do is store the file in CSV format, or with a separator like |, so that each line is a new question:

Question|Answer|Wrong1|Wrong2 etc. (you could add a subject or an ID for random questions; you get the idea).

This way you can explode the file line by line and then explode each question to get its data and insert it into MySQL.

By the way, MySQL can import a CSV file through phpMyAdmin.

Is this a testing/revision app or a help/troubleshooting knowledge base?

Thanks for the help. I am not sure I can do that, though. I created a survey for our customers to answer and mistakenly did not write the answers directly to a MySQL database, but instead just sent the results via e-mail. We got such an incredible response to the survey that I now need to move the data over to a database so I can analyze it. So, I figured I would save each e-mail as a text file, parse it, and move the answers over to the database. The other poster mentioned substr, which looks like it might do what I need. I just need help figuring out which line I am on as I go through the file. Once I know the line, I can use substr to move the answer into a variable, and then write all of the variables into a database.

I was wondering, if I was to save all of the e-mail messages to an mbox file, how would I go about parsing that? It would definitely be easier to just move all of the e-mails to another box on my IMAP server instead of saving each one to a separate text file.

Thanks for the help.

Jeff

cyberlew15 · Sep 6, 2005

This is why I suggested IDs for the formatting: because the file becomes an array, line 1 will be $array[0], and all associated questions and answers will be a subset of it. Are you generating the e-mails yourself, or is someone else generating them and you are harvesting the content? Either way, it is a very interesting project.

cyberlew15 · Sep 6, 2005

Well, I'm not sure about parsing mbox files, as I'm not sure whether MBox is an app or just "mailbox" as in any mailbox. Use the code below to explode each e-mail into its individual lines; then you will know which line you are on. If $lines_of_mail[5] has the question on it, it will actually be line 6 of the e-mail.

$lines_of_mail  = array_map('trim', file('./path_to_inbox/currentmail.mailext'));
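
A slightly more defensive variant of the line above might look like the following sketch. The helper name `read_message_lines` and the temporary file are made up for the demonstration. It uses the `FILE_IGNORE_NEW_LINES` flag instead of `trim`, on the assumption that leading spaces should be preserved so that fixed column positions stay valid.

```php
<?php
// Hypothetical helper: read one saved message into an array of lines.
function read_message_lines($path)
{
    if (!is_readable($path)) {
        return [];
    }
    // FILE_IGNORE_NEW_LINES drops the trailing newline from each element
    // without touching leading spaces, so column offsets stay intact.
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    return $lines === false ? [] : $lines;
}

// Demonstration with a throwaway file; the contents are invented.
$tmp = tempnam(sys_get_temp_dir(), 'msg');
file_put_contents($tmp, "Subject: Website Feedback Survey\n\nStore hours convenient?  Good\n");
foreach (read_message_lines($tmp) as $index => $text) {
    // $index is zero-based, so add 1 for the human-readable line number.
    printf("%d: %s\n", $index + 1, $text);
}
unlink($tmp);
```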

jeffgman · Sep 6, 2005

cyberlew15 wrote:
This is why I suggested IDs for the formatting: because the file becomes an array, line 1 will be $array[0], and all associated questions and answers will be a subset of it. Are you generating the e-mails yourself, or is someone else generating them and you are harvesting the content? Either way, it is a very interesting project.

I am sorry, I don't really understand what you suggested. I am generating the e-mails from our website, and retaining all of the e-mails. In the future, I will make sure to just write out the data directly to a MySQL database before I send out the e-mail. Then I won't have this problem anymore. But, right now I have about 2,300 e-mails that I need to shove into a database so I can analyze the data for the owner.

Jeff

jeffgman · Sep 6, 2005

cyberlew15 wrote:
Well, I'm not sure about parsing mbox files, as I'm not sure whether MBox is an app or just "mailbox" as in any mailbox. Use the code below to explode each e-mail into its individual lines; then you will know which line you are on. If $lines_of_mail[5] has the question on it, it will actually be line 6 of the e-mail.
$lines_of_mail  = array_map('trim', file('./path_to_inbox/currentmail.mailext'));

An mbox format mailbox means each e-mail is actually stored in one large text file. Each e-mail starts with a FROM header. That is how the imap or pop server knows where each new e-mail starts in this huge text file. I guess I can do that same thing to try and figure out where each new e-mail starts. I just need to figure out how to parse files.

Jeff
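
The idea of finding where each message starts in an mbox file could be sketched as follows. In the mbox format, each message is introduced by a separator line that begins with "From " (the word followed by a space), which is different from the "From:" header inside the message. The helper name `split_mbox` and the sample text are invented, and the sketch assumes the mail software has already escaped any body lines that begin with "From ".

```php
<?php
// Hypothetical helper: split mbox text into messages, each returned
// as an array of lines. The "From " separator lines are dropped.
function split_mbox($text)
{
    $messages = [];
    $current  = null;
    foreach (preg_split('/\r\n|\n|\r/', $text) as $line) {
        if (strncmp($line, 'From ', 5) === 0) {
            if ($current !== null) {
                $messages[] = $current;
            }
            $current = [];   // start collecting a new message
            continue;
        }
        if ($current !== null) {
            $current[] = $line;
        }
    }
    if ($current !== null) {
        $messages[] = $current;
    }
    return $messages;
}

// Invented two-message mailbox for demonstration.
$mbox = "From nobody Tue Aug 23 14:48:00 2005\n"
      . "Subject: Website Feedback Survey\n"
      . "\n"
      . "Store hours convenient?  Good\n"
      . "From nobody Tue Aug 23 15:02:00 2005\n"
      . "Subject: Website Feedback Survey\n";
echo count(split_mbox($mbox)), " messages\n";
```

Each returned array can then be processed line by line in the same way as a single saved e-mail.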

konsu · Sep 6, 2005

please post a sample file that you need to parse.

jeffgman · Sep 7, 2005

konsu wrote:
please post a sample file that you need to parse.

Here you go:

From: nobody [nobody@domain.com]
Sent: Tuesday, August 23, 2005 2:48 PM
To: e-mail address
Subject: Website Feedback Survey

**Merchandise Selection**

Merchandise in stock?                                  Average
Advertised merchandise in stock?                       Good
Merchandise assortment selection                       Good
Merchandise comment:

**Store**

Store clean and organized?                             Good
Store hours convenient?                                Good
Store locations convenient?                            Average

**Sales Associates/Cashiers**

Sales people helpful, friendly, and knowledgeable?     Good
Cashiers helpful, friendly, and quick?                 Good

Additional Comments:

E-Mail: email@domain.com

Store most frequently shopped in?  Pasadena

konsu · Sep 7, 2005

which lines out of these do you want to parse? the ones that have "good", "average" and, I assume, "excellent" at the end? are all other lines ignored?

jeffgman · Sep 7, 2005

konsu wrote:
which lines out of these do you want to parse? the ones that have "good", "average" and, I assume, "excellent" at the end? are all other lines ignored?

I am sorry, I should have stated. Yes, I want each line which has the good, average, excellent. Plus, the two on the bottom. The e-mail address and which store they shop in.

Thanks for any help you can offer me.

Jeff
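
The two lines at the bottom of the message do not follow the fixed-column layout, so they probably need their own handling. The sketch below is one way to do it; `parse_footer_line` and the field names `email` and `store` are assumptions made for this example, and the label text is taken from the sample message.

```php
<?php
// Hypothetical helper: recognize the e-mail and store lines.
// Returns ['field' => ..., 'value' => ...] or null for other lines.
function parse_footer_line($line)
{
    $line = trim($line);
    if (preg_match('/^E-Mail:\s*(\S+)$/i', $line, $m)) {
        return ['field' => 'email', 'value' => $m[1]];
    }
    if (preg_match('/^Store most frequently shopped in\?\s*(.+)$/i', $line, $m)) {
        return ['field' => 'store', 'value' => $m[1]];
    }
    return null;
}

$found = parse_footer_line("Store most frequently shopped in?  Pasadena\n");
echo $found['field'], ' = ', $found['value'], "\n";
```

A line such as "E-Mail:" with nothing after it returns null, so a respondent who left the address blank would not produce an empty entry.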

konsu · Sep 7, 2005

try using regular expressions on each line like:

^(.+)[ \t]+(good|average|excellent)[ \t]*$

does this make sense?
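
To show how such an expression might be applied in PHP, here is a minimal sketch using preg_match. The helper name `parse_rating_line` is made up. The pattern is a variant of the one above that matches case-insensitively (the sample message has "Good", not "good"), tolerates lines with no trailing whitespace, and captures the question lazily so that the padding is not included in it.

```php
<?php
// Hypothetical helper: split a rating line into question and answer.
// Returns null for lines that do not end in one of the known ratings.
function parse_rating_line($line)
{
    // Assumed rating words; extend the alternation if the survey has more.
    $pattern = '/^(.+?)[ \t]+(excellent|good|average)[ \t]*$/i';
    if (preg_match($pattern, rtrim($line, "\r\n"), $m) !== 1) {
        return null;
    }
    return ['question' => $m[1], 'answer' => $m[2]];
}

$parsed = parse_rating_line("Store hours convenient?                                Good\n");
echo $parsed['question'], ' => ', $parsed['answer'], "\n";
```

Headings such as `**Store**` and free-text lines such as "Merchandise comment:" return null and can be skipped.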

jeffgman · Sep 7, 2005

konsu wrote:
try using regular expressions on each line like:

^(.+)[ \t]+(good|average|excellent)[ \t]*$

does this make sense?

Sort of. But, how do I know which line I am on? For each line, I want the answer to go into a different variable and be written to a different field in the database.

I was thinking of doing something like this:

#!/usr/bin/php
<?php
$path = '/home/jeff/survey/';
$dir = opendir($path);
while (($file = readdir($dir)) !== false) {
        if (is_file($path . $file)) {
                if (!($fileArray = file($path . $file))) {
                        printf("could not open %s\n", $file);
                        continue;
                }
                for ($i = 0; $i < count($fileArray); $i++) {
                        $line = $fileArray[$i];
                        // Compare the start of the line with the question text.
                        if (strpos($line, "Merchandise in stock?") === 0) {
                                // Position 56 on the line is offset 55 for substr().
                                $stock1 = trim(substr($line, 55, 9));
                        }
                        if (strpos($line, "Advertised merchandise in stock?") === 0) {
                                $stock2 = trim(substr($line, 55, 9));
                        }
                        // and so on and so on
                }
        }
}
closedir($dir);

I know my code is not correct, but that is the idea I have. Would that work?

konsu · Sep 7, 2005

the regular expression matches a string of characters (question) followed by empty space followed by a single word. once the regular expression matches, you can extract the question string from the first group and check which one it is. depending on that you can create a database query that saves the data.
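
The steps described above (match the line, read the question from the first group, and decide where the answer belongs) might be sketched like this. The function `collect_answers` and the column names in `$columns` are invented for illustration. Building and running the actual database query is left out, since that part depends on the table layout.

```php
<?php
// Hypothetical mapping from question text to database column names.
$columns = [
    'Merchandise in stock?'            => 'merch_in_stock',
    'Advertised merchandise in stock?' => 'advertised_in_stock',
    'Store hours convenient?'          => 'store_hours',
];

// Hypothetical helper: collect column => answer pairs from message lines.
function collect_answers(array $lines, array $columns)
{
    $row = [];
    foreach ($lines as $line) {
        // Group 1 is the question, group 2 the single trailing word.
        if (!preg_match('/^(.+?)[ \t]+(\w+)[ \t]*$/', rtrim($line), $m)) {
            continue;
        }
        // Only keep lines whose question text is one we know about.
        if (isset($columns[$m[1]])) {
            $row[$columns[$m[1]]] = $m[2];
        }
    }
    return $row;
}

$lines = [
    "**Merchandise Selection**\n",
    "Merchandise in stock?                                  Average\n",
    "Store hours convenient?                                Good\n",
];
print_r(collect_answers($lines, $columns));
```

The resulting array has one entry per recognized question, which maps naturally onto the columns of a single INSERT for that e-mail.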

jeffgman · Sep 7, 2005

konsu wrote:
the regular expression matches a string of characters (question) followed by empty space followed by a single word. once the regular expression matches, you can extract the question string from the first group and check which one it is. depending on that you can create a database query that saves the data.

Okay, that makes sense. But, I have no idea how to write that code. Could you give me some pointers?

Thanks again for all of your help. I have learned quite a bit during this discussion.

Jeff

jeffgman · Sep 7, 2005

konsu wrote:
the regular expression matches a string of characters (question) followed by empty space followed by a single word. once the regular expression matches, you can extract the question string from the first group and check which one it is. depending on that you can create a database query that saves the data.

Would it be something like this:

 #!/usr/bin/php
$dir = opendir(/home/jeff/survey/);
while (($file = readdir($dir))) {
        if(is_file($file)) {
                if(!($fileArray = file($file))) {
                        printf("could not open $file file);
                }
                for ($i=0; $i < count ($fileArray); $i++) {
                      if ($i =~ ^(.+)[ \t]+(excellent|good|average|fair|poor)[ \t]+$) {
                          $stock1 = substr($i,56,9);
                          $stock1 = trim($stock1);
                      }

                  if ($i =~ ^(.+)[ \t]+(excellent|good|average|fair|poor)[ \t]+$) {
                      $stock2 = substr($i,56,9);
                      $stock2 = trim($stock2);
                  }
// and so on and so on
                }
        }
closedir($dir);

jeffgman · Sep 7, 2005

Never mind. I just re-read the code I pasted, and it will not work. Once I get to work this morning, I will see if I can figure this out.

Jeff
