I need to parse some saved e-mail messages. Here is the code I have so far:

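#!/usr/bin/php
<?php
$path = '/home/jeff/survey/';
$dir = opendir($path);
while (($file = readdir($dir)) !== false) {
        // readdir() returns bare file names, so prepend the directory
        if (is_file($path . $file)) {
                if (!($fileArray = file($path . $file))) {
                        printf("could not open %s file\n", $file);
                        continue;
                }
                for ($i = 0; $i < count($fileArray); $i++) {
                        // **** NEW CODE ****
                }
        }
}
closedir($dir);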

The lines of the text file look like this:

Merchandise in stock? Average

The left side information is not always the same, but the answer is always in position 56 of that line. So, I need the script to read in the left side of the information to see what line it is on, and then read the answer into a variable. I am going to write out the answers to a MySQL database. But, I have no problem writing the data out. I am just not sure how to parse the lines to see where I am in the file, and to get the answer into a variable.

Any help would be appreciated, even if it is just pointing me in the right direction.

Thanks.

    The [man]substr[/man] function lets you extract part of a string starting at a specific character offset.
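
    As a rough sketch of that suggestion: substr() counts offsets from zero, so an answer that starts at position 56 (counting from 1) would sit at offset 55. The sample line is built with str_pad() purely for illustration, and the helper name answerAt() is hypothetical.

    ```php
    <?php
    // Hypothetical helper: return whatever follows a fixed column, trimmed.
    // Assumes the answer starts at 1-based position $position on the line.
    function answerAt(string $line, int $position = 56): string
    {
        // substr() offsets are zero-based, hence the "- 1".
        return trim(substr($line, $position - 1));
    }

    // Build a sample line: question padded to 55 characters, then the answer.
    $line = str_pad('Merchandise in stock?', 55) . "Average\n";

    echo answerAt($line), "\n";
    ```

    If the answers really do always start in the same column, this avoids any pattern matching at all.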

      What you need to do is store the file in CSV format, or with a separator like |,
      so that each line is a new question:

      Question|Answer|Wrong1|Wrong2 etc. (you could add a subject, or an ID for random questions; you get the idea).

      This way you can explode the file line by line and then explode each question to get its data and insert it into MySQL.

      By the way, MySQL can import a CSV file through phpMyAdmin.

      Is this a testing/revision app, or a help/troubleshooting knowledge base?
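
      For illustration only, here is roughly what splitting one such pipe-separated line could look like with explode(). The sample line and the variable names are made up; nothing here is taken from the real survey data.

      ```php
      <?php
      // Made-up sample line in the Question|Answer|Wrong1|Wrong2 layout.
      $line = "What is 2 + 2?|4|3|5\n";

      // Strip the line ending, then split on the separator.
      $fields = explode('|', rtrim($line, "\r\n"));

      // The first two fields are the question and the answer;
      // anything after that is a wrong answer.
      $question = $fields[0];
      $answer   = $fields[1];
      $wrong    = array_slice($fields, 2);

      echo "$question => $answer\n";
      ```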

        devinemke wrote:

        The [man]substr[/man] function lets you extract part of a string starting at a specific character offset.

        Thanks for the answer. I think this might do exactly what I want. But, I need some help with how to parse out the beginning of the line so I know which line I am in. I am going to go through each line of the e-mail message, and based on the first part of the line, I will then use the substr function to get the answer and move it into my variable.

        Jeff

          cyberlew15 wrote:

          What you need to do is store the file in CSV format, or with a separator like |,
          so that each line is a new question:

          Question|Answer|Wrong1|Wrong2 etc. (you could add a subject, or an ID for random questions; you get the idea).

          This way you can explode the file line by line and then explode each question to get its data and insert it into MySQL.

          By the way, MySQL can import a CSV file through phpMyAdmin.

          Is this a testing/revision app, or a help/troubleshooting knowledge base?

          Thanks for the help. I am not sure I can do that, though. I created a survey for our customers to answer and mistakenly did not write the answers directly to a MySQL database, but instead just sent the results via e-mail. We got such an incredible response to the survey that I now need to move the data over to a database so I can analyze it. So, I figured I would save each e-mail as a text file, parse it, and move the answers over to the database. The other poster mentioned substr, which looks like it might do what I need. I just need help figuring out which line I am on as I go through each line. Once I know the line, I can use substr to move the answer into a variable, and then write all of the variables into a database.

          I was wondering, if I was to save all of the e-mail messages to an mbox file, how would I go about parsing that? It would definitely be easier to just move all of the e-mails to another box on my IMAP server instead of saving each one to a separate text file.

          Thanks for the help.

          Jeff

            This is why I suggested IDs for the formatting. Because it is an array, line 1 will be $array[0], and all associated questions and answers will be a subset of it. Are you generating the e-mails yourself, or is someone else generating them and you are harvesting the content? Either way, it is a very interesting project.

              Well, I'm not sure about parsing mbox files, as I'm not sure whether MBox is an app or just means mailbox, as in any mailbox. Use the code below to split each e-mail into its individual lines; then you will know which line you are on. If $lines_of_mail[5] has the question on it, then it is actually line 6 of the e-mail.

              $lines_of_mail  = array_map('trim', file('./path_to_inbox/currentmail.mailext'));
              
                cyberlew15 wrote:

                This is why I suggested IDs for the formatting. Because it is an array, line 1 will be $array[0], and all associated questions and answers will be a subset of it. Are you generating the e-mails yourself, or is someone else generating them and you are harvesting the content? Either way, it is a very interesting project.

                I am sorry, I don't really understand what you suggested. I am generating the e-mails from our website, and retaining all of the e-mails. In the future, I will make sure to just write out the data directly to a MySQL database before I send out the e-mail. Then I won't have this problem anymore. But, right now I have about 2,300 e-mails that I need to shove into a database so I can analyze the data for the owner.

                Jeff

                  cyberlew15 wrote:

                  Well, I'm not sure about parsing mbox files, as I'm not sure whether MBox is an app or just means mailbox, as in any mailbox. Use the code below to split each e-mail into its individual lines; then you will know which line you are on. If $lines_of_mail[5] has the question on it, then it is actually line 6 of the e-mail.

                  $lines_of_mail  = array_map('trim', file('./path_to_inbox/currentmail.mailext'));
                  

                  An mbox-format mailbox stores all of the e-mails in one large text file. Each e-mail starts with a line beginning with "From " (the word From followed by a space). That is how the IMAP or POP server knows where each new e-mail starts in this huge text file. I guess I can do the same thing to figure out where each new e-mail starts. I just need to figure out how to parse files.

                  Jeff
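
                  A minimal sketch of that idea, assuming the usual mbox convention that every message begins with a separator line starting with "From " (the word From followed by a space, which is not the same thing as the "From:" header inside the message). The function name splitMbox() and the sample lines are hypothetical.

                  ```php
                  <?php
                  // Hypothetical helper: split the lines of an mbox file into messages.
                  // Assumes each message starts with a separator line beginning "From ".
                  function splitMbox(array $lines): array
                  {
                      $messages = [];
                      $current  = -1;
                      foreach ($lines as $line) {
                          if (strncmp($line, 'From ', 5) === 0) {
                              // Separator line: start collecting a new message.
                              $messages[++$current] = [];
                              continue;
                          }
                          if ($current >= 0) {
                              $messages[$current][] = rtrim($line, "\r\n");
                          }
                      }
                      return $messages;
                  }

                  // Made-up two-message mailbox; a real one would come from file($path).
                  $lines = [
                      "From nobody@domain.com Tue Aug 23 14:48:00 2005\n",
                      "Subject: Website Feedback Survey\n",
                      "\n",
                      "Merchandise in stock?   Good\n",
                      "From nobody@domain.com Tue Aug 23 15:02:00 2005\n",
                      "Subject: Website Feedback Survey\n",
                  ];

                  $messages = splitMbox($lines);
                  echo count($messages), " messages\n";
                  ```

                  Each entry in $messages is then an array of lines for one e-mail, which can be walked the same way as a single saved text file.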

                    please post a sample file that you need to parse.

                      konsu wrote:

                      please post a sample file that you need to parse.

                      Here you go:

                      From: nobody [nobody@domain.com]
                      Sent: Tuesday, August 23, 2005 2:48 PM
                      To: e-mail address
                      Subject: Website Feedback Survey
                      
                      
                      **Merchandise Selection**
                      
                      Merchandise in stock?                                  Average
                      Advertised merchandise in stock?                       Good
                      Merchandise assortment selection                       Good
                      Merchandise comment:
                      
                      
                      **Store**
                      
                      Store clean and organized?                             Good
                      Store hours convenient?                                Good
                      Store locations convenient?                            Average
                      
                      
                      **Sales Associates/Cashiers**
                      
                      Sales people helpful, friendly, and knowledgeable?     Good
                      Cashiers helpful, friendly, and quick?                 Good
                      
                      
                      Additional Comments:
                      
                      
                      
                      E-Mail: email@domain.com
                      
                      Store most frequently shopped in?  Pasadena
                      

                        which lines out of these do you want to parse? the ones that have "good", "average" and, I assume, "excellent" at the end? are all other lines ignored?

                          konsu wrote:

                          which lines out of these do you want to parse? the ones that have "good", "average" and, I assume, "excellent" at the end? are all other lines ignored?

                          I am sorry, I should have said so. Yes, I want each line that has good, average, or excellent, plus the two at the bottom: the e-mail address and which store they shop in.

                          Thanks for any help you can offer me.

                          Jeff
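
                          One possible way to pick up those two bottom lines, sketched with preg_match(). It assumes the labels appear exactly as in the sample e-mail above and start at the beginning of the line; the function name extractContact() and the array keys email and store are made up for the example.

                          ```php
                          <?php
                          // Hypothetical helper: pull the e-mail address and store name out of a message.
                          function extractContact(array $lines): array
                          {
                              $found = ['email' => null, 'store' => null];
                              foreach ($lines as $line) {
                                  if (preg_match('/^E-Mail:\s*(\S+)/', $line, $m)) {
                                      $found['email'] = $m[1];
                                  } elseif (preg_match('/^Store most frequently shopped in\?\s*(.+?)\s*$/', $line, $m)) {
                                      $found['store'] = $m[1];
                                  }
                              }
                              return $found;
                          }

                          // Sample lines taken from the bottom of the posted e-mail.
                          $lines = [
                              "Additional Comments:\n",
                              "\n",
                              "E-Mail: email@domain.com\n",
                              "\n",
                              "Store most frequently shopped in?  Pasadena\n",
                          ];

                          print_r(extractContact($lines));
                          ```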

                            try using regular expressions on each line like:

                            (.+)[ \t]+(good|average|excellent)[ \t]+$

                            does this make sense?
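
                            A small sketch of how that pattern might be applied with preg_match(). Two adjustments are assumptions rather than part of the suggestion: the /i flag, because the sample e-mail capitalises the answers, and [ \t]* at the end, so that a line with no trailing whitespace still matches.

                            ```php
                            <?php
                            // Pattern adapted from the suggestion above (see the assumptions in the text).
                            $pattern = '/^(.+?)[ \t]+(good|average|excellent)[ \t]*$/i';

                            // file() keeps line endings, so strip them before matching.
                            $line = rtrim("Store hours convenient?                                Good\r\n");

                            if (preg_match($pattern, $line, $m)) {
                                $question = $m[1];   // text before the run of whitespace
                                $answer   = $m[2];   // the rating word
                                echo "$question => $answer\n";
                            }
                            ```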

                              konsu wrote:

                              try using regular expressions on each line like:

                              (.+)[ \t]+(good|average|excellent)[ \t]+$

                              does this make sense?

                              Sort of. But how do I know which line I am on? For each line, I want the answer to go into a different variable and be written to a different field in the database.

                              I was thinking of doing something like this:

                               #!/usr/bin/php
                              $dir = opendir(/home/jeff/survey/);
                              while (($file = readdir($dir))) {
                                      if(is_file($file)) {
                                              if(!($fileArray = file($file))) {
                                                      printf("could not open $file file);
                                              }
                                              for ($i=0; $i < count ($fileArray); $i++) {
                                                    if ($i == "Merchandise in stock?") {
                                                        $stock1 = substr($i,56,9);
                                                        $stock1 = trim($stock1);
                                                    }
                              
                                                if ($i == "Advertised merchandise in stock?") {
                                                    $stock2 = substr($i,56,9);
                                                    $stock2 = trim($stock2);
                                                }
                              // and so on and so on
                                              }
                                      }
                              closedir($dir); 
                              

                              I know my code is not correct, but that is the idea I have. Would that work?

                                the regular expression matches a string of characters (question) followed by empty space followed by a single word. once the regular expression matches, you can extract the question string from the first group and check which one it is. depending on that you can create a database query that saves the data.
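
                                To make that concrete, here is a hedged sketch of the whole step: match the line, look the question up in a table, and store the answer under a field name. The field names (stock1, stock2, hours), the function name parseSurvey(), and the widened list of ratings are assumptions for illustration; a real table would list every question in the survey.

                                ```php
                                <?php
                                // Assumed mapping from question text to database field name.
                                $fields = [
                                    'Merchandise in stock?'            => 'stock1',
                                    'Advertised merchandise in stock?' => 'stock2',
                                    'Store hours convenient?'          => 'hours',
                                ];

                                // Hypothetical helper: return [field => answer] for every recognised line.
                                function parseSurvey(array $lines, array $fields): array
                                {
                                    $pattern = '/^(.+?)[ \t]+(excellent|good|average|fair|poor)[ \t]*$/i';
                                    $answers = [];
                                    foreach ($lines as $line) {
                                        // The first group says which question this line is.
                                        if (preg_match($pattern, rtrim($line), $m) && isset($fields[$m[1]])) {
                                            $answers[$fields[$m[1]]] = $m[2];
                                        }
                                    }
                                    return $answers;
                                }

                                // Sample lines taken from the posted e-mail.
                                $lines = [
                                    "**Merchandise Selection**\n",
                                    "\n",
                                    "Merchandise in stock?                                  Average\n",
                                    "Advertised merchandise in stock?                       Good\n",
                                    "Merchandise comment:\n",
                                ];

                                print_r(parseSurvey($lines, $fields));
                                ```

                                Looking the question up in a table means the position of the line within the file no longer matters, and lines that are not in the table are simply skipped. The resulting array can be handed to whatever code builds the database query.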

                                  konsu wrote:

                                  the regular expression matches a string of characters (question) followed by empty space followed by a single word. once the regular expression matches, you can extract the question string from the first group and check which one it is. depending on that you can create a database query that saves the data.

                                  Okay, that makes sense. But, I have no idea how to write that code. Could you give me some pointers?

                                  Thanks again for all of your help. I have learned quite a bit during this discussion.

                                  Jeff

                                    konsu wrote:

                                    the regular expression matches a string of characters (question) followed by empty space followed by a single word. once the regular expression matches, you can extract the question string from the first group and check which one it is. depending on that you can create a database query that saves the data.

                                    Would it be something like this:

                                     #!/usr/bin/php
                                    $dir = opendir(/home/jeff/survey/);
                                    while (($file = readdir($dir))) {
                                            if(is_file($file)) {
                                                    if(!($fileArray = file($file))) {
                                                            printf("could not open $file file);
                                                    }
                                                    for ($i=0; $i < count ($fileArray); $i++) {
                                                          if ($i =~ ^(.+)[ \t]+(excellent|good|average|fair|poor)[ \t]+$) {
                                                              $stock1 = substr($i,56,9);
                                                              $stock1 = trim($stock1);
                                                          }
                                    
                                                      if ($i =~ ^(.+)[ \t]+(excellent|good|average|fair|poor)[ \t]+$) {
                                                          $stock2 = substr($i,56,9);
                                                          $stock2 = trim($stock2);
                                                      }
                                    // and so on and so on
                                                    }
                                            }
                                    closedir($dir);
                                    

                                      Never mind. I just re-read the code I pasted, and it will not work. Once I get to work this morning, I will see if I can figure this out.

                                      Jeff
