I'm not sure how that gets me the data here:

<strong>DATA<br>

I need the <br> tag to be the ending delimiter there, since matching up to the closing </strong> would include things I do not want.

    That's called "moving target requirements." Let us know what the exact requirements are, then we can provide an exact solution. 😉

      i reference post three and post five:

      #3:
      e.g. if I wanted to match:
      <strong>data</strong>other html and line breaks<strong>data<br>

      #5:
      no... i want the parts called data:
      <strong>DATA</strong>...(other html and linebreaks)...<strong>DATA<br>

      😉

      SO

      <strong>VARIABLE 1</strong>...(other html and linebreaks)...<strong>VARIABLE 2<br>

      i need to get variable 1 and variable 2

      nogdog... i like ya man, but if u need more clarification :rolleyes:😃

        Does the DATA end only at a </strong> or <br> tag, or at any <...> tag?

          just in those two unique instances:

          <strong></strong>

          and <strong><br>

          there's nothing else I care about in the page

            See if this works (I've not tested it):

            
            $pattern = '@<strong(?:\s[^>]*)?>(.*?)(?:</strong>|<br\s*/?>)@is';
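
            A minimal sketch of how a pattern like this might be used with preg_match_all(). The sample string in $html and the $data variable are made up for illustration, and the `<br\s*/?>` alternative is an assumption about how the <br> ending should be matched (it accepts <br>, <br/> and <br />):

            ```php
            <?php
            // Sample input modelled on the markup described earlier in the thread (hypothetical).
            $html = "<strong>VARIABLE 1</strong><p>other html\nand line breaks</p><strong>VARIABLE 2<br>";

            // Capture everything after an opening <strong> up to the first </strong> or <br>.
            // The "s" flag lets "." span line breaks; "i" makes the tag names case-insensitive.
            $pattern = '@<strong(?:\s[^>]*)?>(.*?)(?:</strong>|<br\s*/?>)@is';

            preg_match_all($pattern, $html, $matches);

            // $matches[1] holds the first capture group of every match, in document order.
            $data = array_map('trim', $matches[1]);
            print_r($data);
            ```

            Because the capture is lazy, each match stops at whichever of the two endings comes first after its opening <strong>.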
            

              That worked very well... however, I am somewhat guilty of the 'moving target requirements'

              really, what I'm trying to do is parse this page

              ideally my output would be a multidimensional array, with each month's events loaded into it

              we'll use the first event as an example... I need:

              -the numbers (1 - 31)
              -the name (Cervical Health Awareness Month)
              -the website (www.nccc-online.org/awareness.php)

              I was trying to regex the page, but I'm horrible at doing this, and going through it one step at a time is sure to drive everyone crazy... so in short, that's what I'm trying to do

                Why reinvent the parser? Why not just use the DOM extension's loadHTML() method?

                  well, I'm not sure if it makes a whole ton of difference, but if you look at the HTML, it's not exactly compliant and it's rendering in quirks mode

                  I've honestly never used the loadHTML function or any of the DOM functions... that being said, I'm off to learn something new

                    just using the basic function, I get errors:

                    $html = file_get_contents("http://www.healthfinder.gov/library/nho/nho.asp?year=2007");
                    $dom = new DomDocument;
                    $dom -> loadHTML($html);
                    

                    I get a TON of entity mismatch errors within the document because it isn't a compliant site

                      Suppress the error messages with @. The document is still loaded and parsed (it has to be, for the error messages to be generated in the first place), and most of those errors are corrected in the process. Some of the markup is worth stripping beforehand to keep things manageable. Who uses <font> tags these days? preg_replace('!</?font.*?>!','', $html) gets rid of them, and what's left should parse legibly. A lot of the grief comes from braindead constructs like <strong><font>...</strong></font>.

                      Each item is in a td element that has attributes width="284" and valign="top". You're after the content of the strong element each contains. (Note I said "element", not "tag" - two different things.)

                      $itemXpath = new DOMXPath($doc);
                      $items = $itemXpath->query("//td[@width='284'][@valign='top']/strong");
                      

                      The result of that is a DOMNodeList that can be traversed.

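                      // $html holds the page source fetched earlier with file_get_contents().
                      $doc = new DOMDocument;
                      // Strip the <font> tags before parsing.
                      $html = preg_replace('!</?font.*?>!','', $html);
                      // @ hides the parser warnings; the document is still loaded.
                      @$doc->loadHTML($html);
                      
                      $itemXpath = new DOMXPath($doc);
                      $items = $itemXpath->query("//td[@width='284'][@valign='top']/strong");
                      
                      // Print the text content of each matched strong element.
                      for ($i = 0; $i < $items->length; $i++) {
                          echo $items->item($i)->nodeValue . "\n";
                      }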
                      

                      The $items do contain more detail than just the nodeValue, but given the mess the markup is to begin with, it may not be trivial to separate the date from the title.
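
                      One way the date and the title might be pulled apart is to walk the child nodes of each strong element instead of reading its nodeValue. This is only a sketch: the HTML fragment below is hypothetical (I'm assuming the date and the name are text nodes separated by a <br>, which the real page may not do), the $events array and its 'dates'/'name' keys are made up, and libxml_use_internal_errors() is swapped in as an alternative to the @ operator. The website link is left out because I don't know where it sits in the markup.

                      ```php
                      <?php
                      // Hypothetical fragment; the real page's markup may differ.
                      $html = '<table><tr><td width="284" valign="top">'
                            . '<font size="2"><strong>1 - 31<br>Cervical Health Awareness Month</strong></font>'
                            . '</td></tr></table>';

                      // Collect libxml's parse warnings internally instead of silencing them with @.
                      libxml_use_internal_errors(true);

                      $doc = new DOMDocument;
                      // Strip <font> tags first, as suggested above ([^>]* also copes with multi-line tags).
                      $doc->loadHTML(preg_replace('!</?font[^>]*>!i', '', $html));
                      libxml_clear_errors();

                      $xpath = new DOMXPath($doc);
                      $events = [];
                      foreach ($xpath->query("//td[@width='284'][@valign='top']/strong") as $strong) {
                          // Keep only the non-empty text nodes; the <br> element between them is skipped.
                          $parts = [];
                          foreach ($strong->childNodes as $child) {
                              if ($child->nodeType === XML_TEXT_NODE && trim($child->nodeValue) !== '') {
                                  $parts[] = trim($child->nodeValue);
                              }
                          }
                          // Assumption: the first text node is the date range, the second is the event name.
                          $events[] = ['dates' => $parts[0] ?? '', 'name' => $parts[1] ?? ''];
                      }
                      print_r($events);
                      ```

                      If the real markup nests the text differently (say, the date sits outside the strong element), the inner loop would need adjusting, so it is worth dumping a few nodes with saveHTML() first to see what is actually there.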

                        awesome dude... I'll give it the ol' college try when I get home this evening.. thanks for the direction 🙂
