I have been working with the Simple HTML DOM parser (simple_html_dom.php), trying to parse the financial tables from Yahoo. I have made pretty good progress (it doesn't look like much, but it took me a while) and I have been able to separate it all out.

However, whenever I try to do an "echo $b[1];" I get an error saying "Fatal error: Cannot use object of type simple_html_dom_node as array in C:\wamp\www\test\index3.php on line 28"

What I would like to do is take each element (financial number) and store it in my MySQL database. I believe that if I could separate out the different numbers, I would then be able to insert them into the database. Let me know if I could be more clear on what I am trying to do. Thanks!

(the code below works)

include('simple_html_dom.php');

// get DOM from URL or file
$html = file_get_html('http://finance.yahoo.com/q/is?s=NEM&annual');

// find all table data with class "yfnc_tabledata1"
foreach ($html->find('table.yfnc_tabledata1 td') as $e) {
    foreach ($e->find('tr') as $a) {
        foreach ($a->find('td') as $b) {
            echo $b.'<br />';
        }
    }
}

    Presumably what you've got now gives you an output. Can you show us the output?

    As it is, what you need to do is build yourself a database INSERT statement based on the values you're reading. Looking at the source of that page I'm guessing the first values you're pulling off are the column headings so you'll need to ignore them.

    But in the foreach loop you'll need to start building your insert statements and then send them into the database.

      Thanks for replying. I've been getting frustrated because I haven't been able to figure this out.
      I think I see what you're saying, but where I get confused is how to put each value into the database.
      This is the only thing I can come up with (maybe not correct PHP, I'm just trying to get the right outline):

      foreach($e as $number)
      {
      INSERT INTO financials
      VALUES('$number');
      }
      

      This is the output of the code from my first post, as requested (sorry it's long). Thanks again for your help!
      PERIOD ENDING
      31-Dec-09
      31-Dec-08
      31-Dec-07
      Total Revenue
      7,705,000

      6,199,000

      5,526,000

      Cost of Revenue
      3,049,000

      3,358,000

      3,155,000

      Gross Profit
      4,656,000

      2,841,000

      2,371,000


      Operating Expenses

      Research Development
      322,000

      166,000

      62,000

      Selling General and Administrative
      576,000

      144,000

      244,000

      Non Recurring
      7,000

      478,000

      1,753,000

      Others
      806,000

      798,000

      703,000



      Total Operating Expenses
      1,711,000

      1,586,000

      2,762,000


      Operating Income or Loss
      2,945,000

      1,255,000

      (391,000)


      Income from Continuing Operations

      Total Other Income/Expenses Net
      88,000

      123,000

      144,000

      Earnings Before Interest And Taxes
      3,033,000

      1,378,000

      (247,000)

      Interest Expense
      120,000

      102,000

      105,000

      Income Before Tax
      2,913,000

      1,276,000

      (352,000)

      Income Tax Expense
      788,000

      113,000

      200,000

      Minority Interest
      (796,000)
      (329,000)
      (410,000)



      Net Income From Continuing Ops
      2,109,000

      829,000

      (963,000)


      Non-recurring Events

      Discontinued Operations
      (16,000)
      24,000

      (923,000)

      Extraordinary Items


      -



      Effect Of Accounting Changes


      -



      Other Items


      -




      Net Income
      1,297,000

      853,000

      (1,886,000)

      Preferred Stock And Other Adjustments


      -



      Net Income Applicable To Common Shares
      $1,297,000

      $853,000

      ($1,886,000)

        Okay, you've clearly got a lot of redundant data you're passing through.

        I'd like to stress right now that I've never done what you're doing, and you may not be using the most efficient method of taking the data from the table. Are you sure you can't get the information in CSV form from their website, like this one:
        http://ichart.finance.yahoo.com/table.csv?s=MSFT&a=0&b=1&c=2008&d=0&e=1&f=2009&g=d&ignore=.csv

        (From a book I have on Silverlight)

        I'd suggest you first create an array to store the data you're stripping out, then write a function that parses the value you've assigned to $b and returns it so you can put it into the array. So something like:

        include('simple_html_dom.php');

        // get DOM from URL or file
        $html = file_get_html('http://finance.yahoo.com/q/is?s=NEM&annual');

        // array that collects the parsed values
        $store_arr = array();

        // find all table data with class "yfnc_tabledata1"
        foreach ($html->find('table.yfnc_tabledata1 td') as $e) {
            foreach ($e->find('tr') as $a) {
                foreach ($a->find('td') as $b) {
                    // pass the cell's text, not the node object
                    $result = parse_value($b->plaintext);
                    // strict comparison so a parsed 0 is not thrown away
                    if ($result !== '') {
                        $store_arr[] = $result;
                    }
                }
            }
        }
        
        

        Your parse_value() function would ignore anything it couldn't turn into a number and return '' instead, which lets you build a $store_arr without any text in it.
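
        A minimal sketch of what such a parse_value() could look like, assuming the cells look like the output posted above: thousands separators, an optional "$", and parentheses for negative amounts. The function body is an illustration only, not something from the library, and it treats dates, labels and "-" placeholders as "not a number".

```php
<?php
// Hypothetical helper (not part of simple_html_dom): turns one table cell
// into an integer, or returns '' when the cell is not a number.
function parse_value($cell)
{
    // Accept a plain string or anything that can be cast to one;
    // drop markup, non-breaking-space entities and surrounding whitespace.
    $text = trim(str_replace('&nbsp;', ' ', strip_tags((string) $cell)));

    // Assumption: parentheses mark a negative amount, e.g. (391,000).
    $negative = false;
    if (preg_match('/^\((.*)\)$/', $text, $m)) {
        $negative = true;
        $text = $m[1];
    }

    // Drop a leading currency sign, e.g. $1,297,000.
    $text = ltrim($text, '$');

    // Only digits, optionally grouped with commas, count as a number.
    if (!preg_match('/^(\d{1,3}(,\d{3})*|\d+)$/', $text)) {
        return '';
    }

    $value = (int) str_replace(',', '', $text);
    return $negative ? -$value : $value;
}
```

        Called on the cells above, parse_value('7,705,000') should give 7705000 and parse_value('($1,886,000)') should give -1886000, while 'Total Revenue', '31-Dec-09' and '-' all come back as ''.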

        Then you'd use the $store_arr much as you put up there to give you values to insert into your database.
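
        One way the insert step might look, as a rough sketch rather than a tested solution. It goes a little beyond a numbers-only array and keeps each row's label with its three values, so the database rows stay meaningful. Everything named here is an assumption for illustration: the helpers cell_to_int(), group_rows() and save_rows(), and the table financials with columns item, y1, y2, y3.

```php
<?php
// Hypothetical helper: "7,705,000" -> 7705000, "(391,000)" -> -391000,
// anything that is not a number -> null.
function cell_to_int(string $text): ?int
{
    $text = trim($text);
    $negative = (bool) preg_match('/^\((.*)\)$/', $text, $m);
    if ($negative) {
        $text = $m[1];
    }
    $text = ltrim($text, '$');
    if (!preg_match('/^(\d{1,3}(,\d{3})*|\d+)$/', $text)) {
        return null;
    }
    $value = (int) str_replace(',', '', $text);
    return $negative ? -$value : $value;
}

// Group a flat list of cell texts into array(label, v1, v2, ...) rows.
// A non-numeric cell starts a new row; rows that never receive a number
// (section headings, the date header, "-" placeholders) are dropped.
function group_rows(array $cells): array
{
    $rows = array();
    $current = null;
    foreach ($cells as $cell) {
        $cell = trim($cell);
        if ($cell === '') {
            continue; // blank cell
        }
        $number = cell_to_int($cell);
        if ($number === null) {
            if ($current !== null && count($current) > 1) {
                $rows[] = $current;
            }
            $current = array($cell);
        } elseif ($current !== null) {
            $current[] = $number;
        }
    }
    if ($current !== null && count($current) > 1) {
        $rows[] = $current;
    }
    return $rows;
}

// Insert the grouped rows with a PDO prepared statement.
// Assumed schema: financials(item, y1, y2, y3).
function save_rows(PDO $pdo, array $rows): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO financials (item, y1, y2, y3) VALUES (?, ?, ?, ?)'
    );
    foreach ($rows as $row) {
        if (count($row) === 4) { // label plus three period values
            $stmt->execute($row);
        }
    }
}
```

        To use it you would collect the text of every cell into an array, pass that to group_rows(), and hand the result to save_rows() together with a PDO connection, for example new PDO('mysql:host=localhost;dbname=test', 'user', 'password') with your own credentials in place of these placeholders. A prepared statement is used so the scraped text is never pasted directly into the SQL string.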

          I have searched quite a bit for some kind of download button for the financials, but I haven't found anything yet. Is there a more efficient method than this? I am all ears because I just can't figure this out. I can't thank you enough for your help though. Nobody else on any other forum is even trying, lol.

          When I tried out the code you provided I got "Fatal error: Call to undefined function parse_value() in C:\wamp\www\test\index3.php on line 22". I don't know why, but it seems like it won't read the DOM as an array.

          I have been doing all of this in Excel, but it requires a bit of work.
          I have attached MACRO2.doc, but I can't attach the Excel charts. MACRO2.doc is the actual macro: I select everything (Ctrl+A), use find and replace (Ctrl+F) to change the stock symbol to the one I want, and add it to the VB editor in Excel. It pulls the financials from Yahoo and puts them into a premade worksheet called stock.xls.
          After it retrieves the data, it saves the workbook under the stock symbol and closes the document. NEM.xls is just one example of the many that I have made. That's all I had for a while, until I thought of creating a worksheet that looks at all the financials in one industry and tells me what the ratios for each company are, so I don't have to open up each individual worksheet.

          However, it requires me to edit each macro for the symbol I want, and the information becomes outdated every quarter when financials are released.

            You're being told parse_value() isn't defined because you haven't defined it yet! I was just showing you the sort of thing you'd want to do.

            Writing such a function will be a headache. You'll probably need to use a few different functions to determine which cells are the cells you want.

            I don't know of a better way but try googling / searching on here for 'HTML scraping' or 'scraping HTML pages' or 'scraping HTML into tables' maybe?

              8 years later

              <?php

              include_once "simplehtmldom/simple_html_dom.php";

              // Create DOM from URL or file
              $html = file_get_html('link');

              echo $html->find("a", 1);

              ?>
