Hi Guys,

I didn't know if this was best here or in the database forum. Hopefully I have chosen well.

I am in the first stages of working on an RSS feed site. Basically, it's going to aggregate hundreds of RSS feeds into channels of related blogs. I think the easiest thing would be to set up a cron job to read all of the RSS feeds and import any new articles into the database.

I have code already to create my own RSS feeds from there on.

What I want to know (I've googled for this a bit but can't seem to find any answers) is whether there is an existing way to parse the RSS feeds and import them into the MySQL database, or whether I would have to write a function for it myself.

Any suggestions, pointers or advice would be excellent.

    Well, assuming you are able to get the RSS feeds as plain text, then yes, you will be able to save them to your MySQL database, for example in a BLOB column.

    It's easier to work with if you import just the plain-text info into your database.

    If tags are present, simply save the feed in a variable and then strip all tags before saving the info you want.
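A minimal sketch of the tag-stripping step mentioned above. It is written in Python purely for illustration (the thread does not name a language for this step), and the helper name `strip_tags` is an assumption, not something from the original post. It uses the standard library's `html.parser` to keep only the text nodes of a fragment.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.parts.append(data)


def strip_tags(fragment):
    """Return the fragment's text with markup removed and whitespace collapsed.

    Hypothetical helper: the name and behaviour are illustrative only.
    """
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())
```

Something like `strip_tags(description)` could then be applied to each item's description before it is saved.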

    Using cron jobs to check for new items will work, but see if the RSS resource you are using has any form of ID, e.g. rss.php?id=87362
    It will be faster to look only for items with an ID higher than your last database entry, rather than comparing your entire database against all of their items every time.
    You will then need to store their IDs in a column in your database.
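A rough sketch of the "only items newer than the last stored ID" idea, in Python for illustration. It assumes the ID is carried in a `threadid` query parameter (as in the PHPBuilder links later in this thread) and that IDs only ever increase; both are assumptions, and the function names are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def thread_id(link):
    """Return the numeric threadid from a link, or None if it has none."""
    values = parse_qs(urlparse(link).query).get("threadid")
    if not values or not values[0].isdigit():
        return None
    return int(values[0])


def new_links(links, last_seen_id):
    """Keep only links whose threadid is greater than last_seen_id.

    Links without a usable ID are skipped, so a feed that lacks IDs
    would need a different duplicate check.
    """
    return [link for link in links
            if (thread_id(link) or 0) > last_seen_id]
```

The value passed as `last_seen_id` would come from the highest ID already stored in the database, so each cron run only touches the items it has not seen yet.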

    If you have any questions about what I just said, please feel free to ask.
    Brandon

      It would probably be better to have your database mimic the structure of the feed. For example, let's take the PHPBuilder RSS feed (did you know we had one 😉 )

      <?xml version="1.0"?>
      <rss version="0.91">
       <channel>
        <pubDate>Sat, 27 Oct 2007 2:03:42 GMT</pubDate>
        <description>Newest Help Forum Posts On PHPBuilder.com</description>
        <link>http://phpbuilder.com/</link>
        <title>PHPBuilder.com Newest Help Forum Posts</title>
        <webMaster>staff@phpbuilder.com</webMaster>
      
        <language>en-us</language>
        <item>
         <title>Checking for Fake emails?</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346726</link>
         <description>...</description>
        </item>
        <item>
      
         <title>Warning Function Include</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346725</link>
         <description>...</description>
        </item>
        <item>
         <title>problem with php and google analytics</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346722</link>
      
         <description>...</description>
        </item>
        <item>
         <title>help shorten code</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346720</link>
         <description>...</description>
        </item>
      
        <item>
         <title>Password Protected Directory</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346717</link>
         <description>...</description>
        </item>
        <item>
         <title>Cant use the back button in the browser in these forums anymore</title>
      
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346711</link>
         <description>...</description>
        </item>
        <item>
         <title>Why won't this *simple* session code work?</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346707</link>
         <description>...</description>
      
        </item>
        <item>
         <title>gzip advanced implementation considerations</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346704</link>
         <description>...</description>
        </item>
        <item>
      
         <title>Win 2003 IIS6 PHP_AUTH_USER blank!</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346701</link>
         <description>...</description>
        </item>
        <item>
         <title>$_GET and retrieveing info passed from html</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346694</link>
      
         <description>...</description>
        </item>
        <item>
         <title>Use of session_start()</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346686</link>
         <description>...</description>
        </item>
      
        <item>
         <title>Javascript onclick loads up multiple drop down menus ??</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346684</link>
         <description>...</description>
        </item>
        <item>
         <title>[RESOLVED] Fatal error when calling a nested function</title>
      
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346681</link>
         <description>...</description>
        </item>
        <item>
         <title>Removing return</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346680</link>
         <description>...</description>
      
        </item>
        <item>
         <title>Need a simple free FAQ/Knowledge base</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346668</link>
         <description>...</description>
        </item>
        <item>
      
         <title>[RESOLVED] Deleting Files and Folders in Windows Environment</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346664</link>
         <description>...</description>
        </item>
        <item>
         <title>php header redirect = error in Internet Explorer</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346656</link>
      
         <description>...</description>
        </item>
        <item>
         <title>[RESOLVED] Continuing on List of Subdirectories Post Below</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346655</link>
         <description>...</description>
        </item>
      
        <item>
         <title>date, time string</title>
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346648</link>
         <description>...</description>
        </item>
        <item>
         <title>Help needed...</title>
      
         <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346647</link>
         <description>...</description>
        </item>
      </channel>
      </rss>

      Here there's the root <rss> node with a child <channel> node and multiple grandchild <item> nodes. Each <item> has 3 child nodes: title, link, description. For our purposes, we'll assume description isn't really just "..." but the first 50 words of the post. We're also going to assume that each <item> node has a 4th child called "date", which is the date of that thread's creation.

      You'd want to set your database up to mimic the data in the RSS feed. Our table would have 5 columns:

      |  id  |            title            |                    description                     |    date    |     link     |
      +------+-----------------------------+----------------------------------------------------+------------+--------------+

      When we retrieve the feed, we'd parse it so that the title, description, date and link are inserted as separate columns of one row for each <item> node in the feed. The ID field is nothing more than an auto-incremented ID field to keep things "kosher".
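A minimal sketch of the parse-and-insert step described above. It is in Python with the standard library's `sqlite3` standing in for MySQL so it runs without a server; the table name `items`, the function names, and the shortened `SAMPLE_FEED` are assumptions for illustration. The `date` child is the hypothetical 4th node from the post, so it comes back empty for the real feed.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Shortened copy of the feed shown above (two items only).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="0.91">
 <channel>
  <title>PHPBuilder.com Newest Help Forum Posts</title>
  <item>
   <title>Checking for Fake emails?</title>
   <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346726</link>
   <description>...</description>
  </item>
  <item>
   <title>Warning Function Include</title>
   <link>http://www.phpbuilder.com/board/showthread.php?threadid=10346725</link>
   <description>...</description>
  </item>
 </channel>
</rss>"""


def create_table(conn):
    """Create the 5-column table: id, title, description, date, link."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               title TEXT NOT NULL,
               description TEXT,
               date TEXT,
               link TEXT)"""
    )


def parse_items(xml_text):
    """Return one (title, description, date, link) tuple per <item>."""
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext("title", default=""),
            item.findtext("description", default=""),
            item.findtext("date", default=""),  # hypothetical 4th child
            item.findtext("link", default=""),
        )
        for item in root.iter("item")
    ]


def import_feed(conn, xml_text):
    """Insert every item of the feed as its own row; return the row count."""
    rows = parse_items(xml_text)
    conn.executemany(
        "INSERT INTO items (title, description, date, link) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

With a MySQL driver the SQL would stay largely the same, apart from the auto-increment syntax and the placeholder style; those details depend on the driver and are not covered here.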

      This way the RSS feed is "searchable" in your database. You can search by date, by topic, or by description. You can also make the title a unique key so that, when you go to insert into the database, you can UPDATE on a duplicate key 😉

      This is probably a more robust idea than what you wanted, but it would serve you better than storing a text file in a BLOB column.
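A small sketch of the "unique title, update on duplicate" idea together with a simple search. In MySQL this would typically be written as `INSERT ... ON DUPLICATE KEY UPDATE`; the sketch below uses Python's `sqlite3` as a stand-in and emulates the same effect with an UPDATE followed by an INSERT when nothing matched. The table and function names are assumptions for illustration.

```python
import sqlite3


def open_store():
    """Open an in-memory database with a UNIQUE constraint on title."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE items (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               title TEXT NOT NULL UNIQUE,
               description TEXT,
               date TEXT,
               link TEXT)"""
    )
    return conn


def upsert_item(conn, title, description, date, link):
    """Update the row with this title, or insert it if it does not exist."""
    cur = conn.execute(
        "UPDATE items SET description = ?, date = ?, link = ? WHERE title = ?",
        (description, date, link, title),
    )
    if cur.rowcount == 0:  # no existing row had this title
        conn.execute(
            "INSERT INTO items (title, description, date, link) VALUES (?, ?, ?, ?)",
            (title, description, date, link),
        )
    conn.commit()


def search(conn, term):
    """Return titles whose title or description contains the term."""
    pattern = "%" + term + "%"
    return conn.execute(
        "SELECT title FROM items WHERE title LIKE ? OR description LIKE ? ORDER BY id",
        (pattern, pattern),
    ).fetchall()
```

One design note: two different threads can share a title, so a key on the link may turn out to be a safer choice in practice; the sketch keeps the title because that is what the post suggests.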

        3 years later

        Thanks for the useful post. I appreciate it, since it is something I will need.
        For the service I intend to create, the RSS feeds will initially be submitted to the system by the publisher rather than imported. Do the rest of the principles mentioned above stay the same, or is there any difference?

          Yeah, I would think so. Depends on the end result you're looking for.

            The end result would be to display the findings, i.e. URL and meta description (plus thumbnails, tags, etc. if possible), as separate threads like you would find in a news aggregator.

            One option is to submit/collect them manually, like reddit, delicious, etc. The other option is to import via the RSS feeds.

              If you need to cache the data, then yes, you're fine. The only issue you may run into is that some sites don't output RSS but instead use Atom, which has a different structure and different required elements. But if you read the specs for RSS and Atom, you should be able to define a database schema that allows for any of the fields.
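A hedged sketch of how RSS items and Atom entries might be mapped onto one common set of fields before they reach the database. It is in Python for illustration; the function name `normalize` and the choice of target fields (title, link, description, date) are assumptions. It only covers plain RSS `<item>` elements and Atom 1.0 `<entry>` elements in the `http://www.w3.org/2005/Atom` namespace, and reads the date from `pubDate` (RSS) or `updated` (Atom).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def normalize(xml_text):
    """Return a list of dicts with the same keys for RSS or Atom input."""
    root = ET.fromstring(xml_text)
    entries = []
    if root.tag == ATOM_NS + "feed":
        # Atom: the link is an attribute, and the text lives in <summary>.
        for entry in root.iter(ATOM_NS + "entry"):
            link = entry.find(ATOM_NS + "link")
            entries.append({
                "title": entry.findtext(ATOM_NS + "title", default=""),
                "link": link.get("href", "") if link is not None else "",
                "description": entry.findtext(ATOM_NS + "summary", default=""),
                "date": entry.findtext(ATOM_NS + "updated", default=""),
            })
    else:
        # RSS: every field is a plain child element of <item>.
        for item in root.iter("item"):
            entries.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "description": item.findtext("description", default=""),
                "date": item.findtext("pubDate", default=""),
            })
    return entries
```

Fields that one format lacks simply come back as empty strings here, which is one way a single schema can allow for either kind of feed.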
