Before doing this, you have to find out whether you really need the possible performance gain from storing it as XML. A page with many hits that takes time to load would gain from it; otherwise I would say it won't be worth the effort. Even if it is a busy site, it might be better to just use a faster server, or even several servers. Hardware is cheaper than development time.

If you decide to generate XML files from the database, I suggest that you generate XHTML files right away and just use them directly as the page. Then you don't have to do any PHP or SQL work when displaying the page.
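
For example, a minimal sketch of that approach, assuming a mysqli connection and made-up table, column and file names, could look something like this:

<?php
// Regenerate the static page whenever the calendar data changes.
// Connection details, table and file names below are placeholders.
$db = new mysqli('localhost', 'user', 'password', 'calendar');

$result = $db->query("SELECT EventDate, EventDescription FROM events ORDER BY EventDate");

$html = "<ul>\n";
while ($row = $result->fetch_assoc()) {
    $html .= '<li>' . htmlspecialchars($row['EventDate']) . ': '
           . htmlspecialchars($row['EventDescription']) . "</li>\n";
}
$html .= "</ul>\n";

// Serve this file directly; no PHP or SQL work is needed per page view.
file_put_contents('events.html', $html);
?>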

As for using XML as an archive, I think it is a really bad idea. It is better to use the database; that way you can easily find what you are looking for. With an XML archive you would have to look through lots of files and parse them in order to find anything.

I don't know what size you are talking about when you say "and the smaller DB size should speed up the existing queries too". Just make sure that the database tables are indexed correctly and there should not be any problem. After all, databases are used with millions of records without problems.
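
For what it's worth, MySQL's EXPLAIN is a quick way to check whether a query actually uses an index (table and column names here are just examples):

EXPLAIN SELECT * FROM events
WHERE EventDate BETWEEN '2008-06-01' AND '2008-06-30';

-- If the "key" column in the output is NULL, the query is scanning the
-- whole table, and an index on the filtered column would likely help:
CREATE INDEX idx_eventdate ON events (EventDate);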

    [FONT="Verdana"]
    The application involved here is YAIC (Yet Another Internet Calendar). But the one I built beats anything that Google or Yahoo has and has commercial value.

    The queries never change per page. If you want a full month of calendar event data, that is what is returned. Or just a week's worth. Or just the calendar events for today. So, based upon what you stated, I think my idea may have some value for further exploration.

    Also, I am on a virtual server, hosted by a company that will go nameless, and their server speeds can vary widely. Sometimes my result set is returned in just a few seconds. And sometimes the exact same query can take a full 30 seconds. (Not too much I can do about that, short of setting up my own server.)

    And finally, do you have any recommendations for PHP5 + XML tuts? Amazon doesn't show much beyond 3 or 4 titles (last time I looked anyway).

    [/FONT]

      And finally, do you have any recommendations for PHP5 + XML tuts?

      If you already know XML, just read the PHP manual concerning the SimpleXML extension. You should be able to figure out what to do after reading about the functions available, and taking a look at the examples provided.

        Stinger51 wrote:

        Also, I am on a virtual server, hosted by a company that will go nameless, and their server speeds can vary widely. Sometimes my result set is returned in just a few seconds. And sometimes the exact same query can take a full 30 seconds. (Not too much I can do about that, short of setting up my own server.)

        Change host. Or use a dedicated server.

        Oh, and please do type in the standard size and font. Everyone here can read it, or can set their browsers to scale it up.

          Stinger51 wrote:

          [FONT="Verdana"] Sometimes my result set is returned in just a few seconds. And sometimes the exact same query can take a full 30 seconds. (Not too much I can do about that, short of setting up my own server.)
          [/FONT]

          It sounds to me like your queries are in serious need of a review and/or your database requires a redesign (unless you have 100K+ rows in your tables).

            leatherback wrote:

            It sounds to me like your queries are in serious need of a review and/or your database requires a redesign (unless you have 100K+ rows in your tables).

            [FONT="Verdana"]

            Well, that is a good assumption. But considering that a table I have that is regularly queried consists of only about 25 fields of data, and the DB at present contains about 7,000 records, I don't see that as an issue.

            BTW, I'm an old MS Access programmer, and have found the process of normalizing a MySQL DB somewhat difficult as you have to write all those relationships by hand. Nevertheless, some of the least normalized DB tables are the fastest in terms of query speed. So, I haven't taken up the challenge of MySQL normalization as yet. Someday, maybe. :-)


            [/FONT]

              laserlight wrote:

              If you already know XML, just read the PHP manual concerning the SimpleXML extension. You should be able to figure out what to do after reading about the functions available, and taking a look at the examples provided.

              [FONT="Verdana"]Well, I don't know much about XML at this point. And I have looked at some of the coding examples but find them obtuse on many points. (But I will keep looking).

              One of the pleasant surprises I've had is in working with the function:

              $xml = simplexml_load_file($url);

              With this function I have been able to parse the NOAA RSS current weather feed and summarize it into a nice table for one of my web pages. I love that function!
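
              For anyone reading along, a minimal sketch of that kind of feed handling, using a placeholder URL and the standard RSS 2.0 element names rather than the exact NOAA structure, would be:

              $url = 'http://example.com/weather.rss';   // placeholder feed URL
              $xml = simplexml_load_file($url);          // SimpleXMLElement, or false on failure

              if ($xml !== false) {
                  echo "<table>\n";
                  foreach ($xml->channel->item as $item) {   // typical RSS 2.0 layout
                      echo '<tr><td>' . htmlspecialchars((string) $item->title)
                         . '</td><td>' . htmlspecialchars((string) $item->description)
                         . "</td></tr>\n";
                  }
                  echo "</table>\n";
              }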

              [/FONT]

                Stinger51 wrote:

                Well, that is a good assumption. But considering that a table I have that is regularly queried consists of only about 25 fields of data, and the DB at present contains about 7,000 records, I don't see that as an issue.

                BTW, I'm an old MS Access programmer, and have found the process of normalizing a MySQL DB somewhat difficult as you have to write all those relationships by hand. Nevertheless, some of the least normalized DB tables are the fastest in terms of query speed.

                This response actually confirms my ideas. If your 'least normalized tables are the fastest', that clearly shows that the way you query is the problem.

                In general, if you have queries on a database of this size that take more than a split second, you need to look at the logic in the queries. Moving to XML to avoid slow queries is NOT going to be the solution for you. Cleaning up the queries will.

                Just for the fun of it: do you know which query is the slowest? Could we have a look at it and have a stab at improving it? Could you post a relevant piece of code, and some info on the tables?

                  Nevertheless, some of the least normalized DB tables are the fastest in terms of query speed. So, I haven't taken up the challenge of MySQL normalization as yet. Someday, maybe. :-)

                  It is true that the lack of normalisation can speed up some queries. On the other hand, joins are fast when the join columns are indexed, so that is generally not a problem, whereas the lack of normalisation can make the extraction of data rather awkward in some cases, as you might have experienced yourself. As such, I suggest that you normalise your database as soon as possible.
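
                  As a rough illustration only, with invented table and column names, an indexed join stays quick:

                  -- Hypothetical normalised layout: company details kept in their own table.
                  CREATE INDEX idx_companycode ON events (CompanyCode);

                  SELECT e.EventDate, e.EventDescription, c.CompanyName
                  FROM events AS e
                  JOIN companies AS c ON c.CompanyCode = e.CompanyCode
                  WHERE e.EventDate BETWEEN '2008-06-01' AND '2008-06-30';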

                    leatherback wrote:

                    This response actually confirms my ideas. If your 'least normalized tables are the fastest', that clearly shows that the way you query is the problem.

                    In general, if you have queries on a database of this size that take more than a split second, you need to look at the logic in the queries. Moving to XML to avoid slow queries is NOT going to be the solution for you. Cleaning up the queries will.

                    Just for the fun of it: do you know which query is the slowest? Could we have a look at it and have a stab at improving it? Could you post a relevant piece of code, and some info on the tables?

                    [FONT="Verdana"]
                    Yes, I will be glad to do that. But I need to think about how to present all this information so as not to bore the average reader to sleep here. I'll get back to you with an example query that hits my "Public Events" table in my Calendar DB.

                    Until then, please note that this table:
                    - is the largest table in the DB,
                    - is not related to any other table (nor does it need to be),
                    - has about 25 relevant fields, and
                    - cannot be updated by the general public (it is read-only), because it is a calendar of events table.




                    [/FONT]

                      Just show us the query routine used. I THINK you might actually be better off showing us linked table queries. A simple query on your table with 7,000 entries in it should run in a split second, shouldn't it?

                      I am really curious as to which complex query you have there, and hope for your sake that I am wrong. (It would be so sad if your code could be improved massively just by querying differently!)

                      Make sure you place them in [php][/php] tags for reading clarity.

                        laserlight wrote:

                        It is true that the lack of normalisation can speed up some queries. On the other hand, joins are fast when the join columns are indexed, so that is generally not a problem, whereas the lack of normalisation can make the extraction of data rather awkward in some cases, as you might have experienced yourself. As such, I suggest that you normalise your database as soon as possible.

                        [FONT="Verdana"]

                        Thanks for the suggestion and your kind input.

                        The issue is, due to the nature of calendar content mgmt. systems, most data (in my DB) is not related. IOW, the result set for today's calendar events (or this week's events) has nothing to do with who is requesting it.

                        My calendar program presents data on 3 different tiers:

                        • Public, e.g. public events such as seminars, fund raisers, musical events;
                        • Company, e.g. All-hands meetings, company picnics, new hires;
                        • Employees, e.g. each employee is encouraged to share their work schedules

                        The latter two categories require a log-in account, which is not available to the general public.

                        Thus, no "relations" exist among my public customers. Therefore, there is little or no need to "normalize" my tables of events, as they are not related. They are simply repositories of calendar event data: Public, Company, and Employee events.

                        One scheme that has been suggested to me (since I began this thread) is to use a flat file storage system that would be presented to the general user, rather than the overhead of a DB query, when requesting Public Event data. And this flat file would be updated as necessary, as the sys admin adds or edits the existing data. Which seems to make sense to me. Sort of.
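
                        As I understand it, the sketch would be something like this (connection, table and file names are just placeholders):

                        // Run whenever the sys admin adds or edits a public event.
                        $result = $db->query("SELECT * FROM PublicEvents ORDER BY EventDate, StartTime");

                        $rows = array();
                        while ($row = $result->fetch_assoc()) {
                            $rows[] = $row;
                        }

                        // The ready-made result set is written out once...
                        file_put_contents('public_events_cache.txt', serialize($rows));

                        // ...and the public pages just read the file, with no query at all:
                        $events = unserialize(file_get_contents('public_events_cache.txt'));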

                        What say you?

                        [/FONT]

                          I think that
                          1) Every database system has a need for normalization.
                          2) If there is no relation between the tables, there is no need for relations.
                          3) If you have repeated information in any of your tables, there may be a need to split it into multiple tables.
                          4) Independent of whether 1, 2, 3, ... n persons are adding data to the database, normalization is required.
                          5) A flat file could work, but if you have a lot of different ways to present the data, a database would make sense.

                            leatherback wrote:

                            I think that
                            1) Every database system has a need for normalization.
                            2) If there is no relation between the tables, there is no need for relations.
                            3) If you have repeated information in any of your tables, there may be a need to split it into multiple tables.
                            4) Independent of whether 1, 2, 3, ... n persons are adding data to the database, normalization is required.
                            5) A flat file could work, but if you have a lot of different ways to present the data, a database would make sense.

                            [FONT="Verdana"]
                            Here is the layout for one of my tables (sorry for the bad formatting):

                            Field              Type                 Null  Key  Default  Extra
                            RecNum             int(255) unsigned    NO    PRI  NULL     auto_increment
                            CompanyName        varchar(25)          YES        NULL
                            CompanyCode        varchar(20)          YES        NULL
                            EventDate          date                 YES        NULL
                            StartTime          time                 YES        NULL
                            EndTime            time                 YES        NULL
                            EventAbbr          varchar(15)          YES        NULL
                            EventDescription   varchar(120)         YES        NULL
                            AddedText          varchar(120)         YES        NULL
                            FontColor          varchar(35)          YES        0
                            Priority           tinyint(1) unsigned  YES        3
                            isSpecialEvent     tinyint(1)           YES        0
                            SpecialEventTitle  varchar(60)          YES        NULL
                            isLink             tinyint(1)           YES        0
                            LinkPage           varchar(40)          YES        NULL
                            ArtWork            varchar(40)          YES        NULL
                            MapLink            tinyint(1) unsigned  YES        0
                            Street             varchar(30)          YES        NULL
                            ZipCode            varchar(10)          YES        NULL
                            DateAdded          datetime             YES        NULL
                            DateEdited         datetime             YES        NULL

                            (All varchar columns use the latin1_swedish_ci collation, and every column has the select, insert, update and references privileges.)



                            The above table describes a typical calendar event for a given company.
                            And here is a typical query for getting all the calendar events for "Ajax Co." for, say, June, where $date_1 == 2008-06-01 and $date_2 == 2008-06-30:


                            $sql = "SELECT * FROM $table_name
                            WHERE CompanyCode = \"$CompanyCode\"

                            AND EventDate >= \"$date_1\"
                            AND EventDate <= \"$date_2\"

                            ORDER BY EventDate, StartTime, EventAbbr";


                            A typical result set may contain around 100 records, depending upon how many events that company has for the month. The EventAbbr (shorthand description of the event) is displayed on the calendar as a link. The user can click the link for more detailed information for that event (EventDescription and AddedText fields).

                            Again, nothing too fancy here. Just a straightforward, single-table query, with the RecNum field serving as the primary key. Perhaps the only redundant data is the CompanyName field. Now that I am looking at it, this should be placed in a separate table, using the CompanyCode as the PK.

                            [/FONT]

                              hm.. and you say that this takes seconds to run?

                              I could see a few small changes that might improve the speed, but probably not many.
                              - Is CompanyCode indexed? It should be.
                              - You could consider combining EventDate and StartTime into a single datetime field, and doing the same for the end time. That would remove one column from the sort.
                              - How often do you have two events at the exact same moment? If basically never, you could drop the ORDER BY on EventAbbr too.

                              You could consider putting the company name and code in a separate table with an ID column, and using the ID in the event table. That would not make a very big difference, but it would be proper database normalization.
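
                              Purely as a sketch, with made-up table, column and index names, those changes might look like:

                              -- Index the column used in the WHERE clause.
                              ALTER TABLE PublicEvents ADD INDEX idx_companycode (CompanyCode);

                              -- Combine date and start time into one sortable column.
                              ALTER TABLE PublicEvents ADD COLUMN EventStart DATETIME;
                              UPDATE PublicEvents SET EventStart = CONCAT(EventDate, ' ', StartTime);

                              -- Company name and code in their own table, referenced by an ID.
                              CREATE TABLE Companies (
                                  CompanyID   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
                                  CompanyCode VARCHAR(20),
                                  CompanyName VARCHAR(25)
                              );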

                              If you want I could have a peek at the routines and all, but not this week. PM me if you are interested in pursuing this further?

                                It doesn't seem like the database query should be able to slow the script down, not by more than a few milliseconds at least.

                                I suggest that you put a lot of microtime() calls in the code to find out what the problem is; it seems like all we are doing now is guessing. Just put microtime() in strategic places (start, end, before the query, after the query, after the loop handling the results and so on) and you should be able to find out which part of the code is the problem.
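
                                Something along these lines, just as a rough sketch with a placeholder query and connection:

                                $start = microtime(true);

                                $result = $db->query($sql);              // the calendar query under suspicion
                                $afterQuery = microtime(true);

                                while ($row = $result->fetch_assoc()) {
                                    // ... build the calendar output ...
                                }
                                $afterLoop = microtime(true);

                                echo 'Query took ' . round($afterQuery - $start, 4) . 's, ';
                                echo 'result loop took ' . round($afterLoop - $afterQuery, 4) . 's';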
