Hi Guys

The standard way to do pagination would probably be using the LIMIT 0,100 clause and doing the logic to calculate which LIMIT clause to set based on the current page being handled. But this means issuing two queries of almost the exact same SQL statement, except the first one does a COUNT(*) rather than fetching the rows. This can be quite tedious when dealing with dynamically generated SQL statements (based on filters, user selections and whatnot), and worse, should the statement involve a GROUP BY clause, there seems to be no way to do a proper row count without first fetching all the rows as the SELECT statement does.
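Just to illustrate what I mean by the two-query approach (a rough sketch using PDO; $pdo, tbl_orders and the WHERE condition are made-up names):

$perPage = 100;
$curPage = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;

// query 1: the same statement, but only a COUNT(*)
$total = (int)$pdo->query("SELECT COUNT(*) FROM tbl_orders WHERE status = 'open'")->fetchColumn();
$pageCount = (int)ceil($total / $perPage);

// query 2: the real SELECT, limited to the current page
// ($offset and $perPage are plain integers, so interpolating them here is safe)
$offset = ($curPage - 1) * $perPage;
$rows = $pdo->query("SELECT * FROM tbl_orders WHERE status = 'open' ORDER BY id LIMIT $offset, $perPage")->fetchAll();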

The other way, which I found to be much more convenient, is to simply execute the SELECT statement as is, without limits, and just count the returned rows and do the pagination logic from there. This means one query to execute and it will always be accurate. This is the way I'm doing things at the moment (I don't know about other pagination scripts out there).

The only thing about this technique, though, is the fact that the system needs to fetch ALL rows first and load them into memory, which I'm not sure is a really bad thing or something OK and acceptable.

What are your thoughts?

    Tea_J wrote:

    The standard way to do pagination would probably be using the LIMIT 0,100 clause and doing the logic to calculate which LIMIT clause to set based on the current page being handled. But this means issuing two queries of almost the exact same SQL statement, except the first one does a COUNT(*) rather than fetching the rows.

    Yes, that is a typical way of doing pagination. However, in some cases it may be feasible to select rows by some other means, e.g., for previous/next links (or scrolling, in the case of, say, a mobile app), it can be more efficient to make use of the fact that the integer primary keys of the next page of results follow the primary key of the last result on the current page.
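    For example, something along these lines (just a sketch; the table and column names are placeholders):

    // first page
    $rows = $pdo->query("SELECT id, title FROM articles ORDER BY id LIMIT 10")->fetchAll();
    $last = end($rows);
    $lastId = (int)$last['id']; // primary key of the last row shown

    // "next" page: no OFFSET (and no COUNT) needed, just continue after the last primary key
    $stmt = $pdo->prepare("SELECT id, title FROM articles WHERE id > ? ORDER BY id LIMIT 10");
    $stmt->execute(array($lastId));
    $nextRows = $stmt->fetchAll();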

    Tea_J wrote:

    this can be quite tedious when dealing with dynamically generated SQL statements (based on filters, user selections and whatnot)

    I don't see how it is tedious: you don't have to construct them all over again.

    Tea_J wrote:

    and worse, should the statement involve a GROUP BY clause, there seems to be no way to do a proper row count without first fetching all the rows as the SELECT statement does.

    If necessary, use a subquery.
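    For example (a sketch with placeholder names), you can count the rows a grouped query would return by wrapping it in a derived table:

    $sql = "SELECT COUNT(*) AS row_count
              FROM (SELECT customer_id
                      FROM orders
                     GROUP BY customer_id) AS grouped";
    $rowCount = (int)$pdo->query($sql)->fetchColumn();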

    Tea_J wrote:

    The other way, which I found to be much more convenient, is to simply execute the SELECT statement as is, without limits, and just count the returned rows and do the pagination logic from there. This means one query to execute and it will always be accurate. This is the way I'm doing things at the moment (I don't know about other pagination scripts out there).

    The only thing about this technique, though, is the fact that the system needs to fetch ALL rows first and load them into memory, which I'm not sure is a really bad thing or something OK and acceptable.

    What are your thoughts?

    I would be wary where large result sets are concerned.

      Tea_J;11027391 wrote:

      What are your thoughts?

      Other than instruct someone to smack you with a LART? 😃

      My thought is that it's good to think about scale. As laserlight implies, lots of things change when you move from smaller datasets to larger ones. Having now had a couple of years' experience with larger datasets and poorly conceived SQL, I'd advise you to consider why "the standard way" IS "the standard way" before you start doing things like "fetch ALL rows first" ...

      My $.02,

        dalecosp;11027399 wrote:

        I'd advise you to consider why "the standard way" IS "the standard way" before you start doing things like "fetch ALL rows first" ...

        Agreed. It wouldn't feel right for me to execute code that essentially says:

        "Retrieve all 1.3 million rows from the result set into my memory space, seek ahead 500,000 rows, allow me to loop through 100 rows, then discard the other 1.2999 million rows since I don't care about them."

          The question "does it really make a difference?" is sort of one of those where, if you have to ask, the answer must be 'no' 🙂
          If the performance is a real pain then it would be nice to speed things up a bit.

          bradgrafelman wrote:

          "Retrieve all 1.3 million rows from the result set into my memory space, seek ahead 500,000 rows, allow me to loop through 100 rows, then discard the other 1.2999 million rows since I don't care about them."

          Google of course just makes a guess based on additional statistics that are maintained as records are added (e.g., 0.013% of all records have the word 'smudge', 0.027% have 'kickboxing'; so, if the two were independent, maybe about 0.0000035% have both - but in reality there's more to it than that). Google's collection is so large though that they don't need to be accurate, because no-one is going to be looking through "about 6 million results" - and in reality they don't bother returning more than a thousand.

          Oh, and I note that MySQL has a FOUND_ROWS function. (The fact that it can offer this implies that it doesn't use the LIMIT clause in its query planner).
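          In other words, something along these lines (a sketch with placeholder names; the FOUND_ROWS() call has to go over the same connection, immediately after the first query):

          // fetch one page, and ask MySQL to keep counting past the LIMIT
          $rows  = $pdo->query("SELECT SQL_CALC_FOUND_ROWS id, title FROM articles LIMIT 100, 50")->fetchAll();
          $total = (int)$pdo->query("SELECT FOUND_ROWS()")->fetchColumn();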

            So with the standard SQL LIMIT approach, obviously this is the best way to go, no doubt, except for queries that require GROUP BY and other clauses that can't be handled with a simple COUNT without having to fetch all the rows of the query first. Subqueries and FOUND_ROWS both put extra load on the server, especially subquerying, as I recently found out with another problem. Borrowing bradgrafelman's words and tweaking them:

            SUB QUERY
            QUERY #1: "Count the rows from the result of the following operation: (Query #2) Retrieve all 1.3 million rows from the result set into my memory space, then discard all the results."
            QUERY #3: "Retrieve again the rows needed, but this time return only 100 rows (if available)."

            FOUND ROWS
            This seems to be the most elegant, but as documented: "If you are using SELECT SQL_CALC_FOUND_ROWS, MySQL must calculate how many rows are in the full result set. However, this is faster than running the query again without LIMIT, because the result set need not be sent to the client."

            So this is still extra load on the server, and I don't know if it is half as bad as retrieving all 1.3 million records in the first place. SQL_CALC_FOUND_ROWS has also been known to slow down SELECT queries:
            http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/

            Anyway, I think I'll design my routine the following way:
            1) Dynamically modify the SELECT statement to use COUNT(*) instead of doing a full select, using regex, as I have asked for help here:
            http://board.phpbuilder.com/showthread.php?10389563-Regular-Expression-for-extracting-a-specific-portions-on-a-mySQL-Select-statement&p=11027389#post11027389
            but got no reply 🙁

            2) Should the SELECT statement contain "GROUP BY", then do not offer full pagination, only next and previous options.

            I think this will be all I need for the most part, and perhaps I'll throw in an extra switch to do the SQL_CALC_FOUND_ROWS thing if needed, which I believe would be a rare requirement.

              Tea_J wrote:

              borrowing bradgrafelman's words and tweaking them:

              SUB QUERY
              QUERY #1: "Count the rows from the result of the following operation: (Query #2) Retrieve all 1.3 million rows from the result set into my memory space, then discard all the results."
              QUERY #3: "Retrieve again the rows needed, but this time return only 100 rows (if available)."

              You may have borrowed the words, but you failed to adjust the context.

              1. The results retrieved from Query #2 could be very different from what you have in mind, i.e., the database engine could be smart enough to only retrieve primary keys, or perform some other optimisation. In your proposed scenario, you would be retrieving the data that you want to use as well, which could be substantially more.

              2. Unless you happen to have the webserver and the database server on the same machine, the retrieval of the rows will not be into your memory space. Consequently, only the final result, i.e., the row count and the rows that you actually want, would be sent across the network and into your memory space.

              Tea_J wrote:

              So this is still extra load on the server, and I don't know if it is half as bad as retrieving all 1.3 million records in the first place. SQL_CALC_FOUND_ROWS has also been known to slow down SELECT queries.

              Obviously it will slow down select queries: it has to do more work. If you don't know but need to know in order to choose, then test and find out.
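              A crude way to test (just a sketch; tbl_items and the queries are placeholders, so substitute your own):

              function time_it($fn, $runs = 20) {
                  $start = microtime(true);
                  for ($i = 0; $i < $runs; $i++) { $fn(); }
                  return (microtime(true) - $start) / $runs;
              }

              // approach 1: SQL_CALC_FOUND_ROWS + FOUND_ROWS()
              $calcRows = time_it(function() use ($pdo) {
                  $pdo->query("SELECT SQL_CALC_FOUND_ROWS * FROM tbl_items LIMIT 0, 100")->fetchAll();
                  $pdo->query("SELECT FOUND_ROWS()")->fetchColumn();
              });

              // approach 2: plain LIMIT query + separate COUNT(*)
              $twoQueries = time_it(function() use ($pdo) {
                  $pdo->query("SELECT * FROM tbl_items LIMIT 0, 100")->fetchAll();
                  $pdo->query("SELECT COUNT(*) FROM tbl_items")->fetchColumn();
              });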

              Tea_J wrote:

              dynamically modify the SELECT statement to use COUNT(*) instead of doing a full select, using regex

              Wouldn't it make more sense to dynamically construct the SQL statement?
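              That is, keep the dynamically built FROM/WHERE part in one place and prepend different SELECT lists to it, e.g. (a sketch with placeholder names, assuming $offset and $perPage are already integers):

              $fromWhere = " FROM tbl_items WHERE category = :cat AND price <= :max"; // built from the filters
              $params    = array(':cat' => $category, ':max' => $maxPrice);

              $stmt = $pdo->prepare("SELECT COUNT(*)" . $fromWhere);
              $stmt->execute($params);
              $total = (int)$stmt->fetchColumn();

              $stmt = $pdo->prepare("SELECT id, name, price" . $fromWhere . " ORDER BY name LIMIT $offset, $perPage");
              $stmt->execute($params);
              $rows = $stmt->fetchAll();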

                I see your points very well on both accounts. Thanks.

                Wouldn't it make more sense to dynamically construct the SQL statement?

                Nope, not in my case. You see, I have a function for all my select queries, and it integrates with a lot of my other stuff just fine:

                query_select($q)

                where $q is the SQL statement.

                The $q is the SELECT query, and a lot of the time it is dynamically constructed. Once the statement is put together, I basically pass the complete SELECT query to the function.

                Now, instead of having to code the pagination logic and construct the SQL for each select routine I have, I want to simply use the same query_select() that I already have, pass whatever SELECT statement at it, and have it be smart enough to do the pagination for me:

                query_select($q,$rowlimit,$curpage)

                So all I need now is a regexp for my query_select() function to extract the needed conditions to do a simple COUNT.

                The query_select() function will first check for keywords like GROUP BY, DISTINCT and all that. If they don't exist, the regexp will fire, extract the needed conditions and build the count query, and the function will do full pagination.
                If the keywords do exist, the function will just offer a simple PREV/NEXT.

                Later I'll probably add an option to force it to do SQL_CALC_FOUND_ROWS, if I really need pagination.

                  I was having similar thoughts recently and my solution was this:

                  I first store my FROM clause in a variable and construct my WHERE clause and store that in another variable.

                  Then I calculate the total rows by selecting just the ID field using the FROM and WHERE clauses:

                  $intRecFirst = isset($_GET["FirstRecord"])  ? funcStripText($_GET["FirstRecord"])  : 0; // First row
                  $intRecTotal = isset($_GET["TotalRecords"]) ? funcStripText($_GET["TotalRecords"]) : 0; // Total records
                  $intRecPage  = 20; // The number of records per page
                  
                  .....
                  
                  if ($intRecTotal == 0){
                      $pdo = $connDB->prepare("SELECT ID FROM " . $strSQLFROM . $strSQLWHERE); // SELECT just the ID field
                      $pdo->execute();
                      $intRecTotal = $pdo->rowCount();
                      $pdo->closeCursor();
                  }
                  

                  I tried SELECT COUNT(ID), but found that using rowCount() was marginally faster.

                  Also, I'm only doing this when the page is first loaded, or if the filter/search parameters change; thereafter I pass $intRecTotal along with the URL parameters, for example:

                  echo '<a href="Page.php?FirstRecord=' . $intRecFirst . '&TotalRecords=' . $intRecTotal . '">Next Page</a>'

                  Then I'm re-using the FROM and WHERE clauses in the SELECT statement that retrieves the data I actually want to display on the page:

                  $pdo = $connDB->prepare("SELECT ID, Something, SomethingElse FROM " . $strSQLFROM . $strSQLWHERE . " LIMIT " . $intRecFirst . "," . $intRecPage . ";");
                  $pdo->execute();
                  $arrView = $pdo->fetchAll();
                  $pdo->closeCursor();
                  $connDB = null;
                  

                  If there are a really large number of records, you could perhaps extend this to put a limit on the first SELECT statement as well, because with 20 records per page we don't really expect someone to page through more than 20 or 30 pages, do we? Instead you'd expect them to use the search/filter to further narrow the results before paging through them, something like this:

                  if ($intRecTotal == 0){
                      $pdo = $connDB->prepare("SELECT ID FROM " . $strSQLFROM . $strSQLWHERE . " LIMIT 0, 1000"); // SELECT just the ID field
                      $pdo->execute();
                      $intRecTotal = $pdo->rowCount();
                      $pdo->closeCursor();
                  }

                  Then on the page instead of displaying something like "Records 1 to 20 of 123,456", we could show something like this: "Records 1 to 20 of the first 1,000".

                  What do you think?

                  I also came across this: http://www.percona.com/files/presentations/ppc2009/PPC2009_mysql_pagination.pdf, which talks about not using offset with LIMIT, but instead using the last ID to append to the WHERE clause. However, I can't see how you would use this with records that can be re-ordered so that IDs are no longer sequential.

                    xerxes wrote:

                    I first store my FROM clause in a variable and construct my WHERE clause and store that in another variable.

                    Just to check: you are making use of parameter binding in a prepared statement rather than just building those clauses with the data embedded, right?

                    xerxes wrote:

                    I tried SELECT COUNT(ID), but found that using rowCount() was marginally faster.

                    How many rows are there and is the database on a separate server?

                    xerxes wrote:

                    If there are a really large number of records, you could perhaps extend this to put a limit on first SELECT statement as well, because if we have 20 records per page we don't really expect someone to page through more than 20, 30 pages? Instead you'd expect them to use the search/filter to further narrow the results before paging through them (...) Then on the page instead of displaying something like "Records 1 to 20 of 123,456", we could show something like this: "Records 1 to 20 of the first 1,000".

                    Fairly sound reasoning. Weedpacket noted in post #5 that Google does something similar.

                    xerxes wrote:

                    I also came across this: http://www.percona.com/files/present...pagination.pdf, which talks about not using offset with LIMIT, but instead using the last ID to append to the WHERE clause. However, I can't see how you would use this with records that can be re-ordered so that IDs are no longer sequential.

                    Yes, this is the general idea of the alternative that I mentioned in post #2. As for re-ordering: how are you doing the re-ordering?

                      Just to check: you are making use of parameter binding in a prepared statement rather than just building those clauses with the data embedded, right?

                      Not in this instance, although I have on pages where I do INSERTs and UPDATEs; I might have to look at that. I have run user input through a function to strip out a lot of potential nasties:

                      function funcStripTextEsc($strText){
                          if (is_numeric($strText) == false){
                              $arrBadWords = array("tbl_", ";", "--", "({", "/*", "xp_", "dt_", "varchar");
                              $strText = str_ireplace($arrBadWords, "", $strText);
                      
                              $arrEscape  = array("\0", "\b", "\n", "\r", "\t", "\\", "\%", "\_", "\Z", "'", '"');
                              $arrReplace = array("\\0", "\\b", "\\n", "\\r", "\\t", "\\\\", "\\%", "\\_", "\\x1a", "\'", '\"');
                              $strText = str_replace($arrEscape, $arrReplace, $strText);
                      
                              $strText = htmlspecialchars($strText);
                          }
                          return $strText;
                      }
                      

                      Note that some of these are SQL Server specific. Also, I precede all my database table names with tbl_, so that I can strip that out and prevent people from doing stuff like
                      DROP tbl_Users FROM... should they guess or know my table names. In addition, much of the search input is pre-defined via drop-downs and radio buttons etc.: http://host24.qnop.net/~drdev/AdBrowse.php

                      How many rows are there and is the database on a separate server?

                      Not many in this case; this is a "My first PHP site" type scenario, where I'm learning PHP/MySQL having been working with ASP and SQL Server for some time.

                      Yes, this is the general idea of the alternative that I mentioned in post #2. As for re-ordering: how are you doing the re-ordering?

                      I'm allowing the user to re-order results via a drop-down, something that you might want to do often on something like an e-commerce site where you let customers order results by price, manufacturer etc.

                        xerxes wrote:

                        I have run user input through a function to strip out a lot of potential nasties:

                        The problem with this approach as a general approach is that you can end up mangling valid input (and I notice that you end up storing the data with the assumption that it is HTML, since you use htmlspecialchars). Furthermore, blacklisting is always weaker than whitelisting: if you miss a "nasty", you could be in trouble. Hence, binding parameters with prepared statements provides a stronger margin of security while leaving the input intact by separating the data from the SQL statement.
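                        For example (a sketch with placeholder names):

                        // the user input is bound as data, never concatenated into the SQL, so no blacklist is needed
                        $stmt = $connDB->prepare("SELECT ID, Name, Price FROM tbl_products WHERE Name LIKE :search AND Price <= :max");
                        $stmt->execute(array(
                            ':search' => '%' . $_GET['Search'] . '%', // raw input, bound as a value
                            ':max'    => $_GET['MaxPrice']
                        ));
                        $rows = $stmt->fetchAll();

                        // ...and apply htmlspecialchars() only when you output the data, not before you store it.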

                        xerxes wrote:

                        In addition, much of the search input is pre-defined via drop-downs and radio buttons etc.

                        This is illusory since the user can change that, e.g., by using tools like Firebug. You still must check on the server side.

                        xerxes wrote:

                        Not many in this case, this a "My first PHP site" type scenario, where I'm learning PHP/MySQL having been working with ASP and SQL Server for some time.

                        That could explain why you found rowCount to be marginally more efficient than a SELECT COUNT.

                        xerxes wrote:

                        I'm allowing the user to re-order results via a drop-down, something that you might want to do often on something like an e-commerce site where you let customers order results by price, manufacturer etc.

                        If you are allowing the re-ordering by price or something like that, then that field can then be part of the WHERE clause. In fact, I believe that the article that you read mentions a thumbs_up field that plays a part.

                          laserlight;11027491 wrote:

                          The problem with this approach as a general approach is that you can end up mangling valid input (and I notice that you end up storing the data with the assumption that it is HTML, since you use htmlspecialchars). Furthermore, blacklisting is always weaker than whitelisting: if you miss a "nasty", you could be in trouble. Hence, binding parameters with prepared statements provides a stronger margin of security while leaving the input intact by separating the data from the SQL statement.

                          Hmmm, OK, perhaps I'll look into using a stored procedure for searches. I can then pass all the parameters to the stored procedure via prepared statements and build the WHERE clause within the stored procedure. I've done this with SQL Server and it looks pretty similar.

                          This is illusory since the user can change that, e.g., by using tools like Firebug. You still must check on the server side.

                          To be honest, I didn't think that through; I'm using GET, so they'd have the opportunity to mess with the URL parameters anyway. :rolleyes:

                          That could explain why you found rowCount to be marginally more efficient than a SELECT COUNT.

                          OK, I'll have to insert a load of extra records and check that out.

                          If you are allowing the re-ordering by price or something like that, then that field can then be part of the WHERE clause. In fact, I believe that the article that you read mentions a thumbs_up field that plays a part.

                          Gotcha, so we use whatever we are ordering by within the WHERE clause, for example:

                          "WHERE Price > " . $lastPrice . " ORDER BY PRICE ASC LIMIT 10"

                          "WHERE Price < " . $lastPrice . " ORDER BY PRICE DESC LIMIT 10"

                          "WHERE ProductName > " . $lastName . " ORDER BY ProductName ASC LIMIT 10"

                          etc.

                          However, I can see a problem with this; it will fall to pieces if things aren't unique. Let's say we have a lot of products; it's highly likely that we will have many different products at the same price. Now if we have 30 products at £10.00, you'll only ever see the first 10, because rather than show the next 20, the WHERE clause "WHERE Price > 10.00 ORDER BY Price ASC LIMIT 10" will cause the SELECT statement to skip over them to a price that is greater than £10.00. :queasy:

                            Hold on, I've just re-thought that last part, if I always include the ID as part of my ORDER I can get around it:

                            "WHERE ID > " . $intLastID . " ORDER BY Price DESC, ID ASC LIMIT 10"

                            I really should read the stuff I link to properly. :o

                              Nope: because you can have IDs that are higher than the last ID while the price is still lower (or higher), meaning you're now filtering out results because their IDs aren't in the same sequence as the prices. Example:

                              ID - Price
                              1 - 2.50
                              2 - 3.50
                              3 - 1.00
                              4 - 0.99
                              5 - 1.50

                              SELECT * FROM <table> ORDER BY Price ASC LIMIT 2 gives IDs (4, 3), so page 2 would be SELECT * FROM <table> WHERE ID > 3 ORDER BY Price ASC LIMIT 2, which gives IDs (4, 5); you've now lost IDs (1, 2).

                                Yes, you're right, it doesn't work.

                                Reading through that document I linked to earlier, it also doesn't work with direct page navigation; this sort of thing:

                                < First Page < Previous Page 1 2 3 4 5 6 7 8 9 10 Next Page > Last Page >

                                To some extent, some of this might be academic when we're talking about tables with a few hundred, or even a few thousand records, rather than tens or hundreds of thousands.

                                  Derokorian wrote:

                                  Nope: because you can have IDs that are higher than the last ID while the price is still lower (or higher), meaning you're now filtering out results because their IDs aren't in the same sequence as the prices.

                                  Indeed. One solution that comes to mind is to handle those that are of the same price in one query (in which case filtering by ID alone is fine), then the rest in another query (in which case filtering by price alone is fine), and UNION the two queries.
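                                  Something along these lines (a sketch with placeholder names; $lastPrice and $lastId come from the last row of the page just shown):

                                  $sql = "(SELECT ID, Name, Price FROM tbl_products WHERE Price = :p1 AND ID > :lastId)
                                          UNION ALL
                                          (SELECT ID, Name, Price FROM tbl_products WHERE Price > :p2)
                                          ORDER BY Price ASC, ID ASC
                                          LIMIT 10";
                                  $stmt = $connDB->prepare($sql);
                                  $stmt->execute(array(':p1' => $lastPrice, ':lastId' => $lastId, ':p2' => $lastPrice));
                                  $rows = $stmt->fetchAll();

                                  The first branch picks up the rest of the rows tied at the same price (filtered by ID), the second picks up everything at a higher price, and the final ORDER BY/LIMIT applies to the combined result.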

                                  xerxes wrote:

                                  it also doesn't work with direct page navigation either

                                  Yes, that's why the examples I gave in post #2 were previous/next links and scrolling for a mobile app (though in retrospect, you could do it for "load on demand" as you scroll down the page on a normal browser too).

                                    Good to see other guys getting into the topic.

                                    But I must say I found my way to be so convenient. So far I've updated one of my projects (an inventory system), as it has lots of complicated SELECT queries, and they all seem to work well, EXCEPT for queries with GROUP BY clauses. For those I haven't worked on a solution yet, but a simple call to SQL_CALC_FOUND_ROWS would do the trick for those rare occasions.

                                    To repeat, my function is simple:

                                    1) Throw any SELECT statement at it and it will generate a COUNT statement from it (so long as it doesn't have a GROUP BY clause; there are probably other clauses that may cause issues, I'll find out when it happens lolz).
                                    2) The function will split the statement at the first " FROM " keyword. No more regex or fiddling with dynamic constructs etc. 😉

                                    // if rowlimit is set and greater than 0, do the limiter
                                    if ($rowlimit > 0) {
                                    
                                        if ($t_page > 0) { // if curpage / target page is set, then attempt to paginate
                                    
                                            // make sure the statement doesn't have a GROUP BY clause
                                            if (!stristr($q, "GROUP BY")) { // no GROUP BY clause, so do the paging logic
                                                //'/FROM.*(?=GROUP BY|ORDER BY|LIMIT|$)/sU' regex for later use if needed
                                    
                                                $q_parts = explode(" FROM ", strtoupper($q), 2); // split into 2 parts only
                                                $q_count = "SELECT count(*) AS row_count FROM " . $q_parts[1];
                                                $col_r = query_select($q_count); // query_select() is my quick single-row select helper
                                                $row_count = $col_r['row_count'];
                                    
                                                // calculate pages
                                                $page_count = ceil($row_count / $rowlimit); // e.g. 100 / 10 = 10 pages
                                    
                                                if ($t_page <= $page_count) {
                                                    $startrow = ($t_page * $rowlimit) - $rowlimit; // because the row offset starts at zero
                                                } else {
                                                    $startrow = $page_count;
                                                }
                                    
                                            } else { // GROUP BY statement requires counting off a subquery.
                                                etc etc etc...

                                    Note: $col_r = query_select($q_count); <-- this is just my function that does a quick single-row select for me.

                                    3) After that, the function will return 2 variables: page (the current page it's on) and pagecount (total pages).

                                    4) The rest of the script simply generates next, prev, and the rest of the page numbers 1 2 3 etc.

                                    That's it, it's so simple. The key really is just getting the COUNT thing to run. Of course I have obviously ignored the GROUP BY stuff for now, but one doesn't really have a choice when it comes to that; COUNT will not work at all hehe.
                                    The key is all in this little snippet:

                                    $q_parts = explode(" FROM ", strtoupper($q), 2); // split into 2 parts only
                                    $q_count = "SELECT count(*) AS row_count FROM " . $q_parts[1];
                                    $col_r = query_select($q_count); // query_select() is my quick single-row select helper

                                    hehehe.

                                    PS
                                    As I am always for simplicity, I would rather do the LIMIT start,rowlimit thing than mess with primary IDs and bother with sorting, plus each query may have a different primary ID name to target. It's just too impractically messy.

                                      Tea_J wrote:

                                      I have obviously ignored the GROUP BY stuff for now, but one doesn't really have a choice when it comes to that; COUNT will not work at all hehe.

                                      You could replace the grouped columns (or expressions) with selecting distinct columns/expressions - but that might not be simple.
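                                      For instance (a sketch in terms of your query_select() helper; the table and column names are placeholders), if the original query is "SELECT customer_id, status, COUNT(*) FROM orders GROUP BY customer_id, status", its row count is the number of distinct grouped combinations:

                                      // caveat: this is only straightforward when grouping on plain columns with no HAVING clause
                                      $q_count = "SELECT COUNT(DISTINCT customer_id, status) AS row_count FROM orders";
                                      $col_r = query_select($q_count);
                                      $row_count = $col_r['row_count'];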

                                      It's just too impractically messy.

                                      It might be difficult getting a full solution (assuming you don't find someone else who's already done it better than you could), but you wouldn't have to do it again.