Tea_J wrote:

borrowing bradgrafeliman's words and tweaking it:

SUBQUERY APPROACH
QUERY #1: "count the rows from the result of the following operation: (QUERY #2) retrieve all 1.3 million rows from the result set into my memory space, then discard all the results"
QUERY #3: "retrieve the rows needed again, but this time return only 100 rows (if available)"

You may have borrowed the words, but you failed to adjust the context.

  1. The results retrieved from Query #2 could be very different from what you have in mind, i.e., the database engine could be smart enough to only retrieve primary keys, or perform some other optimisation. In your proposed scenario, you would be retrieving the data that you want to use as well, which could be substantially more.

  2. Unless you happen to have the webserver and the database server on the same machine, the retrieval of the rows will not be into your memory space. Instead, only the final result, i.e., the row count and the rows that you actually want, would be sent across the network and into your memory space.
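
To make the distinction concrete, the two-query approach amounts to something like this (a sketch; the connection, table, and criteria are invented for illustration):

    // query #1: only the count itself crosses the network
    $row_count = $pdo->query("SELECT COUNT(*) FROM tbl_big WHERE status = 'active'")->fetchColumn();

    // query #2: only the 100 rows you asked for cross the network
    $rows = $pdo->query("SELECT id, name FROM tbl_big WHERE status = 'active' LIMIT 0, 100")->fetchAll();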

Tea_J wrote:

So this is still extra load on the server, and I don't know if it is half as bad as retrieving all 1.3 million records in the first place.. SQL_CALC_FOUND_ROWS has also been known to slow down select queries.

Obviously it will slow down select queries: it has to do more work. If you don't know but need to know in order to choose, then test and find out.

Tea_J wrote:

dynamically modify the SELECT statement to use COUNT(*) instead of doing a full select, using regex..

Wouldn't it make more sense to dynamically construct the SQL statement?

    i see your points very well on both counts. thanks

    Wouldn't it make more sense to dynamically construct the SQL statement?

    newp, not for my case.. you see i have a function for all my select queries.. and this integrates with a lot of my other stuff just fine.

    query_select($q)

    where $q is the SQL statement..

    the $q is the SELECT query ... and a lot of times this is dynamically constructed.. after the statement is prepared, i basically pass the complete select query to the function.

    now instead of having to code the pagination logic and construct a count SQL for each select routine i have, i want to simply use the same query_select() that i already have, pass whatever SELECT statement at it, and have it be smart enough to do pagination for me.

    query_select($q,$rowlimit,$curpage)

    so, all i need now is a regexp for my query_select function to extract the needed conditions to do a simple COUNT..

    the query_select function will first check for keywords like GROUP BY, DISTINCT and all that.. if they don't exist, the regexp will fire, extract the needed conditions, build the count query, and do full pagination.
    if the keywords do exist, the function will just offer a simple PREV/NEXT instead

    ..later i'll probably add a function to force it to do SQL_CALC_FOUND_ROWS, if i really need pagination.
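
    for reference, the SQL_CALC_FOUND_ROWS route would look roughly like this (just a sketch; the table name is made up):

        // run the real query with LIMIT but ask MySQL to keep counting past it
        // (MySQL-specific; FOUND_ROWS() must be read on the same connection)
        $rows = query_select("SELECT SQL_CALC_FOUND_ROWS id, name FROM tbl_items LIMIT 20, 10");
        $col_r = query_select("SELECT FOUND_ROWS() AS row_count"); // the pre-LIMIT count
        $row_count = $col_r['row_count'];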

      I was having similar thoughts recently and my solution was this:

      I first store my FROM clause in a variable and construct my WHERE clause and store that in another variable.

      Then I calculate the total rows by selecting just the ID field using the FROM and WHERE clauses:

      $intRecFirst = isset($_GET["FirstRecord"])  ? funcStripText($_GET["FirstRecord"])  : 0;   // First row
      $intRecTotal = isset($_GET["TotalRecords"]) ? funcStripText($_GET["TotalRecords"]) : 0;   // Total records
      $intRecPage  = 20;   // The number of records per page
      
      .....
      
      if ($intRecTotal == 0){
      	$pdo = $connDB->prepare("SELECT ID FROM" . $strSQLFROM . $strSQLWHERE);   // SELECT just the ID field
      	$pdo->execute();
      	$intRecTotal = $pdo->rowCount();
      	$pdo->closeCursor();
      }
      

      I tried SELECT COUNT(ID), but found that using rowCount() was marginally faster.
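
      For reference, the COUNT version I tried was along these lines (a sketch, not my exact code):

      $pdo = $connDB->prepare("SELECT COUNT(ID) FROM " . $strSQLFROM . $strSQLWHERE);
      $pdo->execute();
      $intRecTotal = (int)$pdo->fetchColumn();   // the count comes back as a single value
      $pdo->closeCursor();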

      Also, I'm only doing this when the page is first loaded, or if the filter/search parameters change; thereafter I pass $intRecTotal along with the URL parameters, for example:

      echo '<a href="Page.php?FirstRecord=' . $intRecFirst . '&TotalRecords=' . $intRecTotal . '">Next Page</a>';

      Then I'm re-using the FROM and WHERE clauses in the SELECT statement that retrieves the data I actually want to display on the page:

      $pdo = $connDB->prepare("SELECT ID, Something, SomethingElse FROM " . $strSQLFROM . $strSQLWHERE . " LIMIT " . $intRecFirst . "," . $intRecPage . ";");
      $pdo->execute();
      $arrView = $pdo->fetchAll();
      $pdo->closeCursor();
      $connDB = null;
      

      If there are a really large number of records, you could perhaps extend this to put a limit on the first SELECT statement as well: if we have 20 records per page, we don't really expect someone to page through more than 20 or 30 pages. Instead you'd expect them to use the search/filter to further narrow the results before paging through them, something like this:

      if ($intRecTotal == 0){
      	$pdo = $connDB->prepare("SELECT ID FROM" . $strSQLFROM . $strSQLWHERE . " LIMIT 0, 1000");   // SELECT just the ID field
      	$pdo->execute();
      	$intRecTotal = $pdo->rowCount();
      	$pdo->closeCursor();
      }

      Then on the page instead of displaying something like "Records 1 to 20 of 123,456", we could show something like this: "Records 1 to 20 of the first 1,000".

      What do you think?

      I also came across this: http://www.percona.com/files/presentations/ppc2009/PPC2009_mysql_pagination.pdf, which talks about not using offset with LIMIT, but instead using the last ID to append to the WHERE clause. However, I can't see how you would use this with records that can be re-ordered so that IDs are no longer sequential.
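
      As far as I can tell, the pattern in that PDF boils down to something like this (my sketch; the names are invented):

      // "seek" paging: remember the last ID displayed and start after it,
      // instead of using LIMIT offset, rowcount
      $pdo = $connDB->prepare("SELECT ID, Something FROM tbl_Items WHERE ID > :lastid ORDER BY ID ASC LIMIT 20");
      $pdo->bindValue(':lastid', (int)$intLastID, PDO::PARAM_INT);
      $pdo->execute();
      $arrView = $pdo->fetchAll();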

        xerxes wrote:

        I first store my FROM clause in a variable and construct my WHERE clause and store that in another variable.

        Just to check: you are making use of parameter binding in a prepared statement rather than just building those clauses with the data embedded, right?
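
        That is, something along these lines (a minimal sketch; the names are invented), so the data never becomes part of the SQL text:

        $stmt = $connDB->prepare("SELECT ID, Something FROM tbl_Items WHERE Category = :cat AND Price <= :max");
        $stmt->execute(array(':cat' => $_GET['Category'], ':max' => $_GET['MaxPrice']));
        $arrView = $stmt->fetchAll();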

        xerxes wrote:

        I tried SELECT COUNT(ID), but found that using rowCount() was marginally faster.

        How many rows are there and is the database on a separate server?

        xerxes wrote:

        If there are a really large number of records, you could perhaps extend this to put a limit on the first SELECT statement as well: if we have 20 records per page, we don't really expect someone to page through more than 20 or 30 pages. Instead you'd expect them to use the search/filter to further narrow the results before paging through them (...) Then on the page instead of displaying something like "Records 1 to 20 of 123,456", we could show something like this: "Records 1 to 20 of the first 1,000".

        Fairly sound reasoning. Weedpacket noted in post #5 that Google does something similar.

        xerxes wrote:

        I also came across this: http://www.percona.com/files/presentations/ppc2009/PPC2009_mysql_pagination.pdf, which talks about not using offset with LIMIT, but instead using the last ID to append to the WHERE clause. However, I can't see how you would use this with records that can be re-ordered so that IDs are no longer sequential.

        Yes, this is the general idea of the alternative that I mentioned in post #2. As for re-ordering: how are you doing the re-ordering?

          Just to check: you are making use of parameter binding in a prepared statement rather than just building those clauses with the data embedded, right?

          Not in this instance, although I have on pages with INSERTs and UPDATEs; I might have to look at that. I have run user input through a function to strip out a lot of potential nasties:

          function funcStripTextEsc($strText){
          	if (is_numeric($strText) == false){
          		$arrBadWords = array("tbl_", ";", "--", "({", "/*", "xp_", "dt_", "varchar");
          		$strText = str_ireplace($arrBadWords, "", $strText);

          		$arrEscape  = array("\0", "\b", "\n", "\r", "\t", "\\", "\%", "\_", "\Z", "'", '"');
          		$arrReplace = array("\\0", "\\b", "\\n", "\\r", "\\t", "\\\\", "\\%", "\\_", "\\x1a", "\'", '\"');
          		$strText = str_replace($arrEscape, $arrReplace, $strText);

          		$strText = htmlspecialchars($strText);
          	}
          	return $strText;
          }
          

          Note that some of these are SQL Server specific. Also, I precede all my database table names with tbl_, so that I can strip that out and prevent people doing stuff like
          DROP TABLE tbl_Users... should they guess or know my table names. In addition, much of the search input is pre-defined via drop-downs and radio buttons etc.: http://host24.qnop.net/~drdev/AdBrowse.php

          How many rows are there and is the database on a separate server?

          Not many in this case; this is a "My first PHP site" type scenario, where I'm learning PHP/MySQL having worked with ASP and SQL Server for some time.

          Yes, this is the general idea of the alternative that I mentioned in post #2. As for re-ordering: how are you doing the re-ordering?

          I'm allowing the user to re-order results via a drop-down, something you might often want on an e-commerce site where you let customers order results by price, manufacturer etc.

            xerxes wrote:

            I have run user input through a function to strip out a lot of potential nasties:

            The problem with this approach as a general approach is that you can end up mangling valid input (and I notice that you end up storing the data with the assumption that it is HTML, since you use htmlspecialchars). Furthermore, blacklisting is always weaker than whitelisting: if you miss a "nasty", you could be in trouble. Hence, binding parameters with prepared statements provides a stronger margin of security while leaving the input intact by separating the data from the SQL statement.

            xerxes wrote:

            In addition, much of the search input is pre-defined via drop-downs and radio buttons etc.

            This is illusory since the user can change that, e.g., by using tools like Firebug. You still must check on the server side.
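
            For drop-down input, that server-side check can be a simple whitelist (a sketch; the allowed values would be whatever your drop-down actually offers):

            $arrAllowed = array('Price', 'Manufacturer', 'Name'); // the drop-down's real options
            $strSort = (isset($_GET['Sort']) && in_array($_GET['Sort'], $arrAllowed, true))
                ? $_GET['Sort']
                : 'Price'; // fall back to a safe default; never splice raw input into ORDER BY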

            xerxes wrote:

            Not many in this case; this is a "My first PHP site" type scenario, where I'm learning PHP/MySQL having worked with ASP and SQL Server for some time.

            That could explain why you found rowCount to be marginally more efficient than a SELECT COUNT.

            xerxes wrote:

            I'm allowing the user to re-order results via a drop-down, something you might often want on an e-commerce site where you let customers order results by price, manufacturer etc.

            If you are allowing re-ordering by price or something like that, then that field can be part of the WHERE clause. In fact, I believe that the article that you read mentions a thumbs_up field that plays a part.

              laserlight wrote:

              The problem with this approach as a general approach is that you can end up mangling valid input (and I notice that you end up storing the data with the assumption that it is HTML, since you use htmlspecialchars). Furthermore, blacklisting is always weaker than whitelisting: if you miss a "nasty", you could be in trouble. Hence, binding parameters with prepared statements provides a stronger margin of security while leaving the input intact by separating the data from the SQL statement.

              Hmmm, OK, perhaps I'll look into using a stored procedure for searches. I can then pass all the parameters to the stored procedure via prepared statements and build the WHERE clause within the stored procedure; I've done this with SQL Server and it looks pretty similar.

              This is illusory since the user can change that, e.g., by using tools like Firebug. You still must check on the server side.

              To be honest, I didn't think that through, I'm using GET, so they'd have the opportunity to mess with the URL parameter anyway. :rolleyes:

              That could explain why you found rowCount to be marginally more efficient than a SELECT COUNT.

              OK, I'll have to insert a load of extra records and check that out.

              If you are allowing re-ordering by price or something like that, then that field can be part of the WHERE clause. In fact, I believe that the article that you read mentions a thumbs_up field that plays a part.

              Gotcha, so we use whatever we are ordering by within the WHERE clause, for example:

              "WHERE Price > " . $lastPrice . " ORDER BY PRICE ASC LIMIT 10"

              "WHERE Price < " . $lastPrice . " ORDER BY PRICE DESC LIMIT 10"

              "WHERE ProductName > " . $lastName . " ORDER BY ProductName ASC LIMIT 10"

              etc.

              However, I can see a problem with this; it will fall to pieces if things aren't unique. Let's say we have a lot of products: it's highly likely that many different products will share the same price. Now if we have 30 products at £10.00, you'll only ever see the first 10, because rather than showing the next 20, the WHERE clause "WHERE Price > 10.00 ORDER BY Price ASC LIMIT 10" will make the SELECT statement skip over them to a price greater than £10.00. :queasy:

                Hold on, I've just re-thought that last part, if I always include the ID as part of my ORDER I can get around it:

                "WHERE ID > " . $intLastID . " ORDER BY Price DESC, ID ASC LIMIT 10"

                I really should read the stuff I link to properly. :o

                  Nope: you can have IDs that are higher than the last ID while the price is still lower (or higher), meaning you're now filtering out results whose IDs aren't in the same order as their prices. Example:

                  ID - Price
                  1 - 2.50
                  2 - 3.50
                  3 - 1.00
                  4 - 0.99
                  5 - 1.50

                  SELECT ID FROM <table> ORDER BY Price ASC LIMIT 2 gives IDs (4, 3), so page 2 would be SELECT ID FROM <table> WHERE ID > 3 ORDER BY Price ASC LIMIT 2, which gives IDs (4, 5): you've repeated ID 4 and lost IDs (1, 2) entirely

                    Yes, you're right, it doesn't work.

                    Reading through that document I linked to earlier, it doesn't work with direct page navigation either; this sort of thing:

                    < First Page < Previous Page 1 2 3 4 5 6 7 8 9 10 Next Page > Last Page >

                    To some extent, some of this might be academic when we're talking about tables with a few hundred, or even a few thousand records, rather than tens or hundreds of thousands.

                      Derokorian wrote:

                      Nope: you can have IDs that are higher than the last ID while the price is still lower (or higher), meaning you're now filtering out results whose IDs aren't in the same order as their prices.

                      Indeed. One solution that comes to mind is to handle those that are of the same price in one query (in which case filtering by ID alone is fine), then the rest in another query (in which case filtering by price alone is fine), and UNION the two queries.
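
                      Roughly, as a sketch (invented table/column names; positional placeholders so no parameter is repeated by name):

                      $sql = "(SELECT ID, Name, Price FROM tbl_Products WHERE Price = ? AND ID > ?)
                              UNION ALL
                              (SELECT ID, Name, Price FROM tbl_Products WHERE Price > ?)
                              ORDER BY Price ASC, ID ASC LIMIT 10";
                      $stmt = $connDB->prepare($sql);
                      $stmt->execute(array($lastPrice, $lastId, $lastPrice)); // values from the last row shown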

                      xerxes wrote:

                      it also doesn't work with direct page navigation either

                      Yes, that's why the examples I gave in post #2 were previous/next links and scrolling for a mobile app (though in retrospect, you could do it for "load on demand" as you scroll down the page on a normal browser too).

                        good to see other guys getting into the topic..

                        but i must say i found my way to be so convenient.. so far i've updated one of my projects (an inventory system), which has lots of complicated select queries, and they all seem to work well. EXCEPT for queries w/ GROUP BY clauses.. i haven't worked on a solution for those yet, but a simple call to SQL_CALC_FOUND_ROWS would do the trick for those rare occasions.

                        to repeat, my function is simple..

                        1) Throw any SELECT statement at it and it will generate a COUNT statement from it (so long as it doesn't have a GROUP BY clause.. there are probably other clauses that may cause issues, i'll find out when it happens lolz)
                        2) the function will split the statement at the first " FROM " keyword. no more regex, or fiddling w/ dynamic constructs etc.. 😉

                        // if rowlimit is greater than 0, apply the limiter
                        if ($rowlimit > 0) {

                            if ($t_page > 0) { // if curpage/targetpage is set, attempt to paginate

                                // make sure the statement doesn't have a GROUP BY clause
                                if (!stristr($q, "GROUP BY")) { // no GROUP BY, so do the paging logic
                                    //'/FROM.*(?=GROUP BY|ORDER BY|LIMIT|$)/sU' regex for later use if needed

                                    // split at the first " FROM " case-insensitively, keeping the query's
                                    // original case so string literals aren't mangled (assumes a FROM exists)
                                    $pos = stripos($q, " FROM ");
                                    $q_count = "SELECT COUNT(*) AS row_count FROM " . substr($q, $pos + 6);
                                    $col_r = query_select($q_count);
                                    $row_count = $col_r['row_count'];

                                    // calculate pages
                                    $page_count = ceil($row_count / $rowlimit); // e.g. 100 / 10 = 10 pages

                                    if ($t_page <= $page_count) {
                                        $startrow = ($t_page * $rowlimit) - $rowlimit; // row offsets start at zero
                                    } else {
                                        $startrow = max(0, ($page_count * $rowlimit) - $rowlimit); // clamp to the last page
                                    }

                                } else { // GROUP BY statements require counting off a subquery
                                    // etc etc etc...
                        note: $col_r = query_select($q_count); <-- this is just my function that does a quick single-row select for me

                        3) after that, the function will return 2 variables: page (the current page it's on) and pagecount (total pages)

                        4) the rest of the script simply generates next, prev, and the rest of the page 1 2 3 etc ..
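
                        the actual data query then just gets the LIMIT appended, e.g. (a sketch of the idea):

                        $q_page = $q . " LIMIT " . (int)$startrow . ", " . (int)$rowlimit; // run through query_select() as usual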

                        that's it, it's so simple. the key really is just getting the COUNT query to run. of course i have obviously ignored the GROUP BY stuff for now, but one doesn't really have a choice when it comes to that.. COUNT will not work at all hehe.
                        the key is all in this little snippet

                        $q_parts = explode( " FROM ", strtoupper($q) , 2); //limit to 2 only
                        $q_count = "SELECT count(*) AS row_count FROM " . $q_parts[1];
                        $col_r = query_select($q_count); // query select is one

                        hehehe.

                        PS
                        as i am always for simplicity, i would rather do the LIMIT start, rowlimit thing than mess w/ primary IDs and bother w/ sorting.. plus each query may have a different primary ID name to target.. it's just too impractically messy..

                          Tea_J wrote:

                          i have obviously ignored the GROUP BY stuff for now, but one doesn't really have a choice when it comes to that.. COUNT will not work at all hehe.

                          You could replace the grouped columns (or expressions) with selecting distinct columns/expressions - but that might not be simple.
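
                          As a sketch (invented names; MySQL's COUNT(DISTINCT ...) also accepts multiple expressions):

                          // the number of rows produced by "SELECT customer, SUM(total) FROM tbl_orders GROUP BY customer"
                          // equals the number of distinct grouping values, so the count query could be:
                          $q_count = "SELECT COUNT(DISTINCT customer) AS row_count FROM tbl_orders";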

                          it's just too impractically messy..

                          It might be difficult getting a full solution (assuming you don't find someone else who's already done it better than you could), but you wouldn't have to do it again.

                            Weedpacket wrote:

                            You could replace the grouped columns (or expressions) with selecting distinct columns/expressions - but that might not be simple.

                            yeh.. i'll fight that logic another day.. lolz. for now i've set my routine to offer NEXT/PREV options only (one of laserlight's suggestions) w/o full pagination should it have a GROUP BY clause lolz

                            It might be difficult getting a full solution (assuming you don't find someone else who's already done it better than you could), but you wouldn't have to do it again.

                            true, but at this point the issue has been solved with much simplicity, so why bother w/ other ways to do it.. unless there's a real need w/ significance, then i'd explore that option.

                              Tea_J wrote:

                              unless there's a real need w/ significance, then i'd explore that option.

                              Well, there was, but you decided it was too hard 🙂

                                it's just too impractically messy..

                                It might be difficult getting a full solution (assuming you don't find someone else who's already done it better than you could), but you wouldn't have to do it again.

                                To add to what Weedpacket said, you might spend several hours coming up with a really good solution for a particular issue; then you can perhaps turn it into a function, or possibly just a code snippet that can be quickly modified by assigning values to a few key variables. If it's for a very common activity, like recordset paging, you might re-use that function or chunk of code many times across dozens of different web sites.

                                When I think back to all the ASP sites I've built, apart from the occasional site for something a little out of the ordinary, probably only 10% of each one is truly original; the rest is re-used code, or essentially an adaptation of a similar previous site with a different interface, page layout etc., so it's well worth investing the time to get something right if you might re-use it often.

                                  Weedpacket wrote:

                                  Well, there was, but you decided it was too hard 🙂

                                  lolz.. but .. but..... 😛

                                  To add to what Weedpacket said, you might spend several hours coming up with a really good solution for a particular issue; then you can perhaps turn it into a function, or possibly just a code snippet that can be quickly modified by assigning values to a few key variables. If it's for a very common activity, like recordset paging, you might re-use that function or chunk of code many times across dozens of different web sites.

                                  When I think back to all the ASP sites I've built, apart from the occasional site for something a little out of the ordinary, probably only 10% of each one is truly original; the rest is re-used code, or essentially an adaptation of a similar previous site with a different interface, page layout etc., so it's well worth investing the time to get something right if you might re-use it often.

                                  true, hence i started this topic.. the pagination engines i've been doing were manually coded per need.. this time i decided to code a universal paging solution (instead of using paging frameworks).. i started this thread hoping someone would say "it's ok to fetch all rows and calculate paging from that, as it doesn't put a dent in server performance unless you actually fetch each row (mysql_fetch_row)".. but as expected that is a bad way, hence i'm sticking w/ the harder, but more efficient, way of doing things = LIMIT..

                                  PS
                                  I wish the next MySQL update would have a function for FOUND_ROWS() w/o having to issue the SQL_CALC_FOUND_ROWS clause.. that would really simplify things so much more..

                                    PS
                                    I wish the next MySQL update would have a function for FOUND_ROWS() w/o having to issue the SQL_CALC_FOUND_ROWS clause.. that would really simplify things so much more..

                                    Indeed, there are a number of differences between PHP/mysqli/PDO/MySQL and ASP/ADO/Microsoft SQL Server (with which I'm more familiar) that have me wondering whether I'm doing things the right/best way.

                                    For example, in ASP/ADO/MS SQL, if you wanted to retrieve data to use in a drop-down, or you wanted a limited number of records for a page (similar to what you'd get with LIMIT), you would use GetRows(): http://www.adopenstatic.com/experiments/recordsetpaging.asp which lets you fetch a whole result set into a two-dimensional array. This was generally considered more efficient than the more common way of returning a recordset and looping through the results, outputting the values at each step of the loop, because there are far fewer calls to the DB and you close the connection earlier:

                                    Looping through recordset:

                                    Open connection to DB.

                                    SELECT Statement

                                    WHILE NOT End of recordset
                                    Output row result
                                    WEND

                                    Close connection

                                    Using GetRows():

                                    Open connection to DB.

                                    SELECT statement

                                    Fetch all the results into array

                                    Close connection

                                    Loop through array

                                    I've tried to mimic this with PHP, PDO and MySQL using fetchAll (http://php.net/manual/en/pdostatement.fetchall.php), and that part is probably OK, because I'm only using it to fetch tens of records, not hundreds or thousands. Where it falls down is that using LIMIT with a start position incurs a cumulative performance hit, whereas ADO's GetRows() used in conjunction with Move() doesn't. I can find no equivalent to Move(), which is a shame, because if we could drop the start position from LIMIT and use it in conjunction with Move() we could forget all this nonsense about constructing WHERE clauses to determine where we left off, and keep things like direct page navigation.
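
                                    The closest I've got is this kind of pattern (a sketch with invented names):

                                    // fetch the whole page into an array, free the statement early, then loop
                                    // over the array afterwards (PDO's rough analogue of GetRows()):
                                    $stmt = $connDB->prepare("SELECT ID, Name FROM tbl_Items ORDER BY ID LIMIT :first, :perpage");
                                    $stmt->bindValue(':first', (int)$intRecFirst, PDO::PARAM_INT);  // PARAM_INT so the LIMIT
                                    $stmt->bindValue(':perpage', (int)$intRecPage, PDO::PARAM_INT); // values aren't quoted
                                    $stmt->execute();
                                    $arrView = $stmt->fetchAll(PDO::FETCH_ASSOC);
                                    $stmt->closeCursor();
                                    foreach ($arrView as $arrRow) {
                                        // output each row here
                                    }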

                                    Unless of course anyone knows different. 😉
