The thing you need to think about with SQL is that while you think combining 3 queries into one will make it faster, it may not. Every "join" or "union" creates a temporary table. So MySQL could really eat up some memory if you don't optimize your queries, and also the tables could get quite large.
Some things to help speed up your queries are to:
- Select only the fields you need to work with
[indent]Don't select all the fields of a table if you're going to work with only a small subset. It's just like eating, try not to take more than you're going to need.[/indent]
- Use defined joins, don't use Cartesian joins
[indent]Using two tables in the "FROM" clause like: "SELECT * FROM tabeA, tableB" creates a Cartesian Join where every row is included in the result set. Using a defined join like INNER JOIN or LEFT JOIN will allow you to specify which column(s) to match up, and those that don't match up will be left out (with some caveats).[/indent]
- Keep the query simple
[indent]If you have a query with a sub-query, with sub-query with a huge CASE statement, that can slow things down. Each sub-query is a temporary table, and CASE statements can slow down processing of large tables because MySQL has to evaluate each record. Keep them simple and to the point. Don't try to do too much with them.[/indent]
So let's go through your code here.
-
SELECT COUNT(*) FROM irs_rawdata
[indent]So you want the total number of rows in the table "irs_rawdata". Okay, this is a quick simple way to do it. Nothing wrong here.[/indent]
-
Loop through from 0 to the max rows executing this: SELECT EIN FROM irs_rawdata LIMIT 0,1
[indent]Well, here is some trouble. While it's a short simple query, it's executing thousands of times, maybe more. This can really cause a slow-down in execution since there are so many records, and so many queries to the MySQL server. You're asking for overload.[/indent]
-
With each row from the query in #2, you execute this: SELECT fin FROM block WHERE fin = '$fin'
[indent]Okay, so you want the "fin" from the block table. This is a simple query which does this, but at what cost? If we have thousands of rows, once again, we're running this exact query thousands of times. Could make things very very slow.[/indent]
-
If a row is not found in the "block" table from query in #3, then you want to INSERT INTO the "query" table information from the irs_rawdata table.
[indent]This is a hefty query, and could cause some backup, but not as much as the others. Once again, this is run a lot. Maybe not 100% of the times, but still, it has the possibility for each loop to be run.[/indent]
-
If you insert successfully in #4, then you delete from irs_rawdata
[indent]Nothing wrong here. No real comments 🙂[/indent]
-
If you don't insert successfully in #4, then you want to add the current row info in irs_rawdata to the "block" table.
[indent]Nothing wrong here. It's just run a lot, that's all.[/indent]
-
If the insert in #6 works, then you delete from irs_rawdata
[indent]This is fine, nothing major here.[/indent]
-
If there is a record found in the block table from the query in #3, then delete that row from irs_rawdata
[indent]Okay, simple enough.[/indent]
Now, if we take this code and refactor it, I think we could actually trim it down to a few steps:
-
Find those rows from irs_rawdata that are currently in "block" table
-
Delete those rows from irs_rawdata
-
Take remaining rows in irs_rawdata and see if they match a specific set of criteria. If so, insert those items into the "block" table.
-
Remove those inserted into "block" from "irs_rawdata"
Notice how your 8 steps which are replicated thousands of times are trimmed down to 4? Now using this setup, we can "batch process" items. Here's some sample code:
<?php
// 1.) Find rows in irs_rawdata that are currently in "block" table:
$query1 = "SELECT ir.`EIN`
FROM `irs_rawdata` ir
INNER JOIN `block` b
ON ir.`EIN` = b.`fin`";
$in_block = @mysql_query($query1) or die('#1.) Error finding rows in `block`: ' . mysql_error());
$eins = array();
while($row = mysql_fetch_row($in_block))
{
$eins[] = $row['EIN'];
}
// 2.) Delete rows returned from #1:
$query2 = "DELETE FROM `irs_rawdata` WHERE EIN IN(" . implode(', ', $eins) . ")";
$delete = @mysql_query($query2) or die('#2.) Error deleting rows in irs_rawdata: ' . mysql_error());
// 3.) Find those rows which match our criteria:
$query3 = "SELECT EIN
FROM `irs_rawdata`
WHERE
PNO LIKE '%blind%' OR
PNO LIKE '%deaf%' OR
PNO LIKE '%braille%' OR
PNO LIKE '%sight%' OR
PNO LIKE '%vision%' OR
PNO LIKE '%sign%' OR
PNO LIKE '%ear%' OR
PNO LIKE '%eye%' OR
PNO LIKE '%dog%' OR
Activity_Code LIKE '%031%' OR
NTEE_Code LIKE '%G41%' OR
NTEE_Code LIKE '%G42%' OR
NTEE_Code LIKE '%H41%' OR
NTEE_Code LIKE '%H42%' OR
NTEE_Code LIKE '%P86%' OR
NTEE_Code LIKE '%P87%'";
$need_insert = @mysql_query($query3) or die('#3.) Error finding rows which match criteria: ' . mysql_error());
$eins = array();
$query3a = "INSERT INTO `block` (`find`, `who_blocked`)
VALUES";
while($row = mysql_fetch_row($need_insert))
{
$eins[] = $row['EIN'];
$query3a .= "
('" . $row['EIN'] . "', 'S-123456789'),";
}
$query3a = substr($query3a, 0, -1); // Remove the final "," in the SQL string
$inserted = @mysql_query($query3a) or die('#3.) Error inserting data into block table: ' . mysql_error());
// 4.) Delete those inserted rows from the irs_rawdata table:
$query4 = "DELETE FROM `irs_rawdata` WHERE EIN IN(" . implode(', ', $eins) . ")";
$deleted = @mysql_query($query4) or die('#4.) Error deleting rows from irs_rawdata: ' . mysql_error());
That's a basic idea of what I would do. It is really short and to the point. You can add your error print out display and what not as you need, but this is just a skeleton of what you probably need.
Do note that what's left in the irs_rawdata table will be those rows that didn't match criteria, and that weren't already in the block table. Everything else should be deleted.
Hope that helps explain a few things.