PostgreSQL testing
OK, I created a test table and populated it with this script:
#!/usr/local/bin/php -q
<?php
if ($argc != 2) die("\n\nUsage: insert_rand_text_date <tablename>\n\n");
$table = $argv[1];
$conn = pg_connect("dbname=mydb user=me");

// Load the system word list; each row's text is four random words.
$filename = "/usr/share/dict/words";
$fp = fopen($filename, "r");
while (!feof($fp)) {
    $lines[] = addslashes(chop(fgets($fp, 4096)));
}
fclose($fp);
$count = count($lines);
$max = 2700000;

// Seed the Mersenne Twister from the current microtime.
function make_seed() {
    list($usec, $sec) = explode(' ', microtime());
    return (float) $sec + ((float) $usec * 100000);
}
mt_srand(make_seed());

pg_exec($conn, "truncate $table");
pg_exec($conn, "begin");
$jan12004 = mktime(0, 0, 0, 1, 1, 2004);

for ($i = 0; $i < $max; $i++) {
    // Commit in batches of 1000 rows.
    if ($i % 1000 == 0 && $i != 0) {
        print $i . "\n";
        pg_exec($conn, "commit");
        pg_exec($conn, "begin");
        usleep(10);
    }
    // Build a row: four random words plus a random timestamp in 2004.
    // The upper bound of $count-2 skips the empty last element that the
    // feof() loop appends.
    $query  = "insert into $table (info,dt) values ";
    $query .= "('";
    $query .= $lines[mt_rand(0, $count - 2)] . " ";
    $query .= $lines[mt_rand(0, $count - 2)] . " ";
    $query .= $lines[mt_rand(0, $count - 2)] . " ";
    $query .= $lines[mt_rand(0, $count - 2)] . "',";
    $rnum = mt_rand(0, 31535999);  // random offset within one year of seconds
    $date = date("Y-m-d G:i:s", $rnum + $jan12004);
    $query .= "'" . $date . "'";
    $query .= ")";
    @pg_exec($conn, $query);
    if (pg_last_error($conn)) {
        print "\n" . pg_last_error($conn) . "\n" . $query . "\n";
        exit;
    }
}
pg_exec($conn, "commit");
pg_exec($conn, "vacuum full analyze $table");
?>
which put 2.7 million rows of about 50 bytes average width into the database.
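For reference, the script assumes a table along these lines; the exact column types and index name are my assumptions, since the post only shows that the table has info and dt columns and that the planner can index-scan on dt:

```sql
-- Assumed schema: the script needs columns named info and dt,
-- and the index-scan results imply an index on dt.
CREATE TABLE test (
    info text,
    dt   timestamp
);
CREATE INDEX test_dt_idx ON test (dt);
```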
On my database, random_page_cost is set to 1.4.
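For anyone following along, the setting can be inspected and overridden per session like this (a sketch; it can also be set globally in postgresql.conf):

```sql
SHOW random_page_cost;       -- current value
SET random_page_cost = 1.4;  -- session-local override
```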
I ran a couple of queries:
For one random day:
explain analyze select * from test where dt between '2004-07-01 00:00:00' and '2004-07-01 23:59:59';
returned 7365 rows with an index scan in 2 seconds.
For two days:
explain analyze select * from test where dt between '2004-07-01 00:00:00' and '2004-07-02 23:59:59';
returned 14937 rows with an index scan in 3.5 seconds.
Five days:
explain analyze select * from test where dt between '2004-07-01 00:00:00' and '2004-07-05 23:59:59';
returned 37044 rows with an index scan in 10.5 seconds.
Ten days:
explain analyze select * from test where dt between '2004-07-01 00:00:00' and '2004-07-10 23:59:59';
returned 74513 rows with a seq scan in 17.6 seconds.
At this point, to see if my random_page_cost was a good choice, I forced the planner to use an index scan by setting random_page_cost to 0.0; the response time was 18 seconds.
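The forcing step looked something like this (a sketch; setting enable_seqscan = off is the other common way to push the planner toward an index scan):

```sql
SET random_page_cost = 0.0;  -- make random page reads look free, so the index wins
EXPLAIN ANALYZE
SELECT * FROM test
WHERE dt BETWEEN '2004-07-01 00:00:00' AND '2004-07-10 23:59:59';
RESET random_page_cost;      -- restore the configured value afterwards
```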
NOW, I am clustering the table on the dt field.
I'll post the results when the database finishes clustering. This could take a while.
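For completeness, the clustering step is a one-liner, though the syntax depends on the PostgreSQL version; the index name here is an assumption:

```sql
-- Older releases (7.x/8.x era): CLUSTER indexname ON tablename
CLUSTER test_dt_idx ON test;
-- Newer releases: CLUSTER tablename USING indexname
-- CLUSTER test USING test_dt_idx;
```

Note that CLUSTER rewrites the whole table in index order and takes an exclusive lock, which is why it can take a while on 2.7 million rows.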