Hello, I am having some performance issues with a certain Postgres database query joining 2 tables (containing 1 record each) and 1 view (containing 3 million records). Here are the table definitions, simplified as much as possible:
table1 (contains 1 record):
record_id SERIAL
table2_id INT4 (foreign key, can be null)
view1_id INT4 (foreign key, can be null)
table2 (contains 1 record):
record_id SERIAL
view1 (contains 3 million records, joins 3 other tables):
record_id SERIAL
column2 TEXT NOT NULL
Note that table1.view1_id is actually a foreign key to a table containing 3 million records, which is reconstructed by view1. Indexes have been created on all columns shown above, and VACUUM ANALYZE has been run. The tables are normalized. Here is the query (simplified):
SELECT *
FROM (table1 LEFT JOIN table2 ON table1.table2_id = table2.record_id)
LEFT JOIN view1 ON table1.view1_id = view1.record_id
WHERE view1.column2 = 'TEST';
As shown above, the query takes well over 1 minute to execute. If I substitute "INNER JOIN" for the second "LEFT JOIN" (the one involving table3), performance improves dramatically and the query executes in about 1 second.
Here is my EXPLAIN output. As you can see I have a table of cities, a table of countries, and a table of provinces. The table containing cities is the one with 3 million records.
EXPLAIN output for LEFT JOIN version of query (slow)
Nested Loop (cost=182.85..576063.89 rows=220 width=629)
-> Hash Join (cost=1.02..2.09 rows=3 width=377)
-> Seq Scan on table1 t1 (cost=0.00..1.03 rows=3 width=276)
-> Hash (cost=1.02..1.02 rows=2 width=101)
-> Seq Scan on table2 t2 (cost=0.00..1.02 rows=2
width=101)
-> Subquery Scan v1 (cost=181.83..148003.47 rows=2938516
width=105)
-> Hash Join (cost=181.83..148003.47 rows=2938516 width=105)
-> Seq Scan on table_city t3 (cost=0.00..59666.16
rows=2938516 width=47)
-> Hash (cost=170.93..170.93 rows=4360 width=58)
-> Hash Join (cost=6.12..170.93 rows=4360
width=58)
-> Seq Scan on table_province t2
(cost=0.00..77.60 rows=4360 width=27)
-> Hash (cost=5.50..5.50 rows=250
width=31)
-> Seq Scan on table_country t1
(cost=0.00..5.50 rows=250 width=31)
EXPLAIN output for INNER JOIN version of query (fast)
Hash Join (cost=477.17..488.27 rows=1 width=482)
-> Merge Join (cost=475.07..486.12 rows=10 width=105)
-> Sort (cost=434.49..434.49 rows=4360 width=58)
-> Hash Join (cost=6.12..170.93 rows=4360 width=58)
-> Seq Scan on table_province t2
(cost=0.00..77.60 rows=4360 width=27)
-> Hash (cost=5.50..5.50 rows=250 width=31)
-> Seq Scan on table_country t1
(cost=0.00..5.50 rows=250 width=31)
-> Sort (cost=40.58..40.58 rows=10 width=47)
-> Index Scan using table_city_city_name_key on
trek_city t3 (cost=0.00..40.43 rows=10 width=47)
-> Hash (cost=2.09..2.09 rows=3 width=377)
-> Hash Join (cost=1.02..2.09 rows=3 width=377)
-> Seq Scan on table1 t1 (cost=0.00..1.03 rows=3
width=276)
-> Hash (cost=1.02..1.02 rows=2 width=101)
-> Seq Scan on table2 t2 (cost=0.00..1.02 rows=2
width=101)
Could anyone offer any suggestions on how to improve the performance of the LEFT JOIN? I can't figure out why it won't use the index like the INNER JOIN does. Thank you for your time.