over-structures for SQL query creation

sfullman · Jan 11, 2009

Hi All,

I'm fairly competent at creating SQL queries but am working on some higher-level structuring of my queries so that they can be "built" by an algorithm based on a standard. I hope I can express this in a way that will elicit some links to tutorials or articles that would be assisting.

MOST queries involve some sort of root object stored in a table. Invoices, for example. One table may have the client name in the invoice, but obviously a better table would have the client name normalized out to another table.

furthermore, there may be other fields which I wish to add by joining other tables. Most of those involve something like this

SELECT
{root fields},
tableN.LabelField (like ClientName, etc.)
FROM rootTable, tableN
WHERE /*add this constraint: */ rootTable.ForeignkeyID = tableN.ID

However I've evolved to structuring that query like this:

SELECT
{root fields},
tableN.LabelField
FROM rootTable
LEFT JOIN tableN ON rootTable.ForeignkeyID = tableN.ID

Why you may ask?
1. Because this method seems functionally equivalent but I don't know if it's slower
2. This method only adds length to the FROM clause and doesn't encumber the WHERE clause
3. In the case that the rootTable doesn't have a dependent record in the child table, this method will produce a NULL for the field for that record, vs. making the record disappear. I figure if I have to babysit the disappearance of childless records that should have a child, my code wasn't written well to begin with, so why use a query to enforce it.

What i am visualizing a query build as is the following:
1. the root table is always identifyable (if not my convention here won't work - but most objects will)
2. each LEFT joined table is a branch
3. each branch may have more than one table to "get to" the data i need as:

SELECT
{rootFields},
{fields in table c}
FROM rootTable
LEFT JOIN table1 ON rootTable.ForeignkeyID=table1.ID
LEFT JOIN table2 ON table1.ForeignkeyID=table2.ID

therefore, table1 can be aliased reliably as r1b1t1 (root table 1, branch one, table 1) and table2 can be aliased as b1t2 (root table 1, branch one, table2)
if I need to join another table through an inner join, as

SELECT
{rootFields},
{fields in table c},
{fields in table n}
FROM rootTable
LEFT JOIN table1 ON rootTable.ForeignkeyID=table1.ID
LEFT JOIN table2 ON table1.ForeignkeyID=table2.ID,
secondaryTable
{possible LEFT JOINs here}
WHERE rootTable.somefield=secondaryTable.somefield

..then any LEFT JOIN table tied to secondaryTable would start as r2b1t1 (root table2, branch 1, table 1), r2b1t2 etc.

One question is if there is some way to implement the structure of the last query so as to avoid impinging on the WHERE clause as I did above. But besides that I'm hoping some developers will a)see the pattern I'm seeing here and b) see if it grossly overlooks some fundamentals OR slows my code down. Remember this is designed to be a convention for an algorithm to build columns for the end user - I'm looking for a system to do that.

Thanks for your thoughtful input!

Weedpacket · Jan 11, 2009

sfullman wrote:
1. Because this method seems functionally equivalent but I don't know if it's slower

In the case that the rootTable doesn't have a dependent record in the child table, this method will produce a NULL for the field for that record, vs. making the record disappear. I figure if I have to babysit the disappearance of childless records that should have a child, my code wasn't written well to begin with, so why use a query to enforce it.

Statements 1 and 3 are contradictory. The original query is equivalent to an INNER JOIN, not a LEFT JOIN. I'd be interested to know how in "well-written" code you'd add the first child node to the root node because until then you'll have a "childless record that should have a child" (and yes, you can write code in the database to automatically delete a node if its last child is deleted. It's a bit more tedious than the code to delete all the descendants of a node if it's deleted, but still....).

Statement 2 is entirely cosmetic: it's up to the DBMS to compile the query into an execution plan, and the SQL statement itself is just a description of what you want done - not how to do it. In fact, the whole thing looks like cosmetics except for the passing mention about requiring nodes to have children.

sfullman · Jan 12, 2009

Not sure about your statement that 1 and 3 are contradictory, I think I mislabled between inner and outer join.

What I'm intending to do is standardise my query structure so that it can be "built" by an interface run eventually by not-so-savvy-to SQL folks, and it occured to me that most of my table builds amount to this:

select
{fields}
from
rootTable r1
   /* left join branch 1 from root */
   LEFT JOIN table r1b1t1 ON r1.fieldA=r1b1t1.fieldA [ AND more equalities..]
      LEFT JOIN table r1b1t2 ON r1b1t1.fieldB=r1b1t2.fieldB
         LEFT JOIN table r1b1t3 ON r1b1t2.fieldX=r1b1t3.fieldX ..
            .. [more maybe to get to needed table.field(s)]

   /* left join branch 2 from root */
   LEFT JOIN table r1b2t1 ON r1.fieldC=r1b2t1.fieldC [ etc.. ]

   /* more branches for more relevant columns */
   ,
secondaryTable r2
   ,
tertiaryTable r3

You're right it's semantics (and quite simplistic considering the power of SQL) but I realize it's how I get (and would allow unskilled users to get) 90%+ of my data. Or, almost all my db structures will admit to pulling "righteous" data this way. It's a way of simplifying structure to be stored in a hash, the premise being:

the "root" tables all have an INNER JOIN between themselves
each root table may have one or more "branches" of 1+ tables
each of the branch tables do not split off further (at most only one table joined to it)
the root tables are INNER JOIN'ed, the branch tables are LEFT JOINED
more broadly in the future, all root tables are INNER JOIN'ed, and all branch tables are OUTER JOIN'ed - queries that can't conform to this would need to be handled another way

My questions related to:
1. the efficiency of the query assuming all fields have been properly indexed
2. a way to keep everything in the FROM clause (as can be done with the left joins), vs adding to the WHERE clause:

where rootTable.field=secondaryTable.field AND secondaryTable.field=tertiaryTable.field

3. has someone smarter than me already come up with a better and more encompassing system

SO, more succinctly, how would I add the INNER JOIN of the three root tables in the FROM clause vs. adding two constraints to the where clause?

Thanks,
Samuel

sfullman · Jan 12, 2009

linking this all to PHP, here is a prototype of how the query would be stored. What is desired is that table names be preditable (if a bit abstruse like rootTable, rootTableSecondary, r1b1t1, etc. - but at least they're searchable and pretty unique)

$queryStructure['nickname-for-this-structure']=array(
	'version'=>'0.1 alpha',
	'structure'=>array(
		/* root table 1 - aliased AS rootTable */
		array(
			'table'=>'tableName',
			0=>NULL /* 0th node of first root table requires no join */,
			/* branch(es) - this is branch r1b1 */
			array(
				/* branch table(s) - this is r1b1t1 */
				array(
					'table'=>'finan_items_packages', /* will be aliased AS r1b1t1 */
					'joinStatement'=> 'rootTable.ID=r1b1t1.Items_ID' /* could be an array, or more complex to organize choices */
					'fieldStatements'=> 'r1b1t1.SomeField' /* fieldStatements referencing ONLY this table in the branch at this point could be stored here */,
					'replacesField'=>false /* do the fields above replace root fields - if so we alias them to the field name and since these field statements come later they will overwrite on the PHP assoc. array */
					'canProliferate'=>false, /* does this LEFT JOIN proliferate the root object - i.e. do we need to GROUP BY the root ID */
				),
				/* this is r1b1t2 */
				array(
					/* as above */
				)
			),
			/* this is r1b2 */
			array(
				/* this is r1b2t1 */
				array(
					/* as above */
				)
			)
		),
		/* secondaryTable - aliased AS rootTableSecondary */
		array(
			'table'=>'tableName(2)',
			0=>'rootTable.someField = secondaryTable.somefield', /* first inner join */
			/* branches as above */
		)
	)
);

Weedpacket · Jan 12, 2009

sfullman wrote:
Not sure about your statement that 1 and 3 are contradictory, I think I mislabled between inner and outer join.

They're contradictory because together they say that a LEFT JOIN is functionally equivalent to an INNER JOIN.

There are INNER JOINs, LEFT (OUTER) JOINS (the "OUTER" is one of those noise words that SQL is so enamoured of), RIGHT (OUTER) JOINS (which are just left joins with the arguments swapped), and FULL (OUTER) JOINS (which are the union of left and right joins).

sfullman wrote:
2. a way to keep everything in the FROM clause (as can be done with the left joins), vs adding to the WHERE clause:

Why? This is one of the main bits I don't get. (The other bit is insisting on wanting to use left joins for inner joins).

Roger_Ramjet · Jan 12, 2009

The first and most obvious comment is that user (and that includes application programmers) do not need to know the table structure/physical schema of your database.

If you need some articles/tutorials to explain this, google for n-tiered architectures.

Now, in an n-tiered, or even client/server architecture, the aps programmer would be coding to a VIEW of the data structures, not the underlying tables that store the data. So, as you rightly say, an Invoice object would have data normalised out into multiple tables. Now it would be a pain for every apps programmer to have to figure out and code all the joins that an Invoice requires before they could even begin to use an invoice object or records.

What the db architect should do is to code that for them with a view.

So, to solve your problem all you need to do is to compose the set of views required for your problem or application domain. You can then present this fully re-constructed data type to the user for querying: all your code needs to handle is the construction of a WHERE clause from user input.

over-structures for SQL query creation

Ssfullman

Weedpacket

Ssfullman

Ssfullman

Weedpacket

RRoger_Ramjet