SQL 'LIKE' syntax

nemonoman · May 15, 2007

There is NO pattern matching on a NULL.

If the value is NULL string comparison operators will NEVER return true:

value='' == FALSE
value<'a' == FALSE

Only NULL comparisons will return true or false

value IS NULL
value IS NOT NULL

SELECT *
FROM mytable
WHERE (value LIKE "[^E]%" OR value IS NULL)

As you can see, NULL columns are a pain in the ass and provide little value. Back when storage was expensive, (1960s) they made sense. In today's economy they are USELESS and should be avoided.

nemonoman · May 15, 2007

How to change your NULL column.

I assume you have 'ALTER' privileges

Find the column type with a DESCRIBE statement

DESCRIBE tableToFix

Field         Type             Null  Key  Default   Extra 
========================================
id      | tinyint(3) unsigned |   | | | |
nullcol | char(2)             |YES| | | |

You need to ALTER the table, keeping the attributes of the column, changing the NULL to NOT NULL

ALTER TABLE tableToFix CHANGE nullcol nullcol CHAR( 2 ) NOT NULL

ALTER privileges are typically GRANTed to someone managing the database. You are entirely within scope to request ALTER privileges. If you don't already have them, in all probablity your DB admin will GRANT them and apologize for not having done so before this.

I suggest you alter all NULL columns to NOT NULL. The overhead for this change is about 1 byte per record WHERE the current value of the field IS NULL. In your case, only that subset of records where the value IS NULL would add the extra byte.

Piranha · May 15, 2007

nemonoman wrote:
In today's economy they are USELESS and should be avoided.

Why? To me a column that have NULL as the value is not the same thing as a column with an empty string, the number 0 in mathematic or the default datetime in datetime columns.

As an example, it is a big difference if a product costs 0 € or if the value is NULL. The second case could be for example if the price is not yet known, but you want to show that you will get the product soon. The first case could as an example be that you want to send a product catalogue with every order without the customer having to pay for it.

nemonoman · May 15, 2007

There may indeed be exceptions to the general principle that I describe above. In the absence of a clear indication to the contrary, I suggest that NOT NULL is the best attribute for a column.

I don't believe that the instances you describe, however, necessarily represent exceptions that should cause the principle to be violated.

I should say that your examples show the importance of a primary rule of databases: DON'T BUILD INTELLIGENCE INTO THE DATA.

Your product example suggests that you really mean to assign a value to another attribute of the entity: I would say it's better to create a specific attribute like available, with a value of 0 or 1.

As to your datetime column example, if you don't want to pass NULL (which in MySQL will populate the field with the value NOW()), then pass a meaningful value, such as datetiime in the future or the past, or '00/00/0000 00:00:00'. Perhaps I don't understand in your example why a datetime column that IS NULL would be meaningful.

I think that most such examples would represent an attempt to build intelligence into the meaning of a field that IS NULL. A good rule of thumb is to avoid any attempt to create 'intelligent' data that requires interpretation.

Nemo

nemonoman · May 15, 2007

Alternatively, you can update existing rows:

UPDATE tableToFix
SET nullcol=''
WHERE nullcol IS NULL

When doing an INSERT of new records, SET the value of nullcol to ''. At this point nullcol probably is not being SET during the INSERT. So instead of
INSERT INTO myTableToFix (id) VALUES (1)
use
INSERT INTO myTableToFix(id, nullcol) VALUES (1,'')

nemonoman · May 15, 2007

Probably the best reason for avoiding NULLS is problem of JOINs particularlyLEFT or OUTER joins. These behave differently than might be expected based on NULL or NOT NULL attribute of the column.

SELECT customer.name, zip.city
FROM customer
LEFT JOIN zip
ON customer.zipcode=zip.code

If customer zipcode is a NOT NULL column, then this would return customer names with a blank for the city if the zipcode isn't set.

If the customer zipcode is a NULL column, then ONLY customer names where the zipcode is set (i.e., to '' or other string) would return records. Where the customer.zipcode is NULL, no record would be returned, not even the customer's name.

laserlight · May 15, 2007

I should say that your examples show the importance of a primary rule of databases: DON'T BUILD INTELLIGENCE INTO THE DATA.

Your product example suggests that you really mean to assign a value to another attribute of the entity: I would say it's better to create a specific attribute like available, with a value of 0 or 1.

Next question: what does 0 and 1 mean? Oh wait, have you not just assigned meaning to those two values? DON'T BUILD INTELLIGENCE INTO THE DATA.

I think that adding another column to substitute null values just adds another layer of complexity.

Probably the best reason for avoiding NULLS is problem of JOINs particularlyLEFT or OUTER joins. These behave differently than might be expected based on NULL or NOT NULL attribute of the column.

I think that you are taking a feature and calling it a bug.

Piranha · May 15, 2007

That is to me a situation that should never happen. I can't see any reason to leave a foreign key empty, doesn't matter if it is NULL or ''. Of course you should not allow NULL values as keys, no matter what kind of key it is.

nemonoman · May 15, 2007

Thanks, guys. You sure are smarter than me.

I'm a professional. I've done advance studying in logical data modeling... what do I know? I'm a geezer. I'm an IT manager in the telecommunications industry. I hire consultants. I present them with examples like the ones above and ask opinions. If they know and understand best practices, I may hire them. If they don't know or understand best practice, I find someone else.

Piranha you're right insofar as you go. My LEFT JOIN example might be thought of as an adhoc query. Such queries are fairly common as one-off management requests in most industry environments. In a production environment, one would be more rigorous, of course.

Laserlight: you ask: what does 0 and 1 mean? In most data processing that I've encountered 0 and 1 mean false and true respectively. If you wanted to assign 'Yes' or 'No' to as values, you could do so. In my response to Piranha's example, a reasonable person could look at an 'Available' column and discern that 'available=0' meant not available. But when looking at 'price IS NULL' what is the reasonable person to discern? In this case NULL carries a meaning other than 'price'. You'd need to explain that NULL means not yet set or not yet available. Such information is NOT the price. Such information is some sort of status.

If you think my advice is half-assed, that may be because you have not yet dealt with databases built with stupid ass design. Which is the exact sort of problem that started this thread.

In the problem that started this thread: WHY is this status field being populated with strings? Who came up with the idea of using strings? Someone who didn't know better.

Better design would be to populate with integers...these to relate to text either by library defiinition or in some related 'status' table: 0 = '', 1='Email', 2='Call', etc. So the simple query:

SELECT * FROM table WHERE status IN (1,3,4)

Which is pretty easy to understand, edit, etc,

and no need for LIKE, regular expressions, etc. A nice thing about this sort of good design is that statuses become effectively unlimited. Also long and helpful status meanings can be stored in a related table (status 16='Customer is a complete airhead, call only at your own risk')

"Clever" usages (like 'NULL means price not yet assigned') are what leads to surprises. In the industry, people don't like surprises.

As to your 'taking a feature and calling it a bug', please re-read the comment I made: such queries 'behave differently than might be expected'. Maybe you are already smart enough to know how your query will return data. Maybe you'll remember that price can be null, and you will include an asterisk and footnote about what this means when reporting your results.

Let me give you a better adhoc query example:

SELECT id, IF(price>10,'expensive', 'cheap') FROM product

Do you know by looking wheter this query will or will not return a list of all product ids?

laserlight · May 15, 2007

Laserlight: you ask: what does 0 and 1 mean? In most data processing that I've encountered 0 and 1 mean false and true respectively. If you wanted to assign 'Yes' or 'No' to as values, you could do so. In my response to Piranha's example, a reasonable person could look at an 'Available' column and discern that 'available=0' meant not available. But when looking at 'price IS NULL' what is the reasonable person to discern? In this case NULL carries a meaning other than 'price'. You'd need to explain that NULL means not yet set or not yet available. Such information is NOT the price. Such information is some sort of status.

The convention is as obvious as your observation that usually "0 and 1 mean false and true respectively". A NULL field means that the data is either unknown (e.g., missing) or not applicable. I agree that whether three valued logic should be used in the relational model has been the centre of holy wars, but I do not agree that this is a "surprise" when someone like Codd can advocate it.

Let me give you a better adhoc query example:

SELECT id, IF(price>10,'expensive', 'cheap') FROM product

Do you know by looking wheter this query will or will not return a list of all product ids?

If the price is unknown, is the product expensive or cheap? I think that the query should either take that into account, or assume that all prices are known.

EDIT:
hmm... I think I misread you. Your statement that "such information is NOT the price" is something that goes beyond your original "DON'T BUILD INTELLIGENCE INTO THE DATA" dictum, and I can agree with that. On the other hand, not being an expert myself, I have a hard time accepting that the alternative really is better when most of the literature I have come across uses three valued logic.

Thanks, guys. You sure are smarter than me.

I'm a professional. I've done advance studying in logical data modeling... what do I know? I'm a geezer. I'm an IT manager in the telecommunications industry. I hire consultants. I present them with examples like the ones above and ask opinions. If they know and understand best practices, I may hire them. If they don't know or understand best practice, I find someone else.

I can be convinced by your arguments, but I sure am not convinced by your credentials, or lack thereof, especially since your real life identity is not known to us. Stating that you may hire people if they subscribe to your "best practice" is like the threat of a teacher with tunnel vision: follow my way or I will fail you, whether or not my way is right or wrong.

nemonoman · May 15, 2007

I worried about that credentials bit myself. I now my own internet business (ad2ad.com), and I hire consultants from time to time. In previous lives I was the director of customer service and support for Cricket (mycricket.com). I made hiring decisions for about 10 full time and about 30 consultants. Before director of business systems for Qualcomm (qualcomm.com). About 30 consultants. Before that senior manager of the Northern Telecom (nortel.com) NOM project, managing about 150 fulltime and consultants, about half of which new hires.

Maintaining large, complex databases is a big concern of mine.

I have been a consultant myself, and developed logical data models for highly complex business processes, for which I was grossly overpaid.

There are several basic practices that I advocate in database design. NO INTELLIGENT DATA is one. When you get into the Real World, you'll probably appreciate that practice more and more. It's something to remember like 'look both ways' or "beer and wine don't mix". "Intelligent" data is everywhere, and it's a huge problem to manage and maintain. Learning to recoginize intelligent data, and then to eliminate it, is a huge help to the database team -- and to the business processes that created the 'intelligent data' to begin.

Another basic practice is: NO NULLS. In the systems I work on, joins of 10 to 20 tables are commonplace, and management types are forever demanding one-off ad hoc queries. Leave a few NULL columns around, and you spend days scratching your head about where the JOIN is failing. You expect 9000 rows, you get 8000 rows...what happened?? Oh, that column is NULLable...you discover 2 days later, after tearing your hair out.

Any good practice has exceptions. As a guy who has been in the thick of things, however, I can say with some authority that most of the persons writing on this board have not enough experience to recognize a good candidate for an exception. The vast majority of persons querying this community would do well to follow simple, reasonable practices; pissing and moaning about them, probably, but in the end, better off for using them.

Don't forget this message set started off with some guy wondering why he can't do a LIKE compare on a NULL value. This person is NOT essentially prepared to deal with 3-option logic. This person needs to get a basic toolset. Which I did my best to provide. We've all been novices, and with luck, we will all be novices again. I got involved with this community by needing help with really basic PHP questions, and I have enjoyed a lot of help and support.

I think I got a little snippy. It's been a tough week.

Sxooter · May 15, 2007

nemonoman wrote:
Probably the best reason for avoiding NULLS is problem of JOINs particularlyLEFT or OUTER joins. These behave differently than might be expected based on NULL or NOT NULL attribute of the column.

SELECT customer.name, zip.city
FROM customer
LEFT JOIN zip
ON customer.zipcode=zip.code

If customer zipcode is a NOT NULL column, then this would return customer names with a blank for the city if the zipcode isn't set.

If the customer zipcode is a NULL column, then ONLY customer names where the zipcode is set (i.e., to '' or other string) would return records. Where the customer.zipcode is NULL, no record would be returned, not even the customer's name.

If your database behaves this way it is broken. the left table, absent a limiting where clause, should return all rows whether the join value is null or not. PostgreSQL gets it right, and I'm betting later versions of MySQL do as well.

Unless I misunderstant what you're saying.

tomhath · May 16, 2007

Another basic practice is: NO NULLS.

I have to disagree with that statement. There's a very good reason to use NULL; as laserlight stated earlier, it means there is no meaningful value. For example in a hospital IT application, what is the Discharge date when the patient is still under care? You usually have a column that indicates the patient's status, but there is no sensible value for Discharge date.

I worked on a very large system where the DBA insisted on NO NULLS. Queries in that system were much more difficult that they would've been if we had used NULLS; you always needed to store some bogus indicator value when you didn't have a real value. That constantly messed up queries; you never wanted to display the bogus value in a report, you had to watch for it whenever you calculated aggregates like average/max/min, etc, etc.

NO INTELLIGENT DATA is one. When you get into the Real World, you'll probably appreciate that practice more and more.

That's generally good advice, assuming you really mean don't use undocumented tricks as a shortcut to doing the job right. I've worked on too many systems where the only way you could figure out what a value or combination of values meant was to read the source code and hope you can find all the special case logic. Flags and codes are perfectly okay to use as long as the allowed values are documented, to me that's an appropriate use of intelligent data.

This thread's Original Poster's problem is that he can't run a query, all he has is the web front-end. I'd certainly expect a Status column to have an explicit value for every possible status. It wouldn't surprise me if the NULL or blank values are the result of a bug. But that's just a guess. :queasy:

examancer · May 16, 2007

I don't actually know for sure that the column has a NULL value. It may just be an empty string.

What I'm looking for is a way to use MS SQL compatible pattern matching to return records which have an empty string ("") for a specific column.

I know that Status = "" would accomplish this, but I'm pretty sure this web interface works by removing the field from the WHERE clause of the query when its left blank on the web form. There is no way I can tell the database to do Status = "" because if I leave the Status box blank on the form the column isn't used in the query.

The application I am using here is a custom web interface used where I work. I have access to all the data in the database, but the search functions are limited. I am talking with the developers and trying to get them to implement many of my suggestions which would eventually eliminate my need to perform this search, but developer resources are limited with many of the other applications and projects being developed by my employer.

Work needs to be done now though... so I am working with what I know to develop try and develop a work around. I know its an html/javascript front end to a server side java application which utilizes MS SQL as its database. I know that most fields are searchable and each field has a drop down which lets you select "=" or "LIKE" which obviously determines the comparison method. I know that when LIKE is used standard MS SQL pattern matching is fully implemented. I also know that the java app seems to properly escape all user input before dropping it to the query (no SQL injection solutions).

So I'm left using pattern matching to filter records with different "Status" values. The column basically indicates whether a record has been given a status, and with all records beginning with no status its impossible for me to pull up just the ones which have no status so that I can process them and give them a status.

If anyone has a solution or can tell me if matching to an empty string without having an empty search string is even possible, please let me know.

nemonoman · May 16, 2007

I'm sorry that I misunderstood your question, examancer. You are officially screwed. You are trying to guess the contents of a sealed box (i.e., the content of the server-side script). There's really no telling what the hell the database looks like, nor what the query sent to the database looks like after it has been munged by the script.

See if the admin will grant you direct SELECT privileges. Then you can write your own (damned) web interface.

nemonoman · May 16, 2007

Tomhath:
The classic example of intelligent data is a product identifier. My teacher used an a
actual set of data from a stove manufacturer:

E4032W1L07
G5001B1R06

sort of broke down like:

E4-03-2-W1-L-97
G5-00-1-B1-R-06

E/G = Electric or Gas
4/5 = number of burners
03/00=number of electrical connections
2/1 = 220 / 110
W/B=White/Bronze
L/R=Left/Right Mount
06/97=year of introduction (could be 88, 89..07 etc)

When the company started making BLACK stoves, it added G to the product mix:
W/B/G= White/Bronze/Grey (since B was used for Bronze)

When the company started making GREEN stoves, it added R to the product mix
W/B/G/R= R for gReen

When the company started making RED stoves, it added D to the product mix
W/B/G/R/D = d for reD

Show red stoves = SELECT * FROM product WHERE id LIKE '%D%'

Etc.

When the company started making DC stoves it added D to the 2/1 voltage mix
D=DC 110

Now the "red stove" select breaks: it shows RED and DC stoves.

3 electrical connection stove select: SELECT * FROM product WHERE id LIKE '%03%' stopped working on Jan 03.

Etc.

Of course, this was the numbering method used by the Stove division. The method used by Microwave division was of course ccmpletely different. You'd have to know that Stove division IDs started with E or G and were followed by a digit: THEN you could interpret the the rest. For Microwaves it was E or D followed by a Y or L, etc. Of course when the Freezer division got involved, you'd need to know that E followed by a 4 THEN by a 1 or 2 meant a freezer not a stove. Etc.

You will find this sort of 'numbering' in a variety of forms in virtually any company with an existing product set. Except ones I have worked at. Although since I have left, they're probably back to their old tricks.

Roger_Ramjet · May 16, 2007

NULL == FALSE is about as basic as it gets. Whether the NULL comes from an in-table column or from no match in a left join etc, your applications are going to have to work with NULL values - and very useful they are as well.

Nemonoman, you seem to have forgotten one of the other basics of dbms and oop: data abstraction.

There is absolutely no reason for client applications, or their programmers, to 'know' about the physical storage used since they should only be dealing with the external view, or perhaps the mid-level conceptual view. Each should be documented for the application programmer in any case and a part of that documentation must be the range of possible values and their various meanings for each data element.

Roger_Ramjet · May 16, 2007

Sxooter wrote:
If your database behaves this way it is broken. the left table, absent a limiting where clause, should return all rows whether the join value is null or not. PostgreSQL gets it right, and I'm betting later versions of MySQL do as well.

Unless I misunderstant what you're saying.

Absolutely. MS SQL has always got that right (at least since ver 6 which was the first one I used in 1995).

MS SQL also has an interesting option where you can set

"Transact-SQL supports an extension that allows for the comparison operators to return TRUE or FALSE when comparing against null values. This option is activated by setting ANSI_NULLS OFF. When ANSI_NULLS is OFF, comparisons such as ColumnA = NULL return TRUE when ColumnA contains a null value and FALSE when ColumnA contains some value besides NULL. Also, a comparison of two expressions that have both evaluated to null values yields TRUE. " SQL2005 docs

This can be very powerful when being used in eg BI reporting, find all those items which have never had a particular transaction type or status set; things that are actually quite difficult to achieve is standard sql queries.

tomhath · May 16, 2007

nemonoman: The classic example of intelligent data is a product identifier. My teacher used an a actual set of data from a stove manufacturer:

Yea, that happens everywhere. Ever look at the Vehicle Identification Number (VIN) on your car? It's a similar coding scheme (model, body style, engine, where it was manufactured, ...). And of course we also have to deal with room numbers, bar codes, RF ids, etc.

You make a good point, it's generally better to deal with real world data like that in only one place (parse it, map it, whatever) and be explicit everywhere else.

nemonoman · May 16, 2007

Nemonoman, you seem to have forgotten one of the other basics of dbms and oop: data abstraction. --rogerramjet

Two comments:

First: At some point, the data modelling isn't abstract. This is typically the level where I work. I don't get down to the actual physical storage layer -- with large storage devices, however, and the compact data of the relational databases as we model them, the physical layer has become much less a concern.

Second: Thank you for suggesting that I may have "forgotten". That certainly would appear to be a possibility, as I am getting to that age when I feel lucky to remember whether or not I have already shaved, while I'm looking in the bathroom mirror. In this instance, however, I have not forgotten the principle. I am disputing it, and positing a very different principle, based on extrapolation from my frustrating experience.

I heard these principles expressed by others when I was starting out programmer. I rolled my eyes and shook my head. Only as I got deep into management and maintenance did I realize their practical value. Now others can roll their eyes and shake their heads when I express them. We're all part of the Circle of Geeks.

SQL 'LIKE' syntax

Nnemonoman

Nnemonoman

PPiranha

Nnemonoman

Nnemonoman

Nnemonoman

laserlight

PPiranha

Nnemonoman

laserlight

Nnemonoman

SSxooter

Ttomhath

Eexamancer

Nnemonoman

Nnemonoman

RRoger_Ramjet

RRoger_Ramjet

Ttomhath

Nnemonoman