The IN and NOT IN constraints and NULL values

November 24, 2010 01:28 by Matt Whitfield

Ok, so the IN clause, it's a simple way of saying 'I want a value that is also present in the result of this single column sub-query'.

But it has hidden complexities - let's have a look at those.

Consider the following test tables:

DECLARE @testData TABLE (value [int] NULL)INSERT INTO @testData ([value]) VALUES (1)INSERT INTO @testData ([value]) VALUES (2)INSERT INTO @testData ([value]) VALUES (3)INSERT INTO @testData ([value]) VALUES (NULL)DECLARE @testMatchValues TABLE (value [int] NULL)INSERT INTO @testMatchValues ([value]) VALUES (1)INSERT INTO @testMatchValues ([value]) VALUES (2)INSERT INTO @testMatchValues ([value]) VALUES (5)

So, if we run the query

SELECT * FROM @testData [td]  WHERE [value] IN (SELECT [tmv].[value] FROM @testMatchValues [tmv])

Then we expect to receive rows containing the values 1 and 2, because those are the intersection of the two test tables. Consequently, we also expect the same result when we reverse the tables:

SELECT * FROM @testMatchValues [tmv]  WHERE [value] IN (SELECT [td].[value] FROM @testData [td])

And this is what we get, from both queries.

So, when we run a query with NOT IN, we expect to get the other rows that do not form part of the intersection. For example, with the following query:

SELECT * FROM @testData [td]  WHERE [value] NOT IN (SELECT [tmv].[value] FROM @testMatchValues [tmv])

We expect to receive 3 and NULL, right? Well, that depends on the setting of ANSI_NULLS. If ANSI_NULLS is ON, then we get only the value 3.

Why is this?

It's because the NULL value is not equal to to any row from @testMatchValues - but, at the same time, it is not different from any row from @testMatchValues.

What?!?

Let's look at some simpler SQL:

DECLARE @test [int]SELECT CASE when @test = 3 THEN 1 ELSE 0 END AS TestEqualsThree,        CASE WHEN @test <> 3 THEN 1 ELSE 0 END AS TestDoesNotEqualThree

In this SQL, the value of @test is NULL, because it is not assigned. And both values are returned as 0 - meaning that @test = 3 is false, and @test <> 3 is false. Basically, both comparisons result is UNKNOWN, which is neither true nor false, but when evaluated in a CASE statement or WHERE clause, would not count as a match.

We can also see the same effect in reverse - so consider the SQL:

SELECT * FROM @testMatchValues [td]  WHERE [value] NOT IN (SELECT [tmv].[value] FROM @testData [tmv])

We might reasonably expect this to return us a single row with the value 5 - but it returns us no rows at all. Again, why?

Well, let's simplify it a bit. Consider this SQL:

SELECT 'Present' WHERE 5 NOT IN (1, 2, 3, NULL)

This is effectively replicating the match on the value 5 from @testMatchValues in the query above.

This query may also be written as:

SELECT 'Present' WHERE 5 <> 1 AND 5 <> 2 AND 5 <> 3 AND 5 <> NULL

And there we have it - we saw above that @test <> 3 would not count as a match - and here, in the same way, 5 <> NULL will not count as a match, because UNKNOWN is not true.

What have we learnt?

That any SQL using NOT IN where the list of values contains NULL will never return anything at all, because value <> NULL will never be a match.

Tags: ,

t-sql

Add comment



(Will show your Gravatar icon)



  Country flag

biuquote
  • Comment
  • Preview
Loading



CAPTCHA image




SQL Everywhere

Tag cloud

Calendar

<<  July 2014  >>
MoTuWeThFrSaSu
30123456
78910111213
14151617181920
21222324252627
28293031123
45678910

View posts in large calendar