I ran into an issue recently where an application was not updating all the values it was supposed to be, and other values were mysteriously changing when they weren’t supposed to. Yea, sounds like a lookup problem. Either keys were being mixed up or values which were not keys were being used to do updates. The latter shouldn’t really be a case, since keys should _always_ be used to do updates (this forms the foundation for the relational model), but sometimes we run into this sort of thing in the wild.
At any rate, the problem had manifested itself in counts which were not exactly accurate. We were expecting zero or very near zero records which didn’t have a verification date on them, since the ad-hoc process we generated was supposed to create one based on the last modified date of the profile. However, after running, it turned out that several thousand records weren’t being updated. This sounds like a lot, but it was only about 1 – 2% of all records, so we forgot it temporarily and moved on.
Later, we started getting calls that our work to invalidate certain profiles was having an adverse effect, and updating records which it should have been updating. This was a big problem, since it would nag customers about it when there was _nothing wrong_. This is among the last things you’d ever want to do to your customers, for a number of reasons (irritating spam-sensitive users, desensitizing the behavior you want to encourage). This prompted action.
I traced the issue to code other than my own (again – whew! – it’s hard being an outsider and lacking the experience of the large institutional platform which everyone else has been living and breathing) – this time to logic which was updating the database in a batch update utility. The problem apparently was an assumption that the code would update correctly because of the position of the boolean check in the conditional statement. There was a two-fold problem to this logic, since the base assumption that the check would short-circuit was incorrect was complicated by the indeterminate (exluding use of the ORDER BY clause, that is) order of results of a SELECT statement. The programmer didn’t seem to know either of these things, which reinforces my belief that _every expert enterprise developer needs to have a firm grasp on database storage and retrieval fundamentals_ as well as near-expert depth of the peculiarities of the chosen vendor specifics. I’m not sure how other platforms do it, but SQL Server doesn’t short circuit OR statements (but it does AND statements). The docs try to lead us astray on this one, but a simple test shows us everything:
CREATE TABLE #Test ( myValue nvarchar(20) )
INSERT INTO #Test VALUES ( 'Test1' )
INSERT INTO #Test VALUES ( 'Test2' )
INSERT INTO #Test VALUES ( 'Test3' )
--The entire "OR" statement is evaluated, and not short circuited, even thought the first value is true (1/0 will return an error)
SELECT * FROM #Test WHERE myValue LIKE 'Test%' OR 1/0 = 1
--The "AND" statement is short-circuted, since T-SQL 8.0 does this for "AND" (1/0 will return an error) SELECT * FROM #Test WHERE 1 = 2 AND 1/0 = 1
--Test 1 will be returned first, even though it is listed 2nd, because the way the inserts happened, and how the loader retrieves the rows into the page buffers, and then how the query engine reads the buffers…
SELECT * FROM #Test WHERE myValue = 'Test2' OR myValue = 'Test' + '1'