Thursday, March 24, 2016

SQL Dependency Hell: User Functions in Computed Columns

SQL Server allows you to create columns that don't contain data, but rather are calculations, hence "computed" columns. For example, say you have a table with an OpenDate and a CloseDate. You could create a OpenDays computed column which returns the value of the function DATEDIFF(DAY, OpenDate, CloseDate). It's useful, logical, and runs fast. (A computed column can just be an expression, for example a column Interest that equals Balance * 0.01, but in this post we're going to focus on columns using functions.) The value of the computed column is by default calculated on demand (e.g., whenever the column is used in a SELECT or WHERE), but can also be marked persisted so the value is actually saved to disk and then updated on recalculation.

This is all well and good for built-in functions. However, when you start using user-defined functions, things can get pretty ugly pretty fast.

Consider this pretty standard user table:

CREATE TABLE [dbo].[Users]
(
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [MembershipUserID] [uniqueidentifier] NOT NULL,
 [UserDomainID] [int] NOT NULL,
 [SecurityLevelID] [int] NULL,
 [FirstName] [varchar](50) NOT NULL,
 [LastName] [varchar](50) NOT NULL,
 [EmailAddress] [varchar](100) NOT NULL,
 [CreationDate] [datetime] NOT NULL,
 [PhoneNumber] [varchar](50) NULL,
 [FaxNumber] [varchar](50) NULL,
 [AddressID] [int] NULL,
 [InvalidEmail] [bit] NULL,
 [ManagerUserID] [int] NULL,
 [HoursWorkedPerWeek] [decimal](5, 2) NULL,
 [LanguageID] [int] NULL
)

Now let's say I want to add a computed column FullName that runs my user-defined function udf_FullName. This function just glues FirstName and LastName together, so it's not doing anything crazy. Adding the computed column is easy:

ALTER TABLE dbo.Users ADD FullName AS dbo.udf_FullName(FirstName, LastName)

Now let's fast-forward three months - the client/business user has decided that it wants all names to listed in "Surname, GivenName" format, so I've got to change udf_FullName. It's a pretty trivial change, so I go in and write my ALTER FUNCTION statement. But when I hit Execute, I get this:

Msg 3729, Level 16, State 3, Procedure udf_FullName, Line 1
Cannot ALTER 'dbo.udf_Fullname' because it is being referenced by object 'Users'.

The function can't be altered because it's being used in a computed column. Hmm. It doesn't really help to consolidate logic if we can't ever change it.

This can be gotten around by dropping the column, modifying the function, and then re-adding the column. If the column value isn't being persisted, this probably won't be a particularly expensive operation. But if there are any views, stored procedures, other functions, etc. that depend on this table and use SCHEMABINDING, you'll quickly run into a cascading chain or updates you have to script. The column will be moved to the end of the table as well, which can cause heartburn for some clients. These are not insurmountable challenges, but they are hassles you just don't need to make for yourself.

Where things can really get ugly is when the function runs a sub-query. I've detailed in a previous post the performance pitfalls of scalar functions running queries, and they become even worse when put into a computed column. A sub-query function used in another query is bad enough, but it's somewhat scope-restricted - it'll only be used in certain circumstances. But with a computed column, the function and its sub-query have the potential to be re-run every single time the table is touched!

Oh, but it gets worse. You can actually get into a scenario where the table and the function have a circular dependency on each other! Consider this function:

CREATE FUNCTION [dbo].[udf_GetUserRoles]
   (@userID int)
RETURNS NVARCHAR(MAX)
WITH SCHEMABINDING
AS
BEGIN
 DECLARE @roleList NVARCHAR(MAX)

 SELECT @roleList = COALESCE(@roleList+',', ' ') + r.RoleName
 FROM dbo.Users u
 INNER JOIN dbo.aspnet_UsersInRoles ur on u.MembershipUserID = ur.UserId
 INNER JOIN dbo.aspnet_Roles r on ur.RoleId = r.RoleId
 WHERE u.ID = @userID

 RETURN @roleList
END

If we ignore the performance, it's not necessarily a bad function. But it's possible to call it from a computed column on the Users table itself!

ALTER TABLE dbo.Users ADD RoleList AS [dbo].[udf_GetUserRoles]([ID])


Fun fact: if you have a database with this setup, and you try to use Microsoft's Generate Scripts tool, it will crash as it gets into an infinite loop of attempted dependency resolution!

Computed columns are nice in theory, but they are difficult to change, especially when functions are involved. For this reason, they should be used sparingly, and they shouldn't use user-defined functions at all.

No comments:

Post a Comment