This is all well and good for built-in functions. However, when you start using user-defined functions, things can get pretty ugly pretty fast.
Consider this pretty standard user table:
CREATE TABLE [dbo].[Users] ( [ID] [int] IDENTITY(1,1) NOT NULL, [MembershipUserID] [uniqueidentifier] NOT NULL, [UserDomainID] [int] NOT NULL, [SecurityLevelID] [int] NULL, [FirstName] [varchar](50) NOT NULL, [LastName] [varchar](50) NOT NULL, [EmailAddress] [varchar](100) NOT NULL, [CreationDate] [datetime] NOT NULL, [PhoneNumber] [varchar](50) NULL, [FaxNumber] [varchar](50) NULL, [AddressID] [int] NULL, [InvalidEmail] [bit] NULL, [ManagerUserID] [int] NULL, [HoursWorkedPerWeek] [decimal](5, 2) NULL, [LanguageID] [int] NULL )
Now let's say I want to add a computed column FullName that runs my user-defined function udf_FullName. This function just glues FirstName and LastName together, so it's not doing anything crazy. Adding the computed column is easy:
ALTER TABLE dbo.Users ADD FullName AS dbo.udf_FullName(FirstName, LastName)
Now let's fast-forward three months - the client/business user has decided that it wants all names to listed in "Surname, GivenName" format, so I've got to change udf_FullName. It's a pretty trivial change, so I go in and write my ALTER FUNCTION statement. But when I hit Execute, I get this:
Msg 3729, Level 16, State 3, Procedure udf_FullName, Line 1
Cannot ALTER 'dbo.udf_Fullname' because it is being referenced by object 'Users'.
The function can't be altered because it's being used in a computed column. Hmm. It doesn't really help to consolidate logic if we can't ever change it.
This can be gotten around by dropping the column, modifying the function, and then re-adding the column. If the column value isn't being persisted, this probably won't be a particularly expensive operation. But if there are any views, stored procedures, other functions, etc. that depend on this table and use SCHEMABINDING, you'll quickly run into a cascading chain or updates you have to script. The column will be moved to the end of the table as well, which can cause heartburn for some clients. These are not insurmountable challenges, but they are hassles you just don't need to make for yourself.
Where things can really get ugly is when the function runs a sub-query. I've detailed in a previous post the performance pitfalls of scalar functions running queries, and they become even worse when put into a computed column. A sub-query function used in another query is bad enough, but it's somewhat scope-restricted - it'll only be used in certain circumstances. But with a computed column, the function and its sub-query have the potential to be re-run every single time the table is touched!
Oh, but it gets worse. You can actually get into a scenario where the table and the function have a circular dependency on each other! Consider this function:
CREATE FUNCTION [dbo].[udf_GetUserRoles] (@userID int) RETURNS NVARCHAR(MAX) WITH SCHEMABINDING AS BEGIN DECLARE @roleList NVARCHAR(MAX) SELECT @roleList = COALESCE(@roleList+',', ' ') + r.RoleName FROM dbo.Users u INNER JOIN dbo.aspnet_UsersInRoles ur on u.MembershipUserID = ur.UserId INNER JOIN dbo.aspnet_Roles r on ur.RoleId = r.RoleId WHERE u.ID = @userID RETURN @roleList END
If we ignore the performance, it's not necessarily a bad function. But it's possible to call it from a computed column on the Users table itself!
ALTER TABLE dbo.Users ADD RoleList AS [dbo].[udf_GetUserRoles]([ID])
Fun fact: if you have a database with this setup, and you try to use Microsoft's Generate Scripts tool, it will crash as it gets into an infinite loop of attempted dependency resolution!
Computed columns are nice in theory, but they are difficult to change, especially when functions are involved. For this reason, they should be used sparingly, and they shouldn't use user-defined functions at all.
No comments:
Post a Comment