CREATE FUNCTION [dbo].[udf_GetFeeAmount] ( @AccountID INTEGER ) RETURNS MONEY AS BEGIN DECLARE @Result MONEY SELECT @Result = NULL IF EXISTS(SELECT 1 FROM dbo.Accounts WHERE AccountID = @AccountID AND StatusID IN (1,4,8)) BEGIN SELECT @Result = COALESCE(SUM(Fees.Amount), 0) FROM dbo.Fees WHERE AccountID = @AccountID END RETURN @Result END
The problem with is not necessarily what's been written, but how it gets used. If this was a stored procedure outputting this value or setting an OUTPUT parameter, it'd be great. But say you're going to use this function in a SELECT statement that will return a million rows:
SELECT AccountID, dbo.udf_GetFeeAmount(AccountID) AS FeeAmount FROM dbo.Accounts
The function will be run for each of the 1,000,000 rows, thus generating an additional 1,000,000 queries. Naturally, the performance of such an operation will be miserable. The SQL Server engine can't run the function query in parallel, it has to run it again for each row emitted.
How then do we encapsulate logic like this in a way that still gets decent performance?
The answer is often a view or a table-valued function (which in many ways is a parameterized view). For example, this scenario could be re-written as this view:
CREATE VIEW [dbo].[vw_AccountFees] AS SELECT AccountID, ISNULL ( SUM ( CASE WHEN Accounts.StatusID IN (1,4,8) THEN Fees.Amount ELSE 0 END ) , 0 ) AS FeeAmount FROM dbo.Accounts LEFT JOIN dbo.Fees ON Accounts.AccountID = Fees.AccountID GROUP BY Accounts.AccountID
This view is simplistic for the purposes of this example - you could/should expose multiple Accounts columns from this view, so as to avoid the need of JOIN-ing to [dbo].[Accounts] again for other columns.
Writing the query this way allows the database engine to build an execution plan with parallel elements, thus allowing greater throughput and speed, and lower I/O cost.
Now, neither views nor table-valued functions are magic bullets. Table-valued functions don't always work so well in CROSS APPLY queries, and neither approach can save you from a query that's just badly written, or the fact that a table needs better indexing. But they open up optimization and scaling possibilities that querying in scalar functions simply doesn't give you.
The question may be asked: what are scalar-valued user-defined functions good for then? The answer is: for encapsulating non-table-based operations. Say you need to sanitize a column by doing multiple REPLACEs on it, and will need to do this sanitization in multiple places, or even on multiple columns. This would be an excellent candidate for logic that should go into a scalar function. I have written many scalar functions that do date operations that would be tedious to retype all the time, like this:
ALTER FUNCTION [dbo].[udf_GetMonthStart] ( @Date DATETIME ) RETURNS DATE AS BEGIN RETURN DATEADD(MONTH, MONTH(@Date)-1, DATEADD(YEAR,YEAR(@Date)-1900,0)) END
Scalar-valued functions are for computing scalars, not for operating on tables, and using them to do so works against the database engine and leads to poor performance.
No comments:
Post a Comment