Excel Import: Excellent case/example

Not bad (again) -- to say the least.
I agree, the data in the example is simple and joining it as XML is
overkill, but your example has several finesses and techniques and is as suc
h
complete and ready to use for larger @.Left and @.Right nodes() that can be
joined (assuming they're @.LeftXPath and @.RightXPath nodes() have parallel
structures). The @.* variables refer to your last example.
This example is also interesting as it uses flexible dynamic SQL, has nifty
debugging, and can inspire variation.
This thread, i think is nearly finished, but of course - any comments
concerning performance consideration/optimisations are welcome. Or if any
misunderstanding hase cropt up in my coments here.
I am also including in the below a way to match elements from two comma
delimited strings. The example below just uses nvarchar. It is by far less
flexible than your example, but can be used where no injection detection is
needed and for more simple data.
CREATE FUNCTION [dbo].[IsNameInString]
(
@.LeftIems nvarchar(max),
@.RightItems nvarchar(max)
)
RETURNS bit
AS
BEGIN
DECLARE @.aLeftItemToCheck nvarchar(100),
@.aRightItem nvarchar(100),
@.TempList nvarchar(max),
@.Pos int,
@.aPos int,
@.Count int
SET @.LeftIems = LTRIM(RTRIM(@.LeftIems))+ ','
SET @.Pos = CHARINDEX(',', @.LeftIems, 1)
IF REPLACE(@.LeftIems, ',', '') <> ''
BEGIN
WHILE @.Pos > 0
BEGIN
SET @.aLeftItemToCheck = ''
SET @.aLeftItemToCheck = LTRIM(RTRIM(LEFT(@.LeftIems, @.Pos - 1)))
IF @.aLeftItemToCheck <> ''
BEGIN
Set @.TempList = ''
Set @.aPos = 0
SET @.TempList = LTRIM(RTRIM(@.RightItems))+ ','
SET @.aPos = CHARINDEX(',', @.TempList, 1)
IF REPLACE(@.TempList, ',', '') <> ''
BEGIN
WHILE @.aPos > 0
BEGIN
SET @.aRightItem = LTRIM(RTRIM(LEFT(@.TempList, @.aPos - 1)))
IF @.aRightItem <> ''
Begin
If @.aRightItem = @.aLeftItemToCheck
BEGIN
RETURN 1
End
End
SET @.TempList = RIGHT(@.TempList, LEN(@.TempList) - @.aPos)
SET @.aPos = CHARINDEX(',', @.TempList, 1)
END
END
END
SET @.LeftIems = RIGHT(@.LeftIems, LEN(@.LeftIems) - @.Pos)
SET @.Pos = CHARINDEX(',', @.LeftIems, 1)
END
END
RETURN 0
END
Regards,
Paul
"Marc Gravell" wrote:

> tidied, removed table-var, and added injection detection
> DROP PROC XmlJoin
> GO
> Create PROC XmlJoin
> (
> -- inputs
> @.Left xml, @.LeftXPath nvarchar(50),
> @.Right XML, @.RightXPath nvarchar(50) = NULL,
> -- outputs
> @.Result xml OUTPUT,
> -- xml options
> @.Path nvarchar(50) = 'node', @.Row nvarchar(50) = '', @.Root
> nvarchar(50) = 'xml',
> -- query options
> @.Distinct bit = 0,
> @.Debug bit = 0
> )
> AS
> BEGIN
> DECLARE @.Query nvarchar(max)
> -- verify inputs
> SET @.LeftXPath = ISNULL(@.LeftXPath, '')
> SELECT @.RightXPath = ISNULL(@.RightXPath, @.LeftXPath),
> @.Path = ISNULL(@.Path, N'node'),
> @.Row = ISNULL(@.Row, N''), @.Root = ISNULL(@.Root, N'')
> IF CHARINDEX(N'''', @.LeftXPath+@.RightXPath+@.Path+@.Row+@.Root)
> 0
> BEGIN
> RAISERROR ('Invalid character',16,1)
> RETURN
> END
> -- build command
> SET @.Query = N'
> SELECT @.Result = (
> SELECT ' + CASE @.Distinct WHEN 1 THEN N'DISTINCT ' ELSE N'' END
> + 'l.n.value(''.'',''varchar(20)'') AS [' + @.Path + N']
> FROM @.Left.nodes(''' + @.LeftXPath + N''') l(n)
> INNER JOIN @.Right.nodes(''' + @.RightXPath + N''') r(n)
> ON l.n.value(''.'',''varchar(20)'') =
> r.n.value(''.'',''varchar(20)'')
> FOR XML PATH(''' + @.Row + N'''), ROOT(''' + @.Root + N''')
> )'
> -- debug?
> IF @.Debug = 1
> BEGIN
> PRINT @.Query
> SELECT @.Left AS [@.Left], @.Right AS [@.Right]
> END
> -- exec
> EXEC sp_ExecuteSQL @.Query,
> N'@.Left xml, @.Right xml, @.Result xml OUTPUT',
> @.Left, @.Right, @.Result OUTPUT
> END
> GO
> SET NOCOUNT ON
> DECLARE @.Result xml
> EXEC XmlJoin
> @.Left =
> '<roles><role>N12</role><role>N12</role><role>N13</role><role>S39</role><r
ole>S14</role></roles>',
> @.LeftXPath = 'roles/role',
> @.Right = '<roles><role>N12</role><role>S39</role></roles>',
> @.Result = @.Result OUTPUT,
> @.Path = 'role', @.Root = 'roles', @.Debug = 1
> SELECT @.Result
>
>Regarding performance of the xml code; I guess there are two aspects
here... sp_ExecuteSQL benefits in that it uses the query cache, so any
uses of the same xpaths (with different xml) have the chance to share
a query plan to reduce recompilation. Of course, xml performance is a
black art, and the usage of value() pretty-much precludes a lot of
indexing - but as a general-purpose utility it does the job.
Performance-critical code should run against tables (not variables)
which can have XML indexing applied; and queries should be moved
*inside* the xml query (via sql:parameter) so that the xml indexing
can be used - but that isn't what we are discussing... just an aside.
The CSV stuff is interesting; the parsing approach that you've used
will probably work, but I'd be rather tempted to cobble together a
table function (UDF) that splits a /single/ delimted string, and call
it twice (perhaps into @.tables or #tables, or perhaps just "as is") -
this will let SQL Server do what it does best: set based operations.
Besides which, a simple "split this string" is so useful for day-to-
day operations that such a function should exist in every database; it
also demonstrates a level of "do one thing well; then join 'em
together" re-use.
Actually, it *really* annoys me that MS haven't added an optimised
method for delimited string splitting, rather than having to write it
in TSQL which simply isn't ideal for the job; I know you can call CLR
code in 2005/2008, but I doubt that it is worth the overhead of
loading the runtime... I've resisted CLR/SQL code so far, and it just
seems so... unnecessary!
Example code for a join (once you have the UDF written - apols for the
name; not my choice ;-p)
DECLARE @.Left varchar(max), @.Right varchar(max)
SELECT @.Left = 'a,b,d,c,g,h',
@.Right = 'h,b,e,b'
SELECT l.Value
FROM SplitMaxArray_udf(@.Left,',',1,1) l
INNER JOIN SplitMaxArray_udf(@.Right,',',1,1) r
ON r.Value = l.Value
or if you just want to check for a single match you could use EXISTS.
You can get UDF TSQL for CSV->table easily enough on t'net, so I won't
bother posting that unless prompted.
Marc|||> Actually, it *really* annoys me...
*especially* since you can now pass xml down to the database and
parse /that/ easier than anything... but delimited values, nope...
DIY.
Sorry; </rant>
Marc|||It might look something like this:
DECLARE @.Left varchar(max), @.Right varchar(max)
SELECT @.Left = 'a,b,h,h,d,c,g,h',
@.Right = 'h,b,e,b'
declare @.csv varchar(max)
select @.csv = coalesce(@.csv + ',' , '') +
l.aValue
FROM SplitMaxArray_udf(@.Left,',',1) l
INNER JOIN
(select distinct r.aValue
from SplitMaxArray_udf(@.Right,',',1) r) ro
ON ro.aValue = l.aValue
select @.csv
For the SplitMaxArray_udf UDF, it might go something like this:
IF OBJECT_ID (N'dbo.SplitMaxArray_udf') IS NOT NULL
BEGIN
DROP FUNCTION dbo.SplitMaxArray_udf
END
GO
CREATE FUNCTION SplitMaxArray_udf
(
-- Add the parameters for the function here
@.Array nvarchar(1000),
@.Separator char(1)= ',',
@.Debug bit = 0
)
RETURNS @.ValueTable table
(
-- Add the column definitions for the TABLE variable here
aValue nvarchar(50)
)
AS
BEGIN
IF(@.Separator is NULL or @.Separator = '')
BEGIN
set @.Separator = ','
END
declare @.separator_position int
declare @.array_value varchar(1000)
set @.Array = @.Array + @.Separator
declare @.percentageSep char(3)
set @.percentageSep = '%' + @.Separator + '%'
while patindex(@.percentageSep , @.Array) <> 0
begin
select @.separator_position = patindex(@.percentageSep , @.Array)
select @.array_value = left(@.array, @.separator_position - 1)
Insert @.ValueTable
Values (@.array_value)
select @.array = stuff(@.Array, 1, @.separator_position, '')
end
RETURN
END
GO
Regards,
Paul
"Marc Gravell" wrote:

> Regarding performance of the xml code; I guess there are two aspects
> here... sp_ExecuteSQL benefits in that it uses the query cache, so any
> uses of the same xpaths (with different xml) have the chance to share
> a query plan to reduce recompilation. Of course, xml performance is a
> black art, and the usage of value() pretty-much precludes a lot of
> indexing - but as a general-purpose utility it does the job.
> Performance-critical code should run against tables (not variables)
> which can have XML indexing applied; and queries should be moved
> *inside* the xml query (via sql:parameter) so that the xml indexing
> can be used - but that isn't what we are discussing... just an aside.
> The CSV stuff is interesting; the parsing approach that you've used
> will probably work, but I'd be rather tempted to cobble together a
> table function (UDF) that splits a /single/ delimted string, and call
> it twice (perhaps into @.tables or #tables, or perhaps just "as is") -
> this will let SQL Server do what it does best: set based operations.
> Besides which, a simple "split this string" is so useful for day-to-
> day operations that such a function should exist in every database; it
> also demonstrates a level of "do one thing well; then join 'em
> together" re-use.
> Actually, it *really* annoys me that MS haven't added an optimised
> method for delimited string splitting, rather than having to write it
> in TSQL which simply isn't ideal for the job; I know you can call CLR
> code in 2005/2008, but I doubt that it is worth the overhead of
> loading the runtime... I've resisted CLR/SQL code so far, and it just
> seems so... unnecessary!
> Example code for a join (once you have the UDF written - apols for the
> name; not my choice ;-p)
> DECLARE @.Left varchar(max), @.Right varchar(max)
> SELECT @.Left = 'a,b,d,c,g,h',
> @.Right = 'h,b,e,b'
> SELECT l.Value
> FROM SplitMaxArray_udf(@.Left,',',1,1) l
> INNER JOIN SplitMaxArray_udf(@.Right,',',1,1) r
> ON r.Value = l.Value
> or if you just want to check for a single match you could use EXISTS.
> You can get UDF TSQL for CSV->table easily enough on t'net, so I won't
> bother posting that unless prompted.
> Marc
>|||Please allow me to connect to two things i mentioned in previous mails to
this thread, and it has to do with XML/XPath context, and change friendly
XPath through dynamic SQL.
By doing an SQL inner join you essentially where able to join XML from two
separate XML contexts.
I had difficulty achieving this with FLOWR using SQL Server. For example
(something like the below)
for $LeftItem doc(@.Left)/roles,
for $RightItem doc(@.Right)/roles
where $LeftItem/[role = $RightItem/child::*]
return $LeftItem/role
This would have been fine, but SQL Server doesn't support the doc()
function.
My question here, would be, I guess, how otherwise to (currently with SQL
Server) to connect/compare/join two XML contexts?
Another avenue i thought of before I saw your inner join was to concantenate
the two xmls that i would like to compare, and then access the "two" parts
through two separate Xpaths. By concantenating the two XMLs (@.Left and
@.Right) i would (subsequently) only need to use one XML context.
The next thing is concerning the dynmaic SQL and the need to dynamically set
the Xpath. One reason for this is because the XML datamodel (in the databas
e
XML column) might and probably will change. By being able to set the Xpath
dynmically we are better (there is no guarantee with data models) better to
meet change much more flexibly.
I needed to write the above to things, as i think, they (OK, - only
partially) answer the "why" question to this thread.
--
Regards,
Paul
"Marc Gravell" wrote:

> *especially* since you can now pass xml down to the database and
> parse /that/ easier than anything... but delimited values, nope...
> DIY.
> Sorry; </rant>
> Marc
>

Friday, February 24, 2012

Excellent case/example

No comments:

Post a Comment

Excel Import

Blog Archive

About Me