A generic list for passing table-valued parameters to SQL Server

May 21, 2011 21:11 by Matt Whitfield

In my current day job, one of the things I need to do is load log files from devices into SQL Server. Nothing out of the ordinary, but the data volume made for an interesting time - because each log file has maybe 100 or 200 rows. So not the sort of volume that you'd like to bulk load, and definitely not the sort of volume that you'd want to insert with individual insert statements. Because we're using SQL Server 2008 I thought that table-valued parameters would be the obvious choice.

So I read up on it, and became concerned after reading Bob Beauchemin's blog at SQLskills about TVP use and plan compilation. That had me worried, so I was pleased to find, when I checked again, that he had blogged about it a second time - this time with more positive news.

So I decided to implement a solution for data loading based on the use of TVPs. Researching that some more, I found a blog post by Leonard Lobel about passing the data in. His method of explicitly implementing IEnumerable<SqlDataRecord> fitted almost exactly the way in which I wanted to pass the data to SQL Server. So I sat down at home and came up with the following utility class (yes, I am sad and write day-job code in the evenings!) which I thought would be good to share with you...

The idea is that it's a simple wrapper around List<T> that provides the explicit implementation of IEnumerable<SqlDataRecord> while remaining generic itself. That requires that the classes used with the list are able to represent themselves as a SqlDataRecord object, which the list enforces through a generic type constraint.

So here's the interface:

using System;
using Microsoft.SqlServer.Server;

namespace TVPTest
{
    /// <summary>
    /// IDataRecordTransformable is the interface to an object that can
    /// transform itself into a SqlDataRecord, for use with the
    /// <see cref="SqlDataRecordList&lt;T&gt;"/> class in order to load
    /// a data stream into a table-valued parameter
    /// </summary>
    public interface IDataRecordTransformable
    {
        /// <summary>
        /// Creates a SqlDataRecord object that represents the meta-data
        /// of each table row
        /// </summary>
        /// <returns>A SqlDataRecord object</returns>
        SqlDataRecord CreateDataRecord();

        /// <summary>
        /// Populates an existing SqlDataRecord object with the values
        /// from the current row
        /// </summary>
        /// <param name="dataRecord">The SqlDataRecord to populate</param>
        void PopulateDataRecord(SqlDataRecord dataRecord);
    }
}

So we have two methods: CreateDataRecord, which creates the SqlDataRecord object and defines its columns, and PopulateDataRecord, which is passed that existing SqlDataRecord and is expected to call the various Set methods in order to fill in the values for the current row.
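
To make that concrete, here is a minimal sketch of a class implementing the interface. The LogEntry type, its properties and the column sizes are all hypothetical, invented purely for this example:

using System;
using System.Data;
using Microsoft.SqlServer.Server;

namespace TVPTest
{
    /// <summary>
    /// A hypothetical log row, used here only to illustrate
    /// <see cref="IDataRecordTransformable"/>
    /// </summary>
    public class LogEntry : IDataRecordTransformable
    {
        public int DeviceId { get; set; }
        public DateTime LoggedAt { get; set; }
        public string Message { get; set; }

        /// <summary>
        /// Defines the shape of each table-valued parameter row
        /// </summary>
        public SqlDataRecord CreateDataRecord()
        {
            return new SqlDataRecord(
                new SqlMetaData("DeviceId", SqlDbType.Int),
                new SqlMetaData("LoggedAt", SqlDbType.DateTime),
                new SqlMetaData("Message", SqlDbType.NVarChar, 400));
        }

        /// <summary>
        /// Copies this instance's values into the shared SqlDataRecord
        /// </summary>
        public void PopulateDataRecord(SqlDataRecord dataRecord)
        {
            dataRecord.SetInt32(0, DeviceId);
            dataRecord.SetDateTime(1, LoggedAt);
            dataRecord.SetString(2, Message ?? string.Empty);
        }
    }
}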

Let's have a look at the list class itself then...

using System;
using System.Collections.Generic;
using Microsoft.SqlServer.Server;

namespace TVPTest
{
    /// <summary>
    /// SqlDataRecordList represents a list of objects that are
    /// <see cref="IDataRecordTransformable"/>, and therefore can be used
    /// to represent the data passed to a table-valued parameter
    /// </summary>
    /// <typeparam name="T">The type of object in the list</typeparam>
    public class SqlDataRecordList<T> : List<T>, IEnumerable<SqlDataRecord>
        where T : IDataRecordTransformable
    {
        /// <summary>
        /// The enumerator for the SqlDataRecord list, used with
        /// SqlDbType.Structured
        /// </summary>
        /// <returns>An IEnumerator object to enumerate the
        /// SqlDataRecord list</returns>
        IEnumerator<SqlDataRecord> IEnumerable<SqlDataRecord>.GetEnumerator()
        {
            // declare the data record that we will populate
            SqlDataRecord dataRecord = null;

            // if we have entries
            if (this.Count > 0)
            {
                // create the meta-data based on the first entry
                dataRecord = this[0].CreateDataRecord();

                // and yield return each record
                foreach (T entry in this)
                {
                    // populate it
                    entry.PopulateDataRecord(dataRecord);

                    // and yield it for IEnumerable
                    yield return dataRecord;
                }
            }
            else
            {
                // no elements present
                yield break;
            }
        }
    }
}

No particular rocket science here - just a generic class with a type constraint, explicitly implementing IEnumerable<SqlDataRecord> using the yield return and yield break keywords. Job done.
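
For completeness, here is roughly how the list would be consumed. This is a sketch only: the table type [dbo].[LogEntryType], the procedure [dbo].[LoadLogEntries] and the LogEntry class from the earlier sketch are all assumed names, not part of the code above:

using System.Data;
using System.Data.SqlClient;

namespace TVPTest
{
    public static class LogLoader
    {
        /// <summary>
        /// Sends a batch of log entries to SQL Server in a single call
        /// </summary>
        public static void Save(string connectionString, SqlDataRecordList<LogEntry> entries)
        {
            using (SqlConnection connection = new SqlConnection(connectionString))
            using (SqlCommand command = new SqlCommand("[dbo].[LoadLogEntries]", connection))
            {
                command.CommandType = CommandType.StoredProcedure;

                // pass the list itself as the table-valued parameter - ADO.NET will
                // enumerate it via the explicit IEnumerable<SqlDataRecord> implementation
                SqlParameter parameter = command.Parameters.Add("@logEntries", SqlDbType.Structured);
                parameter.TypeName = "[dbo].[LogEntryType]";
                parameter.Value = entries;

                connection.Open();
                command.ExecuteNonQuery();
            }
        }
    }
}

One thing to watch: ADO.NET throws if the enumeration yields no rows at all, so it is worth checking Count before making the call.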

Any questions?


Preventing non-qualified object use with DDL Triggers

September 20, 2010 09:05 by Matt Whitfield

There was a question on Ask SQL Server Central recently, which got me thinking. Was there a way that we could mandate that all object use should be qualified? As it turns out, the answer is relatively simple - and took the form of a DDL trigger.

Now, this DDL trigger is 2008+ only, due to its use of [sys].[sql_expression_dependencies]. What it effectively does is make sure that any object referenced by the object we've just created is referenced by its schema-qualified name.

Without further ado, here is the code:

ALTER TRIGGER [trig_InhibitNonQualifiedObjectUse] ON DATABASE FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @data XML;
    DECLARE @schema sysname;
    DECLARE @object sysname;
    DECLARE @eventType sysname;
    DECLARE @fullObjectName [varchar](MAX);

    SET @data = EVENTDATA();
    SET @eventType = @data.value('(/EVENT_INSTANCE/EventType)[1]', 'sysname');
    SET @schema = @data.value('(/EVENT_INSTANCE/SchemaName)[1]', 'sysname');
    SET @object = @data.value('(/EVENT_INSTANCE/ObjectName)[1]', 'sysname');
    SET @fullObjectName = QUOTENAME(@schema) + '.' + QUOTENAME(@object);

    IF @object IS NOT NULL
    BEGIN
        IF EXISTS (SELECT * FROM [sys].[sql_expression_dependencies]
                   WHERE [referencing_id] = OBJECT_ID(@fullObjectName)
                     AND [referenced_schema_name] IS NULL)
        BEGIN
            DECLARE @errorMessage [varchar](MAX);

            SET @errorMessage = @eventType + ' - ' + @fullObjectName +
                ': Operation terminated due to non-qualified object use';

            RAISERROR(@errorMessage, 17, 0);
            ROLLBACK;
        END
    END
END;
GO
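
To see it in action, something like the following should do. It assumes the trigger is installed in a database that already contains a table called [dbo].[Widgets] - the table and procedure names here are purely illustrative:

-- this should be rolled back by the trigger, because [Widgets] is not schema-qualified
CREATE PROCEDURE [dbo].[usp_GetWidgets_Unqualified]
AS
    SELECT * FROM [Widgets];
GO

-- this should succeed, because every object reference is schema-qualified
CREATE PROCEDURE [dbo].[usp_GetWidgets_Qualified]
AS
    SELECT * FROM [dbo].[Widgets];
GO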


Column & Parameter metadata

April 14, 2010 17:31 by Matt Whitfield

In versions of SQL Server prior to 2000, the column and parameter metadata was a mess. In fact, most of the metadata was a mess, but columns and parameters were particularly messy, seeing as they were all stored in [syscolumns]. Columns had positive colid values, parameters had negative colid values (except for return values, which were 0). Nasty.

Thankfully, in SQL Server 2005 this situation was sorted out considerably, leaving us with a system view for each category of object: [sys].[columns] and [sys].[parameters]. They still have some things in common - because they both deal with typed information, each of them has the columns [system_type_id], [user_type_id], [max_length], [precision] and [scale].

Let's look at what these columns mean:

system_type_id - This column relates the column or parameter to [sys].[types]. However, it tells you the class of data, rather than the exact type. So, if a user-defined data type 'myBigInteger' is based on a bigint, this is the column that tells you that the data in question is a bigint.
user_type_id - This column again relates the column or parameter to [sys].[types]. It does tell you the exact type of the data, and allows you to derive the exact type name for the column or parameter.
max_length - This column, you might think, is the maximum length of the column. It's not - it's the storage space that's used. So for a [text] column, you'll see 16, even though the maximum length of the data that can be stored in the text type is a byte under 2GB. (MAX) and [xml] data types are the exception here - you'll see -1 in the [max_length] column for those. And a word of warning about nchar and nvarchar types - there you will see double the declared data length, because those types take 2 bytes per stored character.
precision - Precision is useful for numeric types. You will see values for other types, but they don't bear a lot of relevance. For example, do you really need to look at metadata to know how many digits a 32-bit integer can store? However, for numeric types (i.e. [numeric] & [decimal]) the precision column is genuinely useful, as it tells you information that you need to know in order to determine how the data is defined, how it would be scripted and how the system will treat it - information you could not know without looking at the metadata.
scale - Scale again is useful for numeric types. Exactly the same applies as for precision: you will see values for other types.
is_xml_document - This is 1 if the data is an XML data type that is declared with the DOCUMENT keyword, meaning that the data is a validated XML document rather than a document fragment.
xml_collection_id - This is the ID of the XML schema collection that is used to validate the content of the data, if the data is validated with an XML schema.

A note on precision and scale - these are possibly the most misleading column names ever. You would think that precision would determine how precise a number was, and that scale would determine how big it could be. This leads to a common misconception that precision is the number of digits after the decimal point, and scale is the number of digits before the decimal point. In fact, precision determines how many digits there are in total, and scale determines the number of digits to the right of the decimal point. In a way, the names do apply, but really I think total_digits and decimal_places would have been far more sensible column names.
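
To see how these type columns fit together, here is a sketch of a query that reconstructs the declared type of each column of a table. The table name [dbo].[Example] is hypothetical, and the length handling is deliberately simplified:

SELECT [c].[name]                      AS [column_name],
       TYPE_NAME([c].[user_type_id])   AS [declared_type],
       TYPE_NAME([c].[system_type_id]) AS [base_type],
       CASE
           WHEN [c].[max_length] = -1 THEN 'MAX'
           -- nchar/nvarchar store 2 bytes per character, so halve the stored value
           WHEN [t].[name] IN ('nchar', 'nvarchar') THEN CONVERT([varchar](10), [c].[max_length] / 2)
           ELSE CONVERT([varchar](10), [c].[max_length])
       END                             AS [declared_length],
       [c].[precision],
       [c].[scale]
  FROM [sys].[columns] AS [c]
  JOIN [sys].[types]   AS [t] ON [t].[user_type_id] = [c].[user_type_id]
 WHERE [c].[object_id] = OBJECT_ID('[dbo].[Example]');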

So - what are the columns that are unique to each of [sys].[columns] and [sys].[parameters]? Let's have a look at [sys].[columns]:

object_id - This is the ID of the object to which the column belongs.
name - The name of the column.
column_id - This is the ID of the column. It usually starts at 1 and counts up, but if a column has been dropped from a table then you might find gaps in the sequence. New columns added later receive a higher column ID - gaps in the sequence are not re-filled.
collation_name - This is the name of the collation that the column is declared with, if the column is a character data type. Note that this column only exists in [sys].[columns], because collation does not, and cannot, apply to parameters.
is_nullable - This is 1 if the column accepts NULL values.
is_ansi_padded - This is 1 if ANSI_PADDING was ON when the column was created and the column is a character type, a binary type or a variant. If it is 0, then the column is either another type, or ANSI_PADDING was OFF.
is_rowguidcol - This is 1 if the column was created with the ROWGUIDCOL property.
is_identity - This is 1 if the column is an IDENTITY column - more information about IDENTITY columns can be found in [sys].[identity_columns].
is_computed - This is 1 if the column is a computed column - more information about computed columns can be found in [sys].[computed_columns].
is_filestream - This is 1 if the column is a FILESTREAM column.
is_replicated - This is 1 if the column is replicated.
is_non_sql_subscribed - This is 1 if the column has a subscriber that is not SQL Server based.
is_merge_published - This is 1 if the column is merge published.
is_dts_replicated - This, despite its name, is 1 if the column is replicated using SSIS.
default_object_id - This is the ID of either the default constraint or the stand-alone default that is bound to the column. Stand-alone defaults are a holdover from ages past and should be avoided, as they are on the deprecation list.
rule_object_id - This is the ID of the stand-alone rule that is bound to the column, if any. Like stand-alone defaults, stand-alone rules are a holdover, and are on the deprecation list. Interestingly, this column does not hold a value relating to a CHECK constraint on a column - that information comes from [sys].[check_constraints].
is_sparse - This is 1 if the column is a sparse column.
is_column_set - This is 1 if the column is a COLUMN_SET.

So - quite a lot of useful stuff there. So what does [sys].[parameters] have in store for us? Let's have a look:

object_id - This is the ID of the object to which the parameter belongs.
name - The name of the parameter.
parameter_id - This is the ID of the parameter. Like [column_id] in [sys].[columns], this starts at 1 and counts up. It is always sequential, because the entire parameter collection is re-created when objects are re-created - so changing the parameters on a programmable object won't create gaps. There is a special case, however: 0 is used for the return value of a scalar function.
is_output - This is 1 if the parameter is an OUTPUT parameter or a return parameter.
is_cursor_ref - This is 1 if the parameter is a cursor reference parameter.
has_default_value - You might think that this would be 1 if the parameter has a default value. Nearly: it's 1 if the parameter has a default value AND the parameter belongs to a CLR object. That caveat makes the column mostly useless.
default_value - The default value for the parameter, if [has_default_value] is 1.
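
As a quick illustration, here is a sketch of a query that lists the parameters of a procedure, using the shared type columns described earlier. The procedure name [dbo].[usp_Example] is hypothetical:

SELECT [p].[parameter_id],
       [p].[name],
       TYPE_NAME([p].[user_type_id]) AS [declared_type],
       [p].[max_length],
       [p].[is_output],
       [p].[has_default_value]
  FROM [sys].[parameters] AS [p]
 WHERE [p].[object_id] = OBJECT_ID('[dbo].[usp_Example]')
 ORDER BY [p].[parameter_id];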

So, a lot of information there then. It's not perfect, and there are some massive omissions, as well as things that MSDN states incorrectly (for example, the claim that scale and precision are 0 for non-numeric types). But the system views available in SQL Server 2005+ are a huge improvement over what was available previously, and give us a huge amount of information that we can use when writing systems that work generically with SQL Server data, or even when writing dynamic SQL.

Next time we will look at the typing system, covering user-defined types and assembly types.


Object metadata

April 2, 2010 11:43 by Matt Whitfield

So what do those columns in [sys].[objects] mean? These columns are found in all the type-specific system views (as ‘inherited columns’), and contain the basic information that applies to any schema-scoped object. Let’s have a look at them:

name - This is the object name. This does not include the schema part of the name, nor the name delimiter brackets []. Delimiter brackets can be really important if your database contains objects whose names are keywords, or contain non-standard characters. This was one of the trickier aspects to get right when I was writing my SQL Server IDE.
object_id - This is one of the most useful columns in the metadata. Each schema-scoped object is given an ID when it is created. This object ID will not be the same if you create an object in two separate databases, nor is it predictable. However, it is the key by which the object is identified throughout the system metadata, and it stays the same if the object is renamed or altered through a DDL ALTER statement.
principal_id - This is the ID of the database principal that owns the object, if the owner of the object is not the owner of the schema to which the object belongs. What does this mean? Typically, the user who owns a schema will own the objects under that schema. However, the ALTER AUTHORIZATION statement can be used to specify a different owner. Object owners can do things such as change object permissions. If this column is NULL, then the object is owned by the owner of the schema to which it belongs. This links to the system view [sys].[database_principals].
schema_id - This is the ID of the schema to which the object belongs. This, again, is an incredibly useful column, as it is needed to determine the fully-qualified name of any schema-scoped object. This links to the system view [sys].[schemas].
parent_object_id - This is the object_id of the parent object for the current object. Remember those objects that were indirectly schema-scoped from the first article (constraints and triggers)? Those objects will have a parent_object_id equal to the object_id of the table or view on which they were created.
type - This is the type code for the object. The complete list of type codes is:

  • AF = Aggregate function (CLR) - no specific object view
  • C = CHECK constraint - more information in [sys].[check_constraints]
  • D = DEFAULT (constraint or stand-alone) - more information in [sys].[default_constraints], definition for old-style defaults in [sys].[sql_modules]
  • F = FOREIGN KEY constraint - more information in [sys].[foreign_keys] and [sys].[foreign_key_columns]
  • FN = SQL scalar function - definition found in [sys].[sql_modules]
  • FS = Assembly (CLR) scalar-function - more information found in [sys].[assembly_modules]
  • FT = Assembly (CLR) table-valued function - more information found in [sys].[assembly_modules]
  • IF = SQL inline table-valued function - definition found in [sys].[sql_modules]
  • IT = Internal table - internal table - more information in [sys].[internal_tables]
  • P = SQL Stored Procedure - definition found in [sys].[sql_modules]
  • PC = Assembly (CLR) stored-procedure - more information found in [sys].[assembly_modules]
  • PG = Plan guide - more information in [sys].[plan_guides].
  • PK = PRIMARY KEY constraint - more information in [sys].[key_constraints]
  • R = Rule (old-style, stand-alone) - definition found in [sys].[sql_modules]
  • RF = Replication-filter-procedure - definition found in [sys].[sql_modules]
  • S = System base table - no specific object view
  • SN = Synonym - definition found in [sys].[synonyms]
  • SQ = Service queue - more information in [sys].[service_queues]
  • TA = Assembly (CLR) DML trigger - more information in [sys].[triggers] and [sys].[assembly_modules]
  • TF = SQL table-valued-function - definition found in [sys].[sql_modules]
  • TR = SQL DML trigger - more information in [sys].[triggers], definition found in [sys].[sql_modules]
  • TT = Table type - more information found in [sys].[table_types]
  • U = Table (user-defined) - more information found in [sys].[tables]
  • UQ = UNIQUE constraint - more information in [sys].[key_constraints]
  • V = View - more information in [sys].[views]
  • X = Extended stored procedure - more information in [sys].[extended_procedures]

type_desc - This is simply a description of the type. If you are doing any querying on [sys].[objects], then do not use this column to match on - it's much more efficient to use the type column.
create_date - The date when the object was first created.
modify_date - This is the date when the object was last altered using an ALTER statement. A special case here is that clustered index creation on tables and views also counts as modification.
is_ms_shipped - This is 1 if the object was created by a Microsoft install process. This bit can be important sometimes when differentiating user-created objects from system-created ones.
is_published - This is 1 if the object is a published article under replication.
is_schema_published - This is 1 if only the schema of the object is published, rather than any data.
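
Putting a few of these columns together, here is a sketch of a query that lists user objects with their fully qualified names - one reasonable way to combine [sys].[objects] with [sys].[schemas]:

SELECT QUOTENAME([s].[name]) + '.' + QUOTENAME([o].[name]) AS [qualified_name],
       [o].[type],
       [o].[type_desc],
       [o].[create_date],
       [o].[modify_date]
  FROM [sys].[objects] AS [o]
  JOIN [sys].[schemas] AS [s] ON [s].[schema_id] = [o].[schema_id]
 WHERE [o].[is_ms_shipped] = 0
 ORDER BY [qualified_name];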

Note that the SQL Server 2000 system table [sysobjects] was reasonably similar to this, but less clear - for example, the status column had different meanings for different object types. The new system views are much cleaner, i.e. much closer to something that a SQL developer would be proud of.


Objects and schema scoping

March 22, 2010 23:57 by Matt Whitfield

Most common objects in SQL Server are schema-scoped - that is, they belong to a schema. In SQL Server 2000, schemas and database users were effectively the same thing, so objects could be called ‘user-scoped’. However, starting with SQL Server 2005, schemas and users (database principals) were separated, which means that schemas are no longer tied directly to the users in the database.

So, schema scoped objects are those that belong directly to a schema. A schema can be thought of as a logical partition within a database which directly serves two purposes:

  1. To separate logical areas of the database, so that users have a clear understanding of which objects interact with which other objects. This can make your database a lot easier to understand for new people coming in and trying to learn your schemas. When you need to hire help, it’s probably because you are busy - do you want to spend your time explaining your schema to your new help? Probably not so much.
  2. To enable security control to be applied to those logical groups, granting permissions to individual database users at the schema level rather than object by object. This makes it both more convenient to apply permissions to logical groups of objects, and easier to understand the security picture in your databases.

Schema-scoped objects can generally be found in the [sys].[objects] system view, which contains the following object types:

  • Aggregate function (CLR)
  • CHECK constraint
  • DEFAULT (constraint or stand-alone)
  • FOREIGN KEY constraint
  • SQL scalar function
  • Assembly (CLR) scalar-function
  • Assembly (CLR) table-valued function
  • SQL inline table-valued function
  • SQL Stored Procedure
  • Assembly (CLR) stored-procedure
  • PRIMARY KEY constraint
  • Rule (old-style, stand-alone)
  • Replication-filter-procedure
  • Synonym
  • Service queue
  • Assembly (CLR) DML trigger
  • SQL table-valued-function
  • SQL DML trigger
  • Table type
  • Table (user-defined)
  • UNIQUE constraint
  • View
  • Extended stored procedure

In this list, some of the object types are directly schema-scoped (i.e. they have a schema because you have chosen to place them in a schema for a reason) and some are indirectly schema-scoped (i.e. they have a schema because they are a child object of another object that belongs to a schema). While indirectly schema-scoped objects belong to their schema, they cannot have a schema different to that of their parent object. 

So, the directly schema scoped objects in [sys].[objects] are:

  • Aggregate function (CLR)
  • Assembly (CLR) scalar-function
  • SQL scalar function
  • Assembly (CLR) table-valued function
  • SQL table-valued-function
  • SQL inline table-valued function
  • Assembly (CLR) stored-procedure
  • SQL Stored Procedure
  • Default (old-style, stand-alone)
  • Rule (old-style, stand-alone)
  • Replication-filter-procedure
  • Synonym
  • Service queue
  • Table (user-defined)
  • View
  • Table type
  • Extended stored procedure

And the indirectly schema scoped objects in [sys].[objects] are:

  • CHECK constraint
  • DEFAULT constraint
  • FOREIGN KEY constraint
  • PRIMARY KEY constraint
  • Assembly (CLR) DML trigger
  • SQL DML trigger
  • UNIQUE constraint
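
One way to separate the two groups above is by [parent_object_id], which is 0 for objects that are not children of another object. A rough sketch:

SELECT [o].[type_desc],
       CASE WHEN [o].[parent_object_id] = 0
            THEN 'directly schema-scoped'
            ELSE 'indirectly schema-scoped'
       END      AS [scoping],
       COUNT(*) AS [object_count]
  FROM [sys].[objects] AS [o]
 GROUP BY [o].[type_desc],
          CASE WHEN [o].[parent_object_id] = 0
               THEN 'directly schema-scoped'
               ELSE 'indirectly schema-scoped'
          END
 ORDER BY [o].[type_desc];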

However, there are other object types which are schema-scoped but are not present in [sys].[objects]. Hang on, doesn’t the MSDN description for [sys].[objects] say ‘Contains a row for each user-defined, schema-scoped object that is created within a database.’? Yes it does, but XML schema collections, user-defined SQL types and user-defined CLR types are all schema-scoped, yet do not appear in [sys].[objects] (a quick query to find them is sketched below). Hey, that’s life (and also why I had a hard time writing my schema comparison tool). In the next article, I will explain what the columns in [sys].[objects] mean, and begin to cover the metadata in the supplementary system views that provide extra information relevant to each object type.
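
Here is that sketch - it pulls the extra schema-scoped items from their own catalog views (the filters are just one reasonable way to exclude the built-in entries):

SELECT 'XML schema collection' AS [kind],
       QUOTENAME(SCHEMA_NAME([x].[schema_id])) + '.' + QUOTENAME([x].[name]) AS [qualified_name]
  FROM [sys].[xml_schema_collections] AS [x]
 WHERE SCHEMA_NAME([x].[schema_id]) <> 'sys'   -- skip the built-in collection
UNION ALL
SELECT CASE WHEN [t].[is_assembly_type] = 1 THEN 'CLR user-defined type'
            ELSE 'SQL user-defined type' END,
       QUOTENAME(SCHEMA_NAME([t].[schema_id])) + '.' + QUOTENAME([t].[name])
  FROM [sys].[types] AS [t]
 WHERE [t].[is_user_defined] = 1
   AND [t].[is_table_type] = 0   -- table types appear in [sys].[objects] as TT
 ORDER BY [kind], [qualified_name];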

