DBPedias

Your Database Knowledge Community

Michael Swart

  1. Lessons From Geordi La Forge

    I’ve been rewatching Star Trek again, I find I relate most to Geordi La Forge. He’s a technical guy. He wears the yellow shirt of engineering instead of the red shirt of command.

    So I was inspired to do write this light hearted post. I usually find non-technical posts a little wishy-washy (No offense to you Seth-Godin-types). But I liked the way this drawing turned out and I had to use it somehow right?

    Picture of Geordi Laforge captioned W W G L F D

    You may have already seen some text here introducing the comparison between Star Trek operations and company operations. Well here’s another one. For the moment we’ll sidestep the lessons learned from Picard (on how to be a inspiring leader) and go straight to Geordi. Here are the lessons:

    Make Everyone Believe You Can Do Anything …

    Here’s how Geordi’s typically responds to requests from the captain:

    Can you fix the neutrino flux generator? ==> either "I can fix that captain. Estimated time two minutes." or "Sorry captain, The flux couplers aren't in phase."

    Did you see what he did there? It’s not a choice between “I can” or “I can’t”. It’s also not a choice between “it’s possible” or “it’s impossible”. If Geordi can accomplish a task, he essentially takes the credit with “I can do this”. If he can’t accomplish the task, then his words imply “It can’t be done.” He never implies that his skills are lacking. So are Geordi’s skills ever lacking? No:

    … And Then Make Sure They’re Right …

    Even when the pressure’s on and someone calls your bluff. There’s a persistence that can pay off:
    RIKER: Gentlemen, we’re giving you an assignment. The one thing we don’t want to hear is that it’s impossible.
    PICARD: I need the transporters to function, despite the hyperonic radiation.
    LA FORGE: Yeah, but that’s imp… Yes, sir.

    and then later:
    LA FORGE: Captain – we can do it. We can modify the transporters. It’ll take fifteen years and a research team of a hundred…

    Most impossible things are actually just really really difficult. For example, recently I asked a question on stackexchange about avoiding downtime during a migration. The first feedback I got was the comment “I don’t think [the downtime] can be avoided” which got a couple up-votes. But it turns out there was a way; it just took a lot of effort (I blogged about that effort in this series). It’s the yellow shirt’s job to explain the “how” and “what” so that the red shirts can make informed choices.

    Later on in another episode:
    LA FORGE: Ferengi codes are damn near impossible to break.
    PICARD: Gentlemen, I have the utmost confidence in your ability to perform… the impossible.

    See! It paid off. But it’s easy for me to give advice that boils down to “be able to do anything” it’s another thing to make it so. How do you get great at what you do?

    … By Being Awesome

    Easier said than done right?

    So Geordi’s got skills, and he uses those skills to build a great career for himself. But where did he get those skills? I figure you can become great at something in a number of ways:

    • Be talented at it (you can’t change you, but you can change fields)
    • Hard work. There’s no getting by without hard work. Imagine you’re on vacation sitting on a beach. You’re looking at the hut next to you and see some guy in a chair reading SQL Server Pro on his iPad. You just can’t compete with that guy. I’m not suggesting you read trade magazines on holiday, but are you in a field where you might want to?

    It boils down to finding your passion. And I hope you’ve found yours. Geordi is a guy whose passionate about the latest thing in his field. It almost seems like he cares about it more than job security:

    PICARD: Warp without warp drive.
    RIKER: They’re gonna put you out of a job, Geordi.
    LA FORGE: I hope so, Commander.

    Antilessons From Geordi

    But I like my role models the same way I like a buffet. I pick and choose the good lessons and leave the rest behind. Here’s a Geordi “lesson” that didn’t make the cut:

    LA FORGE: I don’t know, Data, my gut tells me we ought to be listening to what this guy’s trying to tell us.
    DATA: Your gut?
    LA FORGE: It’s just a… a feeling, you know, an instinct. Intuition.
    DATA: But those qualities would interfere with rational judgment, would they not?
    LA FORGE: You’re right, sometimes they do.
    DATA: Then… why not rely strictly on the facts?
    LA FORGE: Because you just can’t rely on the plain and simple facts. Sometimes they lie.

    I don’t buy it. I get what he’s trying to say and the sentiment is correct, but the point of view is wrong. Facts don’t lie by definition. When Geordi says that facts lie sometimes, I would say instead that the facts are incomplete, or an assumption has been made incorrectly.

    Once I was asked by a developer why his code was throwing the error message:

    Table ‘noidentity’ does not have the identity property. Cannot perform SET operation.

    even though the table in question was showing that it did have an identity column.

    I said show me. And he did. And was it the same table? It was. And was it the same database? Yes. Are you sure? Of course. Can you show me the connection string? Hang on a second…. ohhhh. (Omitted in this mini-dialogue is  about 15 minutes of head-scratching on both our parts).

    Yes, sometimes the facts lie… but only when they’re not facts.

    More Geordi Lessons in The Comments

    I’ve put a few more of my favorites in the comment section of this post. Do you have any favorite Geordi lessons (or Star Trek lessons in general)?

  2. Modifying Tables Online – Part 5: Just One More Thing

    SERIES: Modifying Tables Online

    In the next few days I want to describe a way to modify the definition of very large tables while keeping the table available for other queries. Some table changes are already online operations. But offline changes need extra work to keep the table available. I explore one way to do that in this series.

    This article used to be a single blog post, but I broke it into four parts because it seemed too long. Read Part 1, and understand the strategy I’m using. Treat Parts 2-4 as appendixes. When you want to use this plan (or one like it), come back and use them as examples. You won’t be able to make use of the examples by cutting and pasting directly (unless you happen to be running an bike equipment store called Adventureworks), but modifications shouldn’t be too hard.

    Just One More Thing

    So Rob Volk commented on Part 3 of my blog. The comment prompted me to write a part five (which you’re reading now). So you can thank him for this bonus section.

    I’m going to quote Rob directly. He wrote:

    One question/suggestion, have you considered using a new schema (e.g. Sales_New) and creating the new table in that schema? The benefit is that all the defaults, keys, triggers, and indexes can retain the same names as the original, and the switch becomes a single ALTER SCHEMA…TRANSFER operation. (or two, one for the old table and one for the new)

    He’s absolutely right. The switch step becomes much much simpler and simpler is almost always better (as it is in this case). My migration would start by creating some extra schemas and then creating the staging table. In the Adventureworks example I’ve been using, that looks something like this:

    use AdventureWorks2012
    go
    create schema staging;
    go
    create schema obsolete;
    go
    CREATE TABLE staging.SalesOrderHeader(
    -- etc...

    So now our DB is prepared and looks like this:

    Copy the data over to the staging table the same way. In my SalesOrderHeader example, that code doesn’t change at all except that I replace “Sales.SalesOrderHeader_new” with “Staging.SalesOrderHeader”.

    When ready, the switch now includes code like this:

    ALTER SCHEMA obsolete TRANSFER Sales.SalesOrderHeader;
    ALTER SCHEMA Sales TRANSFER staging.SalesOrderHeader;
     
    drop trigger obsolete.t_i_SalesOrderHeader;
    drop trigger obsolete.t_u_SalesOrderHeader;
    drop trigger obsolete.t_d_SalesOrderHeader;

    Looks simple right! It is. Simple is better. The schema now looks something like this:

    A few things to be careful of:

    • Foreign keys: Although no renaming is necessary, they still have to dropped from and to the obsolete table. And foreign keys pointing to the new table still need to be added.
    • Existing triggers: They should recreated on the new table.

    That’s It

    So dear reader, that’s the series. It was fun for me to explore in depth an intermediate topic. Thanks for bearing with me. I know the series was a little dry, but I figured that I wanted to use this walk-through for myself as a template for future migration projects and that if I thought it was useful, maybe you would think so too.

  3. Modifying Tables Online – Part 4: Testing

    SERIES: Modifying Tables Online

    In the next few days I want to describe a way to modify the definition of very large tables while keeping the table available for other queries. Some table changes are already online operations. But offline changes need extra work to keep the table available. I explore one way to do that in this series.

    This article used to be a single blog post, but I broke it into four parts because it seemed too long. Read Part 1, and understand the strategy I’m using. Treat Parts 2-4 as appendixes. When you want to use this plan (or one like it), come back and use them as examples. You won’t be able to make use of the examples by cutting and pasting directly (unless you happen to be running an bike equipment store called Adventureworks), but modifications shouldn’t be too hard.

    This is part 4 of the series (you can consider this Appendix 3, The main article is found here).

    So my testing is also pretty straightforward. I

    1. Backup the Adventureworks2012 database on my dev machine
    2. Start sending a whack of queries to the database to simulate concurrent activity
    3. Make Adventureworks2012.Sales.SalesOrderHeader larger by adding lots of rows
    4. Migrate the database using the script from part 3
    5. Look for any errors, fix them, restore the db and repeat the process if necessary
    6. Otherwise … profit?

    You should be able to handle each of these steps. Except that I want to explain exactly how I personally handle number 2: I use an application to send queries to the database many times. Sounds easy right? The trick is that I want to call the stored procedure many times at once.

    Activity Generator

    This is a C# program which I compile into a .net application (I can’t wait for .net 4.5′s async and await)

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Data.SqlClient;
    using System.Threading;
     
    namespace ConsoleApplication1 {
        class Program {
            static int counter = 0;
            static void Main( string[] args ) {
                SqlConnectionStringBuilder cs = new SqlConnectionStringBuilder();
                cs.DataSource = @".";
                cs.InitialCatalog = "Adventureworks2012";
                cs.IntegratedSecurity = true;
                cs.AsynchronousProcessing = true;
                string connectionString = cs.ToString();
                string sql = @"s_DoSomething";
     
                for( int i = 0; i < 100000; i++ ) {
                    SqlConnection conn = new SqlConnection( connectionString );
                    conn.Open();
                    SqlCommand cmd = new SqlCommand( sql, conn );
                    cmd.CommandType = System.Data.CommandType.StoredProcedure;
                    cmd.BeginExecuteNonQuery( new AsyncCallback( EndExecution ), cmd );
                }
                Console.WriteLine( string.Format( @"Error count: {0}", counter ) );
                Console.ReadLine();
            }
     
            static void EndExecution( IAsyncResult c ) {
                SqlCommand endCmd = ( c.AsyncState as SqlCommand );
                try {
                    endCmd.EndExecuteNonQuery( c );
                } catch( Exception ex ) {
                    //counter++;
                    Interlocked.Increment( ref counter );
                    Console.WriteLine( ex.Message );
                } finally {
                    endCmd.Connection.Close();
                }
            }
        }
    }

    You might have noticed that this program is extremely boring. It’s just calling one stored procedure: s_DoSomething over and over as fast as it can. The advantage is that I’m free to modify s_DoSomething (which is for me a lot easier than modifying this application).

    So what does this procedure do? I’m glad you asked:

    s_DoSomething

    This procedure executes at random one task chosen from a set of typical OLTP tasks that use the table Sales.SalesOrderHeader. Here’s the one I used:

    create procedure s_DoSomething 
    as
    begin
     
        declare @choice int = 1 + RAND() * 6
        declare @SalesOrderId int;
     
        if (@choice = 1)
            insert Sales.SalesOrderHeader(DueDate, OrderDate, CustomerID, BillToAddressID, ShipToAddressID, ShipMethodID)
            select top 1000 DueDate, DATEADD(day, -1, DueDate), CustomerID, BillToAddressID, ShipToAddressID, ShipMethodID
            from Sales.SalesOrderHeader
        else if (@choice = 2)
            delete top (100) Sales.SalesOrderHeader
        else if (@choice = 3)
            begin
                declare @personid int;
     
                select top 1 @personid = BusinessEntityID
                from Sales.SalesPerson
                order by newid();
     
                select * 
                from sales.SalesOrderHeader
                where SalesPersonID = @personid;
            end;
        else if (@choice IN (4,5,6))
            begin
                select top 1 @SalesOrderId = SalesOrderId
                from Sales.SalesOrderHeader with (nolock);
     
                select top 100 * 
                from Sales.SalesOrderHeader 
                where SalesOrderId > @SalesOrderId
            end
     
    end
  4. Modifying Tables Online – Part 3: Example With Error Handling

    SERIES: Modifying Tables Online

    In the next few days I want to describe a way to modify the definition of very large tables while keeping the table available for other queries. Some table changes are already online operations. But offline changes need extra work to keep the table available. I explore one way to do that in this series.

    This article used to be a single blog post, but I broke it into four parts because it seemed too long. Read Part 1, and understand the strategy I’m using. Treat Parts 2-4 as appendixes. When you want to use this plan (or one like it), come back and use them as examples. You won’t be able to make use of the examples by cutting and pasting directly (unless you happen to be running an bike equipment store called Adventureworks), but modifications shouldn’t be too hard.

    This is part 3 of the series (you can consider this Appendix 2, The main article is found here).

    What you find below is exactly the same script as you find in Part 2: Implementation Example with the exception that I’ve included error handling and wrappers to handle the following goals:

    • If a statement fails, no data is lost
    • If a statement fails, other queries are not impacted (i.e. we’re online)
    • If a statement fails anywhere, it can be retried from the beginning with no lost work. This means for example that when data is copied to the staging table in batches, and that process is interrupted and restarted, the script “picks up where it left off”.
    • The process is re-runnable. This means that if the script succeeds and for some reason the script is run again then no errors are raised and no work is performed.
    • The last step, the switch, is an atomic transaction. It succeeds 100% or fails and rolls back 100%.

    I present without any further comment, the scripts (with error-handling).

    Create Staging Table

    use AdventureWorks2012
     
    -- original table isn't migrated, staging table doesn't exist
    IF NOT EXISTS 
        (
            SELECT 1 
            FROM INFORMATION_SCHEMA.TABLES 
            WHERE TABLE_SCHEMA = 'Sales' 
            AND TABLE_NAME = 'SalesOrderHeader_new'
        )
    AND NOT EXISTS 
        (
            SELECT 1 
            FROM INFORMATION_SCHEMA.COLUMNS 
            WHERE TABLE_SCHEMA = 'Sales' 
            AND TABLE_NAME = 'SalesOrderHeader' 
            AND COLUMN_NAME = 'Rowversion'
        )
    BEGIN
     
        CREATE TABLE Sales.SalesOrderHeader_new(
            SalesOrderID int IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
            RevisionNumber tinyint NOT NULL,
            OrderDate datetime NOT NULL,
            DueDate datetime NOT NULL,
            ShipDate datetime NULL,
            [Status] tinyint NOT NULL,
            OnlineOrderFlag dbo.Flag NOT NULL,
            SalesOrderNumber  AS (isnull(N'SO'+CONVERT(nvarchar(23),SalesOrderID),N'*** ERROR ***')),
            PurchaseOrderNumber dbo.OrderNumber NULL,
            AccountNumber dbo.AccountNumber NULL,
            CustomerID int NOT NULL,
            SalesPersonID int NULL,
            TerritoryID int NULL,
            BillToAddressID int NOT NULL,
            ShipToAddressID int NOT NULL,
            ShipMethodID int NOT NULL,
            CreditCardID int NULL,
            CreditCardApprovalCode varchar(15) NULL,
            CurrencyRateID int NULL,
            SubTotal money NOT NULL,
            TaxAmt money NOT NULL,
            Freight money NOT NULL,
            TotalDue  AS (isnull((SubTotal+TaxAmt)+Freight,(0))),
            Comment nvarchar(128) NULL,
            rowguid uniqueidentifier ROWGUIDCOL  NOT NULL,
            ModifiedDate datetime NOT NULL,
            [Rowversion] rowversion NOT NULL, -- This is the new column!!
            constraint PK_SalesOrderHeader_new_SalesOrderID
                primary key clustered (SalesOrderID ASC),
            constraint FK_SalesOrderHeader_new_Address_BillToAddressID 
                foreign key(BillToAddressID) references Person.[Address] (AddressID),
            constraint FK_SalesOrderHeader_new_Address_ShipToAddressID 
                foreign key(ShipToAddressID) references Person.[Address] (AddressID),
            constraint FK_SalesOrderHeader_new_CreditCard_CreditCardID 
                foreign key(CreditCardID) references Sales.CreditCard (CreditCardID),
            constraint FK_SalesOrderHeader_new_CurrencyRate_CurrencyRateID 
                foreign key(CurrencyRateID) references Sales.CurrencyRate (CurrencyRateID),
            constraint FK_SalesOrderHeader_new_Customer_CustomerID 
                foreign key(CustomerID) references Sales.Customer (CustomerID),
            constraint FK_SalesOrderHeader_new_SalesPerson_SalesPersonID 
                foreign key(SalesPersonID) references Sales.SalesPerson (BusinessEntityID),
            constraint FK_SalesOrderHeader_new_SalesTerritory_TerritoryID 
                foreign key(TerritoryID) references Sales.SalesTerritory (TerritoryID),
            constraint FK_SalesOrderHeader_new_ShipMethod_ShipMethodID 
                foreign key(ShipMethodID) references Purchasing.ShipMethod (ShipMethodID),
            constraint CK_SalesOrderHeader_new_DueDate 
                check (DueDate>=OrderDate),
            constraint CK_SalesOrderHeader_new_Freight 
                check (Freight>=0.00),
            constraint CK_SalesOrderHeader_new_ShipDate 
                check (ShipDate>=OrderDate or ShipDate is null),
            constraint CK_SalesOrderHeader_new_Status 
                check ([Status]>=0 and [Status]<=8),
            constraint CK_SalesOrderHeader_new_SubTotal 
                check (SubTotal>=0.00),
            constraint CK_SalesOrderHeader_new_TaxAmt 
                check (TaxAmt>=0.00)
        )
     
        -- add default constraints
        alter table Sales.SalesOrderHeader_new add
        constraint DF_SalesOrderHeader_new_RevisionNumber  
            default 0 for RevisionNumber,
        constraint DF_SalesOrderHeader_new_OrderDate  
            default getdate() for OrderDate,
        constraint DF_SalesOrderHeader_new_Status  
            default 1 for [Status],
        constraint DF_SalesOrderHeader_new_OnlineOrderFlag  
            default 1 for OnlineOrderFlag,
        constraint DF_SalesOrderHeader_new_SubTotal  
            default 0.00 for SubTotal,
        constraint DF_SalesOrderHeader_new_TaxAmt  
            default 0.00 for TaxAmt,
        constraint DF_SalesOrderHeader_new_Freight  
            default 0.00 for Freight,
        constraint DF_SalesOrderHeader_new_rowguid  
            default newid() for rowguid,
        constraint DF_SalesOrderHeader_new_ModifiedDate  
            default getdate() for ModifiedDate
     
            -- two nc ak
        create unique nonclustered index AK_SalesOrderHeader_new_rowguid 
            on Sales.SalesOrderHeader_new(rowguid ASC)
     
        create unique nonclustered index AK_SalesOrderHeader_new_SalesOrderNumber
            on Sales.SalesOrderHeader_new(SalesOrderNumber)
     
            -- two nc ix
        create nonclustered index IX_SalesOrderHeader_new_CustomerID
            on Sales.SalesOrderHeader_new(CustomerID)
     
        create nonclustered index IX_SalesOrderHeader_new_SalesPersonID
            on Sales.SalesOrderHeader_new(SalesPersonID)
    END
    GO

    Add Triggers

    -- staging table exists
    -- original table isn't migrated, 
    -- triggers don't exist
    IF EXISTS 
        (
            SELECT 1 
            FROM INFORMATION_SCHEMA.TABLES 
            WHERE TABLE_SCHEMA = 'Sales' 
            AND TABLE_NAME = 'SalesOrderHeader_new'
        )
    AND NOT EXISTS 
        (
            SELECT 1 
            FROM INFORMATION_SCHEMA.COLUMNS 
            WHERE TABLE_SCHEMA = 'Sales' 
            AND TABLE_NAME = 'SalesOrderHeader' 
            AND COLUMN_NAME = 'Rowversion'
        )
    AND NOT EXISTS
        (
            SELECT 1 
            FROM sys.triggers
            WHERE name = 't_i_SalesOrderHeader'
        )
    BEGIN
     
        EXEC sp_executesql N'
            create trigger t_i_SalesOrderHeader
            on Sales.SalesOrderHeader
            after insert
            as 
                set identity_insert Sales.SalesOrderHeader_new on;
     
                insert SalesOrderHeader_new(SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate)
                select SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate
                from inserted
     
                set identity_insert Sales.SalesOrderHeader_new off;';
    END
    GO
     
    IF EXISTS 
        (
            SELECT 1 
            FROM INFORMATION_SCHEMA.TABLES 
            WHERE TABLE_SCHEMA = 'Sales' 
            AND TABLE_NAME = 'SalesOrderHeader_new'
        )
    AND NOT EXISTS 
        (
            SELECT 1 
            FROM INFORMATION_SCHEMA.COLUMNS 
            WHERE TABLE_SCHEMA = 'Sales' 
            AND TABLE_NAME = 'SalesOrderHeader' 
            AND COLUMN_NAME = 'Rowversion'
        )
    AND NOT EXISTS
        (
            SELECT 1 
            FROM sys.triggers
            WHERE name = 't_u_SalesOrderHeader'
        )
    BEGIN
        EXEC sp_executesql N'
            create trigger t_u_SalesOrderHeader
            on Sales.SalesOrderHeader
            after update
            as 
                -- assuming pk values are never altered.
                update Sales.SalesOrderHeader_new
                set 
                    RevisionNumber = i.RevisionNumber,
                    OrderDate  = i.OrderDate,
                    DueDate  = i.DueDate,
                    ShipDate  = i.ShipDate,
                    [Status]  = i.[Status],
                    OnlineOrderFlag  = i.OnlineOrderFlag,
                    PurchaseOrderNumber  = i.PurchaseOrderNumber,
                    AccountNumber  = i.AccountNumber,
                    CustomerID  = i.CustomerID,
                    SalesPersonID  = i.SalesPersonID,
                    TerritoryID  = i.TerritoryID,
                    BillToAddressID  = i.BillToAddressID,
                    ShipToAddressID  = i.ShipToAddressID,
                    ShipMethodID  = i.ShipMethodID,
                    CreditCardID  = i.CreditCardID,
                    CreditCardApprovalCode = i.CreditCardApprovalCode,
                    CurrencyRateID  = i.CurrencyRateID,
                    SubTotal  = i.SubTotal,
                    TaxAmt  = i.TaxAmt,
                    Freight  = i.Freight,
                    Comment  = i.Comment,
                    rowguid  = i.rowguid,
                    ModifiedDate = i.ModifiedDate
                from Sales.SalesOrderHeader_new s
                join inserted i 
                    on s.SalesOrderID = i.SalesOrderID;';
    END
    GO
     
    IF EXISTS 
        (
            SELECT 1 
            FROM INFORMATION_SCHEMA.TABLES 
            WHERE TABLE_SCHEMA = 'Sales' 
            AND TABLE_NAME = 'SalesOrderHeader_new'
        )
    AND NOT EXISTS 
        (
            SELECT 1 
            FROM INFORMATION_SCHEMA.COLUMNS 
            WHERE TABLE_SCHEMA = 'Sales' 
            AND TABLE_NAME = 'SalesOrderHeader' 
            AND COLUMN_NAME = 'Rowversion'
        )
    AND NOT EXISTS
        (
            SELECT 1 
            FROM sys.triggers
            WHERE name = 't_d_SalesOrderHeader'
        )
    BEGIN
        EXEC sp_executesql N'
            create trigger t_d_SalesOrderHeader
            on Sales.SalesOrderHeader
            after delete
            as 
                delete Sales.SalesOrderHeader_new
                from Sales.SalesOrderHeader_new s
                join deleted d 
                    on d.SalesOrderID = s.SalesOrderID ;';
    END
    GO

    Move the data over in batches

    set nocount on;
    set transaction isolation level serializable;
    set deadlock_priority LOW
     
    declare @i int;
    declare @new_i int = 0;
    declare @rowcount int = -1;
    declare @minBatchSize int = 5000
     
    set identity_insert Sales.SalesOrderHeader_new on;
     
    select @i = isnull(max(SalesOrderID), 0)
    from Sales.SalesOrderHeader_new
     
    -- batch loop
    while @rowcount <> 0
    begin
     
        -- retry loop:
        declare @counter int = 1;
        declare @error int = 0;
        while (1=1)
        begin
            begin try
                select top (@minBatchSize) @new_i = SalesOrderID
                from Sales.SalesOrderHeader
                where SalesOrderID > @i
                order by SalesOrderID asc;
     
                merge Sales.SalesOrderHeader_new as target
                using (
                        select SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate
                        from Sales.SalesOrderHeader
                        where SalesOrderID > @i
                        and SalesOrderID <= @new_i
                    ) as source (SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate)
                on target.SalesOrderID = source.SalesOrderID
                when not matched then
                    insert (SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate)
                    values (source.SalesOrderID, source.RevisionNumber, source.OrderDate, source.DueDate, source.ShipDate, source.[Status], source.OnlineOrderFlag, source.PurchaseOrderNumber, source.AccountNumber, source.CustomerID, source.SalesPersonID, source.TerritoryID, source.BillToAddressID, source.ShipToAddressID, source.ShipMethodID, source.CreditCardID, source.CreditCardApprovalCode, source.CurrencyRateID, source.SubTotal, source.TaxAmt, source.Freight, source.Comment, source.rowguid, source.ModifiedDate);
     
                --success            
                select @rowcount = @@ROWCOUNT, @i = @new_i;
                break;
            end try
            begin catch
                select @counter    = @counter + 1,
                    @error = @@ERROR;
     
                if (@counter >= 5)
                begin
                    -- this batch failed five times! Something's wrong
                    raiserror (N'Could not copy Sales.SalesOrderHeader rows to Sales.SalesOrderHeader_new',
                        16, 1, @error);
                    break;
                end
            end catch
        end 
     
        if @error <> 0
        begin
            -- an error has been raised. Discontinue.
            break;
        end
     
    end
     
    set identity_insert Sales.SalesOrderHeader_new off;
    go

    Do The Switch

    set xact_abort on
    set transaction isolation level serializable
    begin transaction
    IF EXISTS 
        (
            SELECT 1 
            FROM INFORMATION_SCHEMA.TABLES 
            WHERE TABLE_SCHEMA = 'Sales' 
            AND TABLE_NAME = 'SalesOrderHeader_new'
        )
    BEGIN
        begin try
            -- rename pks and indexes from original table
            exec sp_rename N'Sales.PK_SalesOrderHeader_SalesOrderID', N'PK_SalesOrderHeader_old_SalesOrderID', N'OBJECT';
            exec sp_rename N'Sales.SalesOrderHeader.AK_SalesOrderHeader_rowguid', N'AK_SalesOrderHeader_old_rowguid', N'INDEX'
            exec sp_rename N'Sales.SalesOrderHeader.AK_SalesOrderHeader_SalesOrderNumber', N'AK_SalesOrderHeader_old_SalesOrderNumber', N'INDEX'
            exec sp_rename N'Sales.SalesOrderHeader.IX_SalesOrderHeader_CustomerID', N'IX_SalesOrderHeader_old_CustomerID', N'INDEX'
            exec sp_rename N'Sales.SalesOrderHeader.IX_SalesOrderHeader_SalesPersonID', N'IX_SalesOrderHeader_old_SalesPersonID', N'INDEX'
     
            -- rename checks and defaults
            exec sp_rename N'Sales.CK_SalesOrderHeader_Status', 'CK_SalesOrderHeader_old_Status', N'OBJECT';
            exec sp_rename N'Sales.CK_SalesOrderHeader_DueDate', 'CK_SalesOrderHeader_old_DueDate', N'OBJECT';
            exec sp_rename N'Sales.CK_SalesOrderHeader_ShipDate', 'CK_SalesOrderHeader_old_ShipDate', N'OBJECT';
            exec sp_rename N'Sales.CK_SalesOrderHeader_SubTotal', 'CK_SalesOrderHeader_old_SubTotal', N'OBJECT';
            exec sp_rename N'Sales.CK_SalesOrderHeader_TaxAmt', 'CK_SalesOrderHeader_old_TaxAmt', N'OBJECT';
            exec sp_rename N'Sales.CK_SalesOrderHeader_Freight', 'CK_SalesOrderHeader_old_Freight', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_RevisionNumber', 'DF_SalesOrderHeader_old_RevisionNumber', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_OrderDate', 'DF_SalesOrderHeader_old_OrderDate', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_Status', 'DF_SalesOrderHeader_old_Status', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_OnlineOrderFlag', 'DF_SalesOrderHeader_old_OnlineOrderFlag', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_SubTotal', 'DF_SalesOrderHeader_old_SubTotal', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_TaxAmt', 'DF_SalesOrderHeader_old_TaxAmt', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_Freight', 'DF_SalesOrderHeader_old_Freight', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_rowguid', 'DF_SalesOrderHeader_old_rowguid', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_ModifiedDate', 'DF_SalesOrderHeader_old_ModifiedDate', N'OBJECT';
     
            -- drop fks from original table
            alter table Sales.SalesOrderHeader
                drop constraint 
                FK_SalesOrderHeader_Address_BillToAddressID,
                FK_SalesOrderHeader_Address_ShipToAddressID,
                FK_SalesOrderHeader_CreditCard_CreditCardID,
                FK_SalesOrderHeader_CurrencyRate_CurrencyRateID,
                FK_SalesOrderHeader_Customer_CustomerID,
                FK_SalesOrderHeader_SalesPerson_SalesPersonID,
                FK_SalesOrderHeader_ShipMethod_ShipMethodID,
                FK_SalesOrderHeader_SalesTerritory_TerritoryID
     
            -- drop fks to original table
            alter table Sales.SalesOrderDetail
                drop constraint FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID;
     
            alter table Sales.SalesOrderHeaderSalesReason
                drop constraint FK_SalesOrderHeaderSalesReason_SalesOrderHeader_SalesOrderID;
     
            -- drop triggers on original table
            drop trigger Sales.t_i_SalesOrderHeader;
            drop trigger Sales.t_u_SalesOrderHeader;
            drop trigger Sales.t_d_SalesOrderHeader;
     
            -- drop uSalesOrderHeader (but remember its definition)
            declare @triggersql nvarchar(max);
            select @triggersql = OBJECT_DEFINITION(object_id) from sys.triggers where name = 'uSalesOrderHeader';
            drop trigger Sales.uSalesOrderHeader; 
     
            -- rename original table to "old"
            exec sp_rename N'Sales.SalesOrderHeader', N'SalesOrderHeader_old', N'OBJECT';
     
            --rename the new indexes
            exec sp_rename N'Sales.PK_SalesOrderHeader_new_SalesOrderID', N'PK_SalesOrderHeader_SalesOrderID', N'OBJECT';
            exec sp_rename N'Sales.SalesOrderHeader_new.AK_SalesOrderHeader_new_rowguid', N'AK_SalesOrderHeader_rowguid', N'INDEX'
            exec sp_rename N'Sales.SalesOrderHeader_new.AK_SalesOrderHeader_new_SalesOrderNumber', N'AK_SalesOrderHeader_SalesOrderNumber', N'INDEX'
            exec sp_rename N'Sales.SalesOrderHeader_new.IX_SalesOrderHeader_new_CustomerID', N'IX_SalesOrderHeader_CustomerID', N'INDEX'
            exec sp_rename N'Sales.SalesOrderHeader_new.IX_SalesOrderHeader_new_SalesPersonID', N'IX_SalesOrderHeader_SalesPersonID', N'INDEX'
     
            -- rename the new checks and defaults
            exec sp_rename N'Sales.CK_SalesOrderHeader_new_Status', 'CK_SalesOrderHeader_Status', N'OBJECT';
            exec sp_rename N'Sales.CK_SalesOrderHeader_new_DueDate', 'CK_SalesOrderHeader_DueDate', N'OBJECT';
            exec sp_rename N'Sales.CK_SalesOrderHeader_new_ShipDate', 'CK_SalesOrderHeader_ShipDate', N'OBJECT';
            exec sp_rename N'Sales.CK_SalesOrderHeader_new_SubTotal', 'CK_SalesOrderHeader_SubTotal', N'OBJECT';
            exec sp_rename N'Sales.CK_SalesOrderHeader_new_TaxAmt', 'CK_SalesOrderHeader_TaxAmt', N'OBJECT';
            exec sp_rename N'Sales.CK_SalesOrderHeader_new_Freight', 'CK_SalesOrderHeader_Freight', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_new_RevisionNumber', 'DF_SalesOrderHeader_RevisionNumber', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_new_OrderDate', 'DF_SalesOrderHeader_OrderDate', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_new_Status', 'DF_SalesOrderHeader_Status', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_new_OnlineOrderFlag', 'DF_SalesOrderHeader_OnlineOrderFlag', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_new_SubTotal', 'DF_SalesOrderHeader_SubTotal', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_new_TaxAmt', 'DF_SalesOrderHeader_TaxAmt', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_new_Freight', 'DF_SalesOrderHeader_Freight', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_new_rowguid', 'DF_SalesOrderHeader_rowguid', N'OBJECT';
            exec sp_rename N'Sales.DF_SalesOrderHeader_new_ModifiedDate', 'DF_SalesOrderHeader_ModifiedDate', N'OBJECT';
     
            -- rename "new" table
            exec sp_rename N'Sales.SalesOrderHeader_new', N'SalesOrderHeader', N'OBJECT';
     
            --rename fks pointing out
            exec sp_rename N'Sales.FK_SalesOrderHeader_new_Address_BillToAddressID', N'FK_SalesOrderHeader_Address_BillToAddressID', N'OBJECT'
            exec sp_rename N'Sales.FK_SalesOrderHeader_new_Address_ShipToAddressID', N'FK_SalesOrderHeader_Address_ShipToAddressID', N'OBJECT' 
            exec sp_rename N'Sales.FK_SalesOrderHeader_new_CreditCard_CreditCardID', N'FK_SalesOrderHeader_CreditCard_CreditCardID', N'OBJECT'
            exec sp_rename N'Sales.FK_SalesOrderHeader_new_CurrencyRate_CurrencyRateID', N'FK_SalesOrderHeader_CurrencyRate_CurrencyRateID', N'OBJECT'
            exec sp_rename N'Sales.FK_SalesOrderHeader_new_Customer_CustomerID', N'FK_SalesOrderHeader_Customer_CustomerID', N'OBJECT'
            exec sp_rename N'Sales.FK_SalesOrderHeader_new_SalesPerson_SalesPersonID', N'FK_SalesOrderHeader_SalesPerson_SalesPersonID', N'OBJECT'
            exec sp_rename N'Sales.FK_SalesOrderHeader_new_ShipMethod_ShipMethodID', N'FK_SalesOrderHeader_ShipMethod_ShipMethodID', N'OBJECT'
            exec sp_rename N'Sales.FK_SalesOrderHeader_new_SalesTerritory_TerritoryID', N'FK_SalesOrderHeader_SalesTerritory_TerritoryID', N'OBJECT'
     
            -- re-add fks pointing in (without checks for now)
            alter table Sales.SalesOrderDetail with nocheck add
            constraint FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID foreign key (SalesOrderId)
                references Sales.SalesOrderHeader(SalesOrderId) 
     
            alter table Sales.SalesOrderHeaderSalesReason with nocheck add
            constraint FK_SalesOrderHeaderSalesReason_SalesOrderHeader_SalesOrderID foreign key (SalesOrderId)
                references Sales.SalesOrderHeader(SalesOrderId)
     
            -- re-add trigger (use sp_executesql in order to keep in transaction)
            exec sp_executesql @triggersql;
        end try
     
        begin catch
            declare @error nvarchar(max) = ERROR_MESSAGE();
            select 'An error was encountered... Rolling back' as [message],
                @error as errorMessage
            raiserror (@error, 16, 1)
            rollback
        end catch
    END
    if @@TRANCOUNT > 0
        commit
    GO
  5. Modifying Tables Online – Part 1: Migration Strategy

    SERIES: Modifying Tables Online

    In the next few days I want to describe a way to modify the definition of very large tables while keeping the table available for other queries. Some table changes are already online operations. But offline changes need extra work to keep the table available. I explore one way to do that in this series.

    This article used to be a single blog post, but I broke it into four parts because it seemed too long. Read Part 1, and understand the strategy I’m using. Treat Parts 2-4 as appendixes. When you want to use this plan (or one like it), come back and use them as examples. You won’t be able to make use of the examples by cutting and pasting directly (unless you happen to be running an bike equipment store called Adventureworks), but modifications shouldn’t be too hard.

    • Part 1: Migration Strategy
    • Part 2: Implementation Example
    • Part 3: Same Example With Error Handling
    • Part 4: Testing

    Database Schema Changes

    Database schema changes are inevitable. Some of these changes can be performed quickly (like altering ntext columns to nvarchar columns). Changes like these don’t have to process every single row in the table and we say that these quick changes are online.
    Of course that means there are also such things as offline changes. These are changes that need to lock the table and process each row (like adding a persisted computed column to a table). Offline changes are sometimes okay when:

    • You have scheduled maintenance window to do work
    • The table isn’t that large or
    • the estimated amount of downtime is tolerable

    But sometimes the estimated downtime is not tolerable and you want your changes to be online. Microsoft has improved and continue to improve SQL Server’s online features. For example:

    There are a few operations that are still not online. Here’s an example of one of them:

    ALTER TABLE [Tablename]
    ADD Rowversion [Rowversion] NOT NULL

    The downtime for this table change can last seconds or minutes depending on the size of the table. If your business can’t tolerate that downtime, what options are there? I’m going to explain one strategy which helps me with these following goals:

    • I want to add a rowversion column to a table. (But the strategy applies to many other offline changes).
    • The straightforward alter statement would lock the table for longer than we can tolerate
    • My change can place an exclusive lock on the table, but only for a short amount of time. This means that the change should work concurrently with quick transactions, but not necessarily with large bulk transactions.
    • It would be an advantage if this were a database-only solution because while I might understand the applications and processes that query my database, I don’t necessarily have as much control over when or how these queries are sent.

    So here’s something that might work:

    The Strategy

    The strategy is fairly straightforward:

    A) I create a staging table which will store the same data as the original table but with the new rowversion column. The staging table is empty at first:

    B) Then I add triggers to the original table to keep the staging table up to date. Any changes in the original table will also be applied to matching data in the staging table.

    C) Fill the staging table a little at a time. This processing may take time, but the original table is still available to others.

    D) Finally remove triggers and rename the tables:

    Some Notes

    Dependencies I didn’t mention it yet, but we also have to worry about table dependencies. Tables can have a lot of dependencies (triggers, constraints, indexes etc…). You can create dependencies for the staging table, and rename them as well during the switch.

    Step D, the Switch: The last step is the tricky one. It’s the one that takes a schema modification (Sch-M) lock on the tables. A Sch-M lock on the table is incompatible with every other kind of lock. So that means that no in-flight transactions are allowed to use the table during the switch. SQL Server handles it pretty well, but if there are long running queries on this table, there will be blocking.

    What’s next?

    • In part 2, I implement this strategy for Adventureworks’ Sales.SalesOrderHeader table which seems to demonstrate every kind of table dependency SQL Server allows.
    • In part 3, I throw in error handling code to make the script more robust. So that if things go wrong, the application doesn’t notice and I don’t lose data.
    • In part 4, I explain the testing I do. It gives me more confidence that I don’t miss an update or a delete somehow.

    Thanks

    I posted a question on dba.stackexchange.com to look for some strategies. The answer I got there (thanks Brent!) is slightly different than the solution I’m describing here. Brent described a solution which used a view that UNION’s the staging table and the original table. It uses less space than the solution I describe, but it’s got some extra gotchas.

    Some of this work was done in the course of my regular job. So thanks to Desire2Learn for letting me share some of the lessons I learned there. By the way, we’re hiring.

  6. Sending Query Results to Others

    I think most of you have all been here before. You want to know the results of a query, but you have to get it from a colleague through email or some other proxy:

    Jerry Maguire is asked to "Show Me The Data"

    So how do you usually do it? How do you ask someone to send you the results of a query? There’s a million ways to answer that question and I’ve discovered that none of them are perfect. Depending on the situation, different ways are better than others. So let’s pick a situation and explore the best way to ask for those query results.

    The Situation

    The facts were these…

    • Goal: I want the results of the following query. It’s retrieves the top 20 cached queries and plans based on total consumed cpu.
    • I have to rely on my friend to run the query for me.
    • My friend is not a database administrator but is comfortable around SQL Server Management Studio
    • I want to minimize or simplify instructions.
    • I don’t want any truncated results.
    • I want to assure my friend that these queries are safe.
    • For this situation, I’m not worried how my friend gets the data to me (email, ftp, etc…). I only care that my friend is able to persist the data to some file.

    And to complete the situation, here’s that query I’m curious about…

    -- TOP 20 queries (by CPU)
    SELECT TOP (20)
        [TotalCPU] = total_worker_time,
        [TotalElapsedTime] = total_elapsed_time,
        [ExecutionCount] = execution_count,
        [AverageCPUInMicroseconds] = cast(total_worker_time / (execution_count + 0.0) as money),
        [DBName] = DB_NAME(ST.dbid),
        [ObjectName] = OBJECT_NAME(ST.objectid, ST.dbid),
        [QueryText] = (SELECT [processing-instruction(q)] = CASE
                WHEN [sql_handle] IS NULL THEN ' '
                ELSE (SUBSTRING(ST.TEXT,(QS.statement_start_offset + 2) / 2,
                    (CASE
                            WHEN QS.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX),ST.text)) * 2
                            ELSE QS.statement_end_offset
                            END - QS.statement_start_offset) / 2))
                END
    			FOR XML PATH(''), type),
        [QueryPlan] = qp.query_plan
    FROM sys.dm_exec_query_stats QS
    CROSS APPLY sys.dm_exec_sql_text([sql_handle]) ST
    CROSS APPLY sys.dm_exec_query_plan ([plan_handle]) QP
    ORDER BY total_worker_time DESC

    Notice that the results of this query contains xml, and that xml might contain commas, spaces, tabs, newlines and all the typical delimiters. That makes my job trickier. So now I’m going to list a number of methods and their pros and cons. I can think of a number of methods off the top of my head.

    Method 1: Cut and Paste into Excel

    This is most people’s usual method and it works in 90% of all cases because Excel is so versatile. However in this particular case, the whitespace in the query plans and the sql text mess up some of the rows and formatting, one field in each row. I get something like this, ugh:

    A jumbled Excel spreadsheet

    But there’s a better way using Excel. It’s not too hard to use the Data Connection Wizard. So it’s not too much work to guide my friend through it either. If I can, then this is a really useful nice way to pass around data. That wizard is accessed like this:

    Using Excel to query the data

    Method 2: Store Results As Tables in a Temporary DB and Back It Up

    That can be done with a single T-SQL script which my friend should be able to run no problem. The script looks something like this:

    create database myresults
    go
    -- TOP 20 queries (by CPU)
    SELECT TOP (20)
        [TotalCPU] = total_worker_time,
        [TotalElapsedTime] = total_elapsed_time,
        [ExecutionCount] = execution_count,
        [AverageCPUInMicroseconds] = cast(total_worker_time / (execution_count + 0.0) as money),
        [DBName] = DB_NAME(ST.dbid),
        [ObjectName] = OBJECT_NAME(ST.objectid, ST.dbid),
        [QueryText] = (SELECT [processing-instruction(q)] = CASE
                WHEN [sql_handle] IS NULL THEN ' '
                ELSE (SUBSTRING(ST.TEXT,(QS.statement_start_offset + 2) / 2,
                    (CASE
                            WHEN QS.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX),ST.text)) * 2
                            ELSE QS.statement_end_offset
                            END - QS.statement_start_offset) / 2))
                END
    			FOR XML PATH(''), type),
        [QueryPlan] = qp.query_plan
    INTO myresults.dbo.myresultset
    FROM sys.dm_exec_query_stats QS
    CROSS APPLY sys.dm_exec_sql_text([sql_handle]) ST
    CROSS APPLY sys.dm_exec_query_plan ([plan_handle]) QP
    ORDER BY total_worker_time DESC
    go
     
    backup database myresults to disk = 'C:\Users\Michael Swart\Desktop\Output.bak'
    go
     
    drop database myresults
    go

    The best part of this is that the data is backed up perfectly, the data captures all the data with no truncation. Xml fields are stored as xml fields and once restored, it’s easy to access the resultset exactly as it was on the target computer.

    The bad news is that although it should be fairly safe, it’s not as easy to assure my friend that the query above changes nothing. It does! It creates a whole database and backs it up. It then drops that database.

    Hang on a second, if my friend can use this method, that means my friend (the non-dba) is authorized to drop databases. Yikes! What’s up with that?

    Method 3: Using the Export Data Wizard

    It seems like SQL Server’s “Export Data” wizard was just built for this job. That can be accessed here:

    SSMS's Export Data...

    But I don’t recommend it, not one bit. It launches a wizard which takes me through a set of choices, and no matter what I choose, it always seems to choke on the xml column.

    It’s clear that this Export Data Wizard is using SQL Server Integration Services (SSIS) under the covers. How do I know it’s SSIS under the covers? Well if the clumsy data type handling weren’t a giveaway (zing!), the wizard can save the job I specified as an SSIS package.

    The SSIS shows through too much to recommend to my friend, who has no dreams of becoming a B.I. Maestro.

    So it seems like this method is only appropriate for those situations when cutting and pasting to Excel would be just as appropriate. And it’s kind of hard to write instructions simpler than “Cut and paste into Excel”.

    Method 4: Use Management Studio’s Results-To-File

    Most people use the query window’s default setting, Results-To-Grid. Sometimes there’s a use for Results-To-Text (like when using sp_helptext for example). But what about Results-To-File? I can set that using the Query window’s context menu here:

    The context menu used to access "results to file"

    I can also enable results to file with Ctrl+Shift+F. I had to look that up though. There’s a reason I haven’t learned that shortcut by heart. It’s because results-to-file doesn’t work very well. As far as I can tell, it’s like specifying results-to-text and then pasting that into a text file.

    The biggest problem here is that depending on the results, the data gets truncated easily and the newlines and tabs in the data get mixed up with the rest of the formatting. By the way, I’m also out of luck (and for the same reason) if I use SSMS to save grid results.

    Method 5: Use BCP, along with SQLCMD mode

    If I can instruct my friend to use a query window in SQLCMD mode, then I can write a script which will get a bcp script to export data. Enable SQLCMD mode by choosing it from the Query menu in SSMS:

    Selecting SQLCMD Mode from the Query Menu

    And the script I would have my friend run looks like this:

    !!bcp "Select 'query omitted' as [BigLongQuery]" queryout "c:\temp\desktop\output5.dat" -SSWARTLAPTOP -T -n

    which I could then get back on my side using bcp or using OPENROWSET.

    This method is okay. It’s kind of awkward (and any instructions to my friend will be equally as awkward) and it always takes some work to get the syntax just right. But BCP is kind of like Rob Lowe, he hasn’t changed in years and he’s really not that bad.

    Method 6: Output the Whole Thing Using FOR XML PATH

    Hey! Now I’m getting somewhere! XML was built for this kind of thing. All I do is take the query, and append FOR XML PATH to it and send the query to the friend saying “Run this and paste the results into notepad.” Here’s the whole query:

    SELECT TOP (20)
        [TotalCPU] = total_worker_time,
        [TotalElapsedTime] = total_elapsed_time,
        [ExecutionCount] = execution_count,
        [AverageCPUInMicroseconds] = cast(total_worker_time / (execution_count + 0.0) as money),
        [DBName] = DB_NAME(ST.dbid),
        [ObjectName] = OBJECT_NAME(ST.objectid, ST.dbid),
        [QueryText] = (SELECT [processing-instruction(q)] = CASE
                WHEN [sql_handle] IS NULL THEN ' '
                ELSE (SUBSTRING(ST.TEXT,(QS.statement_start_offset + 2) / 2,
                    (CASE
                            WHEN QS.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX),ST.text)) * 2
                            ELSE QS.statement_end_offset
                            END - QS.statement_start_offset) / 2))
                END
    			FOR XML PATH(''), type),
        [QueryPlan] = qp.query_plan,
        0x01231234 as bin
    FROM sys.dm_exec_query_stats QS
    CROSS APPLY sys.dm_exec_sql_text([sql_handle]) ST
    CROSS APPLY sys.dm_exec_query_plan ([plan_handle]) QP
    ORDER BY total_worker_time DESC
    FOR XML PATH

    And the results that are sent back are immediately accessible without any processing.

    I think this method works best for this particular situation. It hits my goals perfectly. Now if you dear reader want to use this method in your situation, there’s just some extra caveats:

    • SSMS restricts XML data results to 2 megabytes, although that’s configurable.
    • If you query binary fields (say query plan handles for example), then you might be surprised if the results come back in base64 encoding rather than hex which is a bit of a pain.

    Other methods

    Powershell I know you powershell pros are eager to give a one line script that does everything here. So lets hear it the comments! I know that powershell treats sets of objects (rather than unix’s cshell which deals with text). So I’m curious how well it serializes objects to a file.

    SQL Server Integration Services I haven’t forgotten about SSIS either, it’s the tool that was built for moving data from here to there in some format (any here, any there, any format). But I just couldn’t get the instructions simple enough for this scenario.

  7. SQL Server 2012 Launches Today

    Today is March 7th, 2012 and SQL Server 2012 launches today. Woohoo!

    Here’s the scoop:

    • We’ve had the Community Technology Previews (CTPs) to mess around with for a while now.
    • We’ve had a Release Candidate (RC0) for a little bit.
    • Today the product launches.
    • April 1st, (no kidding) the product is released to general availability.
    • But don’t wait until then. Today you can download the RTM evaluation copy.

    Also

    • SQL Server Data Tools is now RTW! More on that below.

    The Virtual Launch

    ... it's just an illustration of one

    Microsoft is marking the event with a virtual launch here. An online event that lets you watch them demo what they’ve been working on since the last release (And unless you’re a BI professional, that means since 2008)

    Personally What I’m Most Excited About: SSDT

    The code I work with gets deployed to a large number of systems which run on multiple versions of SQL Server. In order to avoid maintaining multiple code-lines, I can’t make use of the new features until we de-support all the older versions.

    For example, I can’t use the IIF function (new in 2012) if I’m worried that the code could be deployed to a 2008 instance somewhere. So there’s going to be some waiting for me and probably some of you …

    Not so fast, there are still a few things for us and I want to focus on one of them. It’s called SQL Server Data Tools (SSDT) and it was Released To Web (RTW) today.

    What’s SSDT do? It’s a free development environment. It lets you develop SQL for any version of SQL Server (including SQL Azure) whether you’re connected or not. It supports Intellisense (and as far as I can tell, even against 2005). It does schema comparisons and data comparisons.

    The clincher for me is that the table designer is the first one I’ve ever preferred over a plain old text editor.  It’s that slick. Check it out.

     

  8. Careful Hashing

    Cryptographic hashing functions like MD5 or SHA-1 are examples of one-way functions that take any data you give it and return a fixed set of bytes (for example, 16 bytes for MD5 and 20 bytes for SHA-1).

    I recently had trouble coming up with the same hash value for what I thought were the exact same inputs. Compare the following two examples:

    // C# example
    byte[] byteArray = Encoding.Unicode.GetBytes("I'm a lumberjack and I'm okay.");
     
    SHA1 sha = new SHA1CryptoServiceProvider();
    byte[] hashedPasswordBytes = sha.ComputeHash(byteArray);
     
    Console.WriteLine(BitConverter.ToString(hashedPasswordBytes));
    //30-F1-CE-3E-7E-1F-17-27-A7-B5-97-39-44-D0-FE-EA-DA-69-53-2B

    That’s different than this:

    -- SQL example
    select HASHBYTES('SHA1', 'I''m a lumberjack and I''m okay.')
    --0x4D6936CEDD0DE794AC86F9A4099DCBB4EFED8E1F

    So what gives? Before reading any further, can you spot the problem? In both examples above, two things are happening. First a string of characters is being converted into a string of bytes and then that string of bytes gets hashed. I’ve learned that it’s not the hash function that’s the problem, it’s converting the character string into binary that is inconsistent.

    String Conversions

    Maybe that’s not surprising, but still, it’s one more thing to be careful about: converting strings to bytes. SQL Server’s wide characters (NCHARs and NVARCHARs) are equivalent to .Net’s Encoding.Unicode and SQL Server’s single non-unicode characters (CHAR and VARCHAR) (single-byte characters) are equivalent to .net’s Encoding.ASCII.

    So for completeness sake, these should be equivalent:

    -- wide characters (remember unicode string literals are prefixed with 'N')
    select HASHBYTES('SHA1', N'I''m a lumberjack and I''m okay.')
    --0x30F1CE3E7E1F1727A7B5973944D0FEEADA69532B
     
    -- non-unicode characters:
    select HASHBYTES('SHA1', 'I''m a lumberjack and I''m okay.')
    --0x4D6936CEDD0DE794AC86F9A4099DCBB4EFED8E1F

    And

    // Unicode
    byte[] byteArray = Encoding.Unicode.GetBytes("I'm a lumberjack and I'm okay.");
    SHA1 sha = new SHA1CryptoServiceProvider();
    byte[] hashedPasswordBytes = sha.ComputeHash(byteArray);
    Console.WriteLine(BitConverter.ToString(hashedPasswordBytes));
    //30-F1-CE-3E-7E-1F-17-27-A7-B5-97-39-44-D0-FE-EA-DA-69-53-2B
     
    //ASCII
    byteArray = Encoding.ASCII.GetBytes("I'm a lumberjack and I'm okay.");
    hashedPasswordBytes = sha.ComputeHash(byteArray);
    Console.WriteLine(BitConverter.ToString(hashedPasswordBytes));
    //4D-69-36-CE-DD-0D-E7-94-AC-86-F9-A4-09-9D-CB-B4-EF-ED-8E-1F

    This means that if you’re going to hash strings in SQL Server and compare them to strings that are hashed in C#, you want to use

    • Encoding.Unicode for NCHARS and NVARCHARS
    • Encoding.ASCII for CHARS and VARCHARS

    This also means that for this use case, I wouldn’t feel safe using other encodings (such as Encoding.UTF8) because SQL Server can’t duplicate that string to binary conversion.

    Caveats

    • Don’t trust me and test it out for yourself!
    • Collations matter a little (sort order is not relevant here). The code page for your SQL Server collation should match the code page of the .net encoding when dealing with single byte characters. Unicode strings should not care about collation.
    • If you do find any counter-examples for anything I’ve written here, let me know in the comments.
  9. Microsoft Model Databases: Some Unconventional Alternatives?

    It seems like Microsoft has always provided example databases in order to illustrate database concepts.

    Pubs: Back when SQL Server 2000 was new, Microsoft provided this sample databases and maybe you remember them. Pubs was a really simple database that was designed to represent data that a fictional book publishing company might use. The database had a name (pubs) but I don’t think anyone ever gave the company a name. From here we’re told that it was “used to demonstrate many of the options available” for SQL Server.

    Northwind also was available for SQL Server 2000. It was a database that was used for a fictitious company “Northwind Traders”. The company “imports and exports specialty foods from around the world.” It was better than pubs, but that wasn’t saying much. Pubs set a pretty low bar.

    AdventureWorks: The Adventureworks database came along with SQL Server 2005 and we were introduced to the bicycle manufacturer Adventure Works Cycle. The database was pretty comprehensive and it touched on a lot more features of SQL Server. I’ve seen countless performance demos that query Sales.SalesOrderDetail. There’s even an Adventureworks Data Warehouse, an Adventureworks OLAP cube and even a light version: AdventureworksLT.

    Contoso And lately there’s a new sample database Contoso. It’s another fictional retail company (Contoso Ltd. This time they’re selling electronics and appliances). This database is used for Business Intelligence demos. With Contoso, Microsoft has developed and made a data warehouse and an SSAS cube available.

    Presumably, each of these databases is an example of Microsoft putting their best foot forward. Or at least it’s a dataset which enables them to show off what their database software can do.

    Beyond Demos

    So those databases are good for demos.

    But if we’re looking for other Microsoft-designed databases, we don’t have to stop there. There’s other ones that we can look at. I’m talking about the databases that Microsoft ships along with their products. When looking at these, remember:

    • There are some good examples, …
      We can look at those databases and see how they’re constructed. If you see some interesting db designs, you can use that.
    • … and there are some not so good examples …
      These databases weren’t built as database role models. So there’s bound to be a lot of database design choices that are not ideal. I know that Microsoft has huge pressure to maintain backwards compatibility.
    • … and that’s okay.
      It’s actually a fun exercise to pick out the good designs from the bad. It’s like a smorgasbord, you take the lessons you like and leave the rest.

    Some examples of what I’m talking about.

    ReportServer

    Install and configure an instance of SQL Server Reporting Services. Take a look at the ReportServer. It’s a pretty nice database schema. One thing that was interesting was the choice of identity column data types:

    • Catalog seems to be one of the main tables here and they’re using uniqueidentifiers (GUIDs) for identity columns. Normally people scoff at that, but I think it makes sense in this case. Catalog is the table that tracks reports, datasources and folder structures. Even at it’s largest, it can’t be that big (I just checked my own installation and the clustered index fits on 1 extent). That means that all the worries about GUIDs for primary keys are mitigated and the choice of using a globally unique identifier might make sense here.
    • ExecutionLogStorage Contrast that with ReportServer’s largest table, depending on how busy your Reporting Services installation is, this table can grow very large with one row for every report view. When I look at the choice for their primary key, I see that it’s clustered index on a single BIGINT column defined as it’s IDENTITY column. Well done.

    Any Database’s Metadata

    • I like the object oriented influence here:
      • The relationship between sys.objects –> sys.tables and between sys.objects –> sys.procedures is the subtype relationship class –> subclass. And that shows in the schema, for example, there’s one sys.object row for every sys.table row.
      • The relationship between sys.foreign_keys –> and sys.foreign_key_columns is the object composition relationship. And that also shows in the schema too. The “primary key” for the table foreign_key_columns is (constraint_object_id, constraint_column_id) and it contains the primary key of its owner.
    • Another good example is the sys.index_columns table. Look at the schema diagram below. It’s a perfect instance of my favourite DB Design example

    Any Sharepoint database

    • [Sigh] These are mostly counter-examples here, I took a brief glance at the schema once and decided quickly that there are no good lessons. Can anyone tell me if that’s changed?
  10. Removing Duplicate Dimension Rows in SSIS

    Something New For Me


    I’m relatively new to SQL Server Integration Services (SSIS). And so I want to explain how I solved a problem and invite Business Intelligence (BI) experts (and wannabe experts) to comment or point out things that I missed.

    A Short Description of the Problem

    So here’s a short description of the problem I’m trying to solve. It involves populating a dimension table. I want a data flow transformation which removes duplicate key values. In other words, every time I come across a new key value for a dimension, I want to pass through that entire row (not just the key values). This is kind of like a cross between T-SQL’s DISTINCT and FIRST_VALUE.

    I found out recently that people have treated this topic before:

    But because of my bad luck, I only found these posts after I muddled through a solution. But I was surprised to find out my solution had some interesting differences.

    The Setup

    First I’ll explain my situation. Say I’ve got a flat file containing data about Prime Ministers:

    The source data

    My main goal is to use this data to populate a data warehouse. That data warehouse has a schema like this:

    The Data Warehouse Schema

    This example is just for illustration. In real life, the number of fact rows is very large, and the number of dimension rows is relatively small. The time dimension will be pre-populated, but I want to load the other dimensions using values from the flat file. So that means I plan to process the data twice: The first time will be to load these dimensions, and the second time will be to load the fact table. When I process the data for dimensions, I :

    • Use a lookup transform to find whether that dimension already exists in the warehouse
    • Use the No Match Output to identify dimension rows I haven’t seen before. Those rows will be used to load the dimension. (i.e. I’m treating the dimension like a Never-Changing-Dimension rather than a Slowly Changing Dimension.)

    The problem is that these unmatched rows can be duplicated and I don’t want those duplicate dimensions rows getting into the dimension table.

    Things I considered:

    • I considered using a staging table on the DB side. But that required too much space, the amount of space needed was comparable to the fact table, not the dimension table.
    • I also considered inserting into dimensions a row-at-a-time using the MERGE statement on the database side. But I didn’t consider that for long. The performance with that strategy was so unacceptable.
    • I could have used a sort/aggregate, but that was a blocking transformation and I really didn’t need the data sorted.

    What I ended up with

    I created a script transform which seems to do the trick for what I want. My Data Flow task looks like this:

    In those script transforms you see up there, I told SSIS that each script will be used to filter rows out or let them pass through. I did that by setting ExclusionGroup to 1 (by default, it’s normally 0). Here’s where that’s done:

    Then in the script, I created a list of seen dimensions. And I let rows pass through if they haven’t yet been seen. The script simply looks like this:

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
     
        if (Row.Party_IsNull)
            return;
     
        if (!seenParty.Contains(Row.Party))
        {
            seenParty.Add(Row.Party);
            Row.DirectRowToOutput0();
        }
     
    }

    Logically it behaves exactly the same as the sort transformation (as explained in “Dead Easy Duplicate Key Removal” by Todd McDermid). Except for these things:

    • It seems to perform fast
    • It’s not a blocking transformation
    • But you don’t get all that for nothing. What we’re not getting with this method is that the rows aren’t sorted. In most cases, that doesn’t matter.
    • It takes some memory, so you have to be aware of that. Ask yourself, how well can all unseen dimension keys fit in memory?

    Some questions for you BI guys

    And here’s where I’d like you to chime in.

    1. What’s the most common approach to this problem?
    2. I get the feeling that I could use the Cache Transform for this technique. But I haven’t tried it.
    3. Are there any new features in SQL 2012 that help?
    4. Am I re-inventing the wheel here?
    5. Am I missing anything else that should be obvious?
  1. 1
  2. Next ›
  3. Last »