DBPedias

Your Database Knowledge Community

Grant Fritchey

  1. Execution Plan for a User Defined Function

    When you execute a multi-statement user-defined function you may see an execution plan that looks something like this:

    image

    It appears as if the cost of the UDF is free. This is especially true if you use the UDF in a query with other objects, such as joining it to actual tables. Since the optimizer always assumes a multi-statement UDF has a single row for statistics estimates, it’ frequently displays a low cost.

    But you know that there’s more going on there, right? It’s a multi-statement UDF because it’s doing a lot of work, but that is not reflected in the execution plan.. or is it?

    What if we went after the cache? Let’s run this little query:

    [sourcecode language="sql"]SELECT  deqp.query_plan,
            dest.text,
            SUBSTRING(dest.text, (deqs.statement_start_offset / 2) + 1,
                      (deqs.statement_end_offset – deqs.statement_start_offset)
                      / 2 + 1) AS actualstatement
    FROM    sys.dm_exec_query_stats AS deqs
            CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
            CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
    WHERE   deqp.objectid = OBJECT_ID(‘dbo.ufnGetContactInformation’);[/sourcecode]

    This query doesn’t do anything really fancy. I’m using the sys.dm_exec_query_stats which holds aggregated performance metrics for any query currently in cache because it has the plan_handle that lets me look up execution plans in sys.dm_exec_query_plan. It just so happens that you also get the objectid there in the plan so we can look for the specific plan that corresponds to our UDF (I’m using an example from AdventureWorks2008R2). The results from the query look like this:

    image

    Each row represents a statement that has been executed from the UDF. There’s just a single plan for the UDF, and it looks like this:

    image

    Without even drilling down to all the details of what’s going on in the plan, I think we can agree, that this UDF is not zero cost, but may in fact be extremely expensive depending on what’s going on inside that execution plan.

    Why is this useful? Because it lets you look behind that simplistic… lie, that is shown as a plan for the execution of a UDF to see what the UDF is actually doing. This may make it possible to improve the performance of the UDF since you’ll be able to identify missing indexes and other information thanks to your direct access to what’s going on inside the UDF.

    I had demoed this during my 24 Hours of PASS presentation and was asked to put the code up on my blog, so here you go. I hope it’ proves helpful.

  2. Meme Monday: Deadlines

    Tom LaRock (blog|twitter) has assigned an interesting topic for Meme Monday this month, working with deadlines.

    Some people hate deadlines. Some people love deadlines. But when you have one, there’s a good chance you need to really meet that deadline or there could be repercussions.

    I have a tip that I’ve found useful in the past. When I have a deadline for delivery of X, I evaluate that requirement and determine what, if anything, is dependent upon other people. I’ve found this to be the biggest issue because my deadline is seldom their deadline. So if there are parts of my deadline where I’m dependent on others, that’s my first task: Go have a chat.

    Here are a few questions:

    • Is X a deadline for them?
    • If so, when do they expect to deliver?
    • If that negatively affects my deadline, can they adjust? If not, you may need to talk to someone about your deadline.
    • If it’s not a deadline for them, when can they deliver?
    • Can I hold you to that? Meaning, I’ve just given the other person a deadline.

    Obviously you can’t just storm into an office and start rearranging people’s schedules (trust me, they get upset). You’ll need to work with them. But, it needs to be your first priority because, you can put yourself on a 24/7 alert to make a deadline if you want to, but, unless you’re in management, you don’t get that kind of control over your fellows.

    To make your deadline, priority one is to recognize your dependencies on others.

  3. Execution Plans, What Do I Look At?

    lookThe question came up, what are the top 3-5 things that you look at in a query. I realized then that I hadn’t really written them down. There are some areas that let me know pretty much immediately where problems lie within a plan. You don’t necessarily know the answer from looking at these things, but you know you have a problem. Being me, I couldn’t stick to 5, so here are 6:

    1. Immediately check the properties on the first operator (SELECT/DELETE/INSERT). There’s tons of information in here and people frequently just ignore it. You can see if the plan is based on full optimization or not. That immediately tells me if I’m working on the optimizer’s best estimate at a plan or I’m looking at a timeout. If it’s a timeout, I know I can’t count on this plan being good. Also I get the parameter compile time & run time values to help determine parameter sniffing issues in the properties.
    2. Warnings. If you see no join predicate warnings, that should jump up and poke you in the eye like some jumping eye-poking little monster. Same goes with missing statistics. The new warnings in plans in 2012 are equally important to know about. These are quick pieces of information that should immediately point you in a direction of inquiry within the plan.
    3. The most costly operations. Yes, I know you can’t trust these values because they are just estimates. Yes, the estimated operator cost is the same in both estimated and actual plans. No measurements of actual cost are taken by an execution plan. But these are the numbers available, so I use them. They’re accurate more often than not and quickly lead you to the possible source of the problem.
    4. Fat pipes. Now really, these are usually just an indication of volume and knowing that you’re moving lots of rows helps you read a plan (umpty-million rows joining umpty-million rows through a Loop might be an issue). But the real alarm bells go off when you see big fat pipes going to little skinny ones or skinny ones to big fat ones or even skinny-fat-skinny. That’s a huge indicator of something
    5. Extra operators. This is like that old statement about pornography “I can’t give you a precise definition, but I know it when I see it.” It’s looking for stuff that doesn’t belong. For example, you don’t have a single ORDER BY statement, but there sits a Sort operation. Why? That’s my “extra operator” indicator telling me to dig deeper.
    6. Scans. Scans are not necessarily bad and Seeks are not necessarily good. In general terms, with smaller data sets, you usually would expect to see a Seek over a Scan. Scans can be the right, good, and best choice, especially for very large data sets and in other situations, but they are an indicator of potential issues.

    After that, you have a whole slew of things you can get worked up about. Table Spools in SELECT statements are usually not good. Look for indications of multi-statement UDF’s (Scan’s with zero cost). Loop joins when a Merge makes more sense, Merges where you ought to see a Hash, missing index information, mismatch between estimated & actual, blah, blah, blah… You get the point. There’s just tons & tons of information within execution plans. But that list of six are usually the first things I look for.

  4. Deadlock Monitoring

    There are four different ways you can get information about deadlocks in your system. These are:

    • traceflag 1204
    • traceflag 1222
    • trace events
    • extended events

    For years I’ve been pushing traceflag 1222 as the best of the lot. Well, that’s over. I’ve been learning more and more about extended events and I’m currently in love with xml_deadlock_report event. Why? Simple, it has everything that traceflag 1222 has, but there are two glorious things about it.

    First, it’s not going to be filling up my error log with, for the error log, noise. Seriously. As much as I liked the information displayed from traceflag 1222, I didn’t like what it did to the log, but I saw it as a necessary evil.

    Second, it’s XML baby! That means you can set up queries to pull information out if you need to. Yeah, I know you can hop through hoops to set up the text from the error log for querying, but, that’s nothing like pointing to the file and referencing the nodes. Way too cool.

    On top of that, for simple deadlocks, the graphical version of the deadlock graph gives you most of what you need to quickly identify the issue. You absolutely don’t get that with the traceflag.

    Nope. I’m sold more than ever on extended events. If you haven’t started exploring them, I strongly recommend you do. Especially with SQL Server 2012 just around the corner. The functionality around extended events there completely makes these things accessible in ways they weren’t before.

  5. SQL Server XQuery Against Execution Plans

    One of the greatest things about all the DMOs is how you can combine the information they present to you with execution plans. It allows you to see what a query is doing, even as it’s executing, because the plan is created first. I couldn’t possibly emphasize enough how important that’s going to be in your day-to-day troubleshooting.

    Even better is the fact that you’re going to be able to make use of XQuery to pull useful information out of the execution plans that are in cache.

    Or are you?

    Here’s a query to pull some information out of the procedure cache:

    SELECT  deps.type_desc,
    deps.last_execution_time,
    deps.execution_count,
    deps.total_logical_reads,
    dest.encrypted AS EncryptedText,
    dest.text,
    deqp.query_plan,
    deqp.encrypted AS EncryptedPlan
    FROM    sys.dm_exec_procedure_stats AS deps
    CROSS APPLY sys.dm_exec_sql_text(deps.sql_handle) AS dest
    CROSS APPLY sys.dm_exec_query_plan(deps.plan_handle) AS deqp
    WHERE   dest.text LIKE 'CREATE PROCEDURE dbo.GetSalesDetails%' ;
    

    When I run this, it has just two (2) reads and completes in about 38ms on my system. It works great. But what if I were interested in seeing the operators in this plan and the costs associated? Very straight forward. This would do it.:

    SELECT  Xplan.p.value('declare default element namespace "<a href="http://schemas.microsoft.com/sqlserver/2004/07/showplan";">http://schemas.microsoft.com/sqlserver/2004/07/showplan";</a>
    @NodeId', 'int') AS NodeID,
    Xplan.p.value('declare default element namespace "<a href="http://schemas.microsoft.com/sqlserver/2004/07/showplan";">http://schemas.microsoft.com/sqlserver/2004/07/showplan";</a>
    @PhysicalOp','varchar(50)') AS PhysicalOp,
    Xplan.p.value('declare default element namespace "<a href="http://schemas.microsoft.com/sqlserver/2004/07/showplan";">http://schemas.microsoft.com/sqlserver/2004/07/showplan";</a>
    @EstimatedIO','decimal(7,6)') AS EstimatedIO,
    Xplan.p.value('declare default element namespace "<a href="http://schemas.microsoft.com/sqlserver/2004/07/showplan";">http://schemas.microsoft.com/sqlserver/2004/07/showplan";</a>
    @EstimatedCPU','decimal(7,6)') AS EstimatedCPU,
    dest.text,
    deqp.query_plan
    FROM    sys.dm_exec_procedure_stats AS deps
    CROSS APPLY sys.dm_exec_sql_text(deps.sql_handle) AS dest
    CROSS APPLY sys.dm_exec_query_plan(deps.plan_handle) AS deqp
    CROSS APPLY deqp.query_plan.nodes('declare default element namespace
    "<a href="http://schemas.microsoft.com/sqlserver/2004/07/showplan";">http://schemas.microsoft.com/sqlserver/2004/07/showplan";</a>
    //RelOp') XPlan (p)
    WHERE   dest.text LIKE 'CREATE PROCEDURE dbo.GetSalesDetails%' ;
    

    Here’s where it gets fun though. First, this doesn’t show me the reads & stuff. But it does show me that it’s taking 160ms to return five rows (five operators, five rows). Which seems like an extraordinary long time until you look at one little part of the XQuery: ‘//RelOp’

    The // is basically like putting % into a LIKE clause (see this post and the discussion). It’s a wild card search. Go through the whole structure and find the operators, RelOp. That’s a very time consuming and expensive mechanism for retrieving XML data. Unfortunately, because of how the XML is structured, sort of mirroring how the plans operate, the operators are nested inside of each other. Which means you can’t know the exact path to the operators unless you already knew the plan. This means you’re going to be doing wild card searches against your XML.

    You might think to yourself, “Fine. No biggie, it just takes a little longer.” And that would be true, except, I’m looking at a plan with five operators. Examine your plans. How many of the problematic ones, the ones you’re going to want to query contain just five operators? And now you begin to see the issue. As the plans get bigger and/or, you’re unable to filter as many as possible from the query, the queries get slower and slower. They can, and probably will, affect your production server and it’s performance.

    What to do? Ah, well, that’ll have to wait for the next blog post on this topic. I’m working on a good solution now.

  6. PASS Summit 2011: Wrap-Up

    DSC06049Another Summit done gone by and I’m exhausted. You know you did the Summit correctly if you’re crawling onto the plane to leave because you can no longer stand. That’s where I was on Saturday.

    I’m going to go ahead and write down a few thoughts about the Summit, but I want to point you over to Andy Leonard’s (blog|twitter) blog for what is a truly great summary. I won’t blame you if you read that & skip this.

    Still here?

    This year the Summit was amazing. I think it’s probably the best run Summit I’ve attended. From registration on Sunday night, to the final sessions on Friday, everything was dealt with in a timely and professional fashion that made the place a joy to attend. Huge kudos to the staff and the volunteers who organized and ran it.

    Want to know what PASS is about? Connect, Share, Learn. Yeah, that’s the new SQL Chuck Norris at the end. More on that in another blog post. But that’s it. It’s the people. You make connections with marvelous individuals at the Summit. Then, you share with them. They talk to you. You talk to them. You both learn. It’s a glory. If you’re not meeting people at the PASS Summit, you’re doing it wrong. Other conventions may be all about the vendor floor or the sessions, but PASS is all about the people (and the vendor floor and the sessions).

    My own experience started with presenting a pre-con with Gail Shaw (blog|twitter). All I’ll say is, Gail & I had glorious fun. The MVP party was Monday night and… wow! Thanks Microsoft. Tuesday I attended some sessions away from the Summit. Tuesday night I went to the First Timer’s event because I wanted to see what Tom LaRock (blog|twitter) had cooked up. It was great. The big sister/brother program at the beginning was slightly rocky, but from there it took off. The reception was great and the Quiz Bowl rocked, again (although I missed half of it because I was sent off to get beers for people on stage, proof of my love for them, Chuck, Andy…) Then it was down to the Red Gate/SQL Server Central party to see a dear friend receive the Exceptional DBA of the Year award. Jeff Moden should probably get an award for the Decade, but that hasn’t been invented yet.

    I was invited to sit at the bloggers table (which I’ve now dubbed “the problem children’s table” we’ll see if the moniker sticks) for all three key notes. I live-blogged every one and you can read about them here. Dr. Dewitt (blog) stole the show… again. It actually sounds like he might be back next year too. Very exciting. And yes, that’s a picture of me with Dr. Dewitt. The restraining order hasn’t been issued yet.

    Over the week I only attended a few sessions, two-three a day. It’s just hard to stop talking to people. Plus, this year, I worked the Red Gate booth as a vendor as much as possible (I had to leave several times to do other things, like attend the Women In Technology lunch for the first time). That was incredibly hard work. My hat’s off to all the vendors. You guys have no clue just how much they’re busting their humps in there. We announced a little contest on Wednesday. Looks like fun.

    I want to point out one individual who just, for want of a better term, assaulted the Summit. This was her first PASS Summit. She organized and led the first ever #sqlrun (a bunch of people got together at 7AM and ran two miles, although I cut out early to shower before the key note… you’re welcome), presented a session at her first Summit, was the host for Women in Technology, met just about every single person at the Summit, and was just a glorious little bundle of energy through the whole thing. Jes Borland (blog|twitter) is one to watch.

    During the week, the book SQL Server Deep Dives 2 was released. All proceeds from the book are going to Operation Smile. If you attended the summit you can download a free sample of this book and Volume 1 from Manning.

    Taken as a whole, the Summit was just simply wonderful. This community just doesn’t stop giving, and sharing. I’m so glad I get to take part in it.

  7. Extended Events Data

    I’ve been working quite a bit over the last week or so with extended events in Denali. The sheer magnitude of what you can do with extended events is just becoming clear to me. The interesting thing though is how much the basics are similar to trace. Similar mind you, not the same. For example, the best way to gather trace data is to output it to a file and then read the file into a table for later querying. It’s the same with extended events. There’s even a function that acts as a table:

    SELECT *
    FROM sys.fn_xe_file_target_read_file ('C:\APath\Query Performance Tuning*.xel', NULL, NULL, NULL);
    

    This can take advantage of roll-over files just like the old function used for traces. You can also provide offsets to read a sub-set of the file, which is cool. The event fields come back within an XML column. Here’s a sample of how I’m dealing with that from the new version of my Query Performance Tuning book:

    WITH xEvents AS
    (SELECT object_name AS xEventName,
    CAST (event_data AS xml) AS xEventData
    FROM sys.fn_xe_file_target_read_file
    ('C:\Apath\Query Performance Tuning*.xel', NULL, NULL, NULL))
    
    SELECT xEventName,
    xEventData.value('(/event/data[@name=''duration'']/value)[1]','bigint') Duration,
    xEventData.value('(/event/data[@name=''physical_reads'']/value)[1]','bigint') PhysicalReads,
    xEventData.value('(/event/data[@name=''logical_reads'']/value)[1]','bigint') LogicalReads,
    xEventData.value('(/event/data[@name=''cpu_time'']/value)[1]','bigint') CpuTime,
    CASE xEventName WHEN 'sql_batch_completed' THEN
    xEventData.value('(/event/data[@name=''batch_text'']/value)[1]','varchar(max)')
    WHEN 'rpc_completed' THEN
    xEventData.value('(/event/data[@name=''statement'']/value)[1]','varchar(max)')
    END AS SQLText,
    xEventData.value('(/event/data[@name=''query_plan_hash'']/value)[1]','binary(8)') QueryPlanHash
    FROM xEvents;
    

    yeah, XQuery isn’t my strong point. However, you get the idea. Oh, and the CASE statement is necessary because the rpc_complete code is visible in the statement field, but the sql_batch_complete code is visible in the batch_text field. That’s just Microsoft keeping us on our toes.

    Want to talk about this and other ways of gathering performance metrics? Please come and visit during SQL In The City:Los Angeles on October 28th.

  8. PASS Keynote Day #3: Dr. Dewitt

    And we’re off. We opened with a video of people saying “Connect, Share, Learn” and “This, is Community”

    Rob Farley & Buck Woody came out with a song about long running queries.

    [8:20]Wayne Snyder has been working with the PASS organization since 1999. He spoke at the first PASS Summit and he’s been on the board forever. He has finally hit the point as immediate-past president where he has to leave the board. We’ve got a great little thank you for him from all sorts of people. Wayne is a magnificent guy, seriously. If you see him, thank him for his service.

    [8:28]We have a new executive committee, Bill Graziano as President, WHOOP, Douglas McDowell is Executive Vice-President and Thomas LaRock is VP of Marketing and finally, Rushabh Mehta is now the immediate past president.

    SQL Rally Nordic is taking place in Sweden and has completely sold out. SQL Rally Dallas will be in May. We have tons of SQL Saturday’s coming up.

    You can get a registration, including 2 full days of Pre-Con between now & Nov 15, for $1395.

    [8:33]Dr. David Dewitt, Big Data, What’s the Big Deal?

    I got to meet Dr. Dewitt earlier in the week. I’m very excited about this presentation.

    He’s going to be presenting with a co-presenter, Rimma Nehme is going to be on stage helping out. His presentations are magnificent.

    [8:38]Dr. Dewitt, despite being smarter than the whole room, is really funny. He’s opening up with some great slides and some good humor.

    And then we’re off. We’re talking big data. Petabytes. Typically housed on large clusters of low-cost commodity hardware. He’s also talking Zetabytes. Uh, wow.

    Why are things growing so much? More and more things are picking up data. There are sensors from phone location & others. There are web clicks and page views. Data has been determined to be too valuable to delete. Cost of storage has dropped.

    [8:42]Managing “Big Data” The old system is to use a parallel database system. like Ebay with 10PB on 256 nodes. New stuff is a NoSQL System like Facebook, 20PB on 2700 nodes. Bing is 150PB on 40K nodes. Wow!

    NoSQL is not meant to be that SQL is dead, but that there are things in addition to SQL.

    Why do people love NoSQL? There is more data model flexibility. Relaxed consistency models such as eventual consistency. Low upfront software costs. Never learned anything but C/Java in school, so maybe they’re not smart enough. Finally, introducing time to insight.

    [8:45]Time to insight. The idea is not to load data into a system, but instead of schema first, they just want to get it. No cleansing, No ETL, No load, analyze data where it stands. Schema first, vs. Schema later.

    Major Types of NoSQL Systems. Key/Value Stores like MongoDB or Cassandra, Usually have a data model such as JSON, records are sharded across nodes in a cluster by hashing on key, single record retrieval. The other kind is Hadoop, scalable fault tolerant framework for storing and processing massive data sets, no data model, records are stored in a file system. The first is NOSQL OLTP and the second is NOSQL Datawarehouse.

    [8:50]The new reality is that we’re really going to see two universes as the new reality, Structures and Unstructured data or relational systems and NoSQL systems. Relational databsae systems provide maturity, stability, consistency. The noSQL systems are all about flexibility.

    Why is Dr. Dewit talking? Because the world has changed. Relational db systems no longer the only game in town. As SQL people we must accept this new reality and understand how best to deploy technologies. This is not a paradigm shift. RDBMS will continjue to dominate transactoin processing and all small to medium sized data warehouse. But many Businesses will end up with data in both universes.

    [8:52] Hadoop all started at Google. They needed to manage massive amounts of click stream data. It had to e scalable, fault tolerant, easy to program against.

    What does Hadoop Offer? Ability to analyze massive amounts of data. Scalable. Easy to program, low upfront costs, think big data warehousing

    The stack is a HDFS at the bottom, then MapReduce, then Hive & Pig, on the size is Sqoop and then there are other management parts.

    [8:55] Underpinnings of the entire Hadoop ecosystem. HDFS. Traditional hierarchical file system. Written in Java so it’s highly portable.

    File splits are done through 64mb chunks and then the blocks are stored around the cluster. Each block is stored as a separate file.

    Disk placement. A replication factor is set. Assuming a set of 3, it uses triple replication. Then you can survive two failures.

    There is a name node, which is one instance per cluster, which is a single point of failure. There’s a backup node, which backs up the name node? And then there’s a series of data nodes.

    Giant file comes in, a bunch of blocks are created, then the namenode receives messages with the blocks and the namenode moves it into appropriate datanodes, but the client does the writes, the namenode just balances and replicates.

    [9:02]Reads go the other way, the namenode tells the client where the data is stored and reads it back out that way.

    Failures can occur through disk errors, datanode failures, switch/rack failures, namenode failures, data center failures.

    Datanode failures is handled by the namenode which always manages the datanodes, tracking what’s stored where and which datanodes are available and which are not. When there’s a failure of a datanode, the namenode will identify which blocks were stored on that datanode and will replicate them to other nodes. Further, it will balance stuff out as datanodes come back online.

    [9:07]This means you get something that is highly scalable. No use of mirroring or RAID, which reduces cost. It uses a single mechanism (triply replicated blocks) t deal with a variety of failure types rathern thatn multiple different mechanisms. Negatives, block locations and record placement invisible to higher level software. Makes it impossible to employ many optimizations successfully employed by parallel DB systems.

    So to improve performance they use MapReduce. Takes a large problem and divides it into small problems. Perform the same function on all sub-problems. Combine the output from all sub-problems. The first is the map, the last is the reduce.

    MapReduce is done by a Master, JobTracker, and a set of Slaves, TestTracker. The JobTracker watches failures, etc.

    It all works with HDFS. On each node there is a TaskTracker and a DataNode and the JobTracker is on the server with the NameNode.

    [9:15]Seeing the data come out of the MapReduce mechanism, but then you see that the data can’t be grouped in mechanisms other than how they’re stored.

    Reduce Phase basically takes each mapper and reads from them to get the information out of the reducer. Each reducer works with a mapper and the reducer is the thing that applies the function that actually fixes the data coming out of the mappers.

    Yeah, I’m starting to get a bit lost.

    Actual number of Map tasks, M is generally much larger than the number of nodes used. This heps deal with data skew and failure. Skew with reducers is still aproblem.

    Failures, like HDFS, MapReduce framework is fault tolerant & other stuff.

    Beauty of this stuff is that it is highly fault tolerant, relativey easy to write arbitrary distributed computations, mr framework removes burden of dealing with failures from programmer.

    Cons are Schema embedded in application code, which means that sharing data between apps is really hard. Also, performance tuning is difficult.

    Keeping up with this stuff as fast as I can. We’re drinking from a really big fire hose in here.

    [9:24]Hive and Pig. MapReduce can’t really do joins. Developers can spend days writing apps to analyze data like what we can do with a query in the relational systems (although I know people that take days to write a TSQL query). Declarative query languages are not going away. It’s still efficient for what it does.

    Hive and HiveQL s the mechanism used to put in a query language. Hive has tables. Richer column types than in SQL. You get the primitive types, but you also get stuff like associative arrays, lists, structs.Hive tables have to be partitioned. It’s still using HDFS files.

    [9:28]All the files are stored in chunks. If there’s no filtering, it will go against all files. This thing could seriously thrash disks, especially when you consider the fact that data is not relational at all.

    HiveQL Optimization and Execution. There is very little statistics. Uses simple heuristics of pushing sections below joins, output of … something. slide went by too quick.

    PDW vs. Hive. Testing using 600gb from TPC benchmarks. On small data sets, for straight forward queries, it was pretty radically different. Then when you complicate it, hive was about 4000 seconds, pdw is about 1000, and then pdw-p is a factor of 10 faster. That’s because of how parallel data systems can work.

    Hive vs. PDW. Partitioning the hive tables provides no benefit since there is no way t control where HDFS places the blocks. Different for PDW.

    We’re going to have to connect the two universes. Increasingly the data first lands in unstructured universe. MapReduce is an excellent big data ETL tool. Sqoop provides a command line load utility.

    Some analyses are easier to do in a procedural language. Sqoop provides querycapability to pull data from RDBMS using SQL, but you can’t get good performance.

    Some applications need data from both universes. Only option is unstructured universe as unstructured data can’t go into structured. Sqoop moves it over to there.

    This means that there are some types of queries that are never going to perform well with this data.

    [9:40]And I just got lost. Sqoop is really complicated. It basically moves the data in & out of the two universes, which scans the entire table (yes, scans) a table N+1 times.

    There has to be a better way!

    Moving data is so 20th Century. Why not build a database system that understands both systems. It can have the expressive power of a language like HiveQL. He’s trying to build an “Enterprise Data Manager” which his partner hates (name, open to suggestions).

    Dr. Dewitt asserts that SQL Server PDW just needs to understand unstructured data. It needs improved scalability.

    Jenn McCown of Midnight DBA suggested TARDIS because it can move between universes. I like that. Let’s lobby.

    Remember, this is not a paradigm shift. These things are designed to meet different problems. RDBMS only or HADOOP only is not going to be the default.

    Send ideas for next year to dewitt@microsoft.com We want this man to come back guys.

    This was another great presentation from Dr. Dewitt!

    [9:48]And now for the Q&A.Bing David Dewitt and you can get a link to the PASS Talks. The slide deck is available.

    What features can we expect to see in SQL Server that manage private cloud. He can’t answer.

    What are impacts of big data on scientific community? Dr. Dewitt talks about how the Sloan digital sky survey data was managed by Jim Gray (before he was lost at sea). They are working on building database systems for scientific data that allows for declarative languages. he says that the science community just doesn’t use anything but files. They’re trying.

    Here are his slide decks

    [9:55]Can you elaborate more on the importance of supporting Hadoop? he does believe that there are two universes, so hadoop is out there and running next to SQL Server, today. The world has spoken. The two things are being used. We need to embrace it. We should not bury our heads in the sand.

    As a DBA working primarily with relational databases, what should I do to be better prepared for this new universe? Dr. Dewitt says download and play with the code. Get started.

  9. PASS Summit Day 2–Key Notes

    Bill Graziano has come out on stage, looking marvelous, in a traditional kilt and stockings. Thanks Bill.

    For those who don’t know, Day 2 at the Summit every year is Kilt Day.

    [8:19]Outstanding volunteers being recognized are Tim Radney and Jack Corbett. These are some outstanding people who work their bottoms off for the PASS Community. If you meet them, thank them.

    The 2011 PASSion Award goes to Lori Edwards. She’s simply amazing. Congratulations Lori and thank you for all the work you’ve done!

    [8:23]Time to eat our vegetables. We’re looking at the financials. It’s a slightly painful process, but important to understand where the money goes since this is a non-profit organization managed by volunteers. You should understand where the money comes from and where it goes.

    [8:25]Quentin Clark is the keynote speaker from Microsoft. We’re seeing a bunch of people talk about the new functions about SQL Server 2012 (nee Denali). There really is a lot of new functionality coming up. Some of it is quite exciting. Some of it is probably edge-case stuff for really big systems. Regardless, there’s lots to learn.

    [8:31]SQL Server 2012 is the biggest release ever, especially when you take into account that SQL Azure is part of the common code base. Quentin Clark is going through his “Fantastic 12”. First up is Required 9’s & Protection. Integrations services is a server, they’re introducing HA for StreamInsight and there’s AlwaysOn. We’re getting a testimonial from the Mediterranean Shipping Company. I’m just not a fan of testimonials. Show me demos or teach me stuff, all the rest is marketing.

    [8:41]Testimonial done, we’re getting some demonstrations of AlwaysOn. They’re showing how the wizard can be used to build out a true topology of mirroring servers. It really is cool to watch this happen live. I’m going to be spending some time with it myself.

    It’s great how they’ve set up a single listener to manage connections so that code is no longer necessary to manage the capabilities.

    [8:45]The blogger table got a little rambunctious during the demo because the guy doing it had very small font sizes and didn’t use zoom it at all. When he occasionally used zoomit people started cheering. Big tip, just because the screen is large doesn’t mean people in the back of a large room can see what’s there.

    Seriously, the demo was good, but you need to make sure your presentation skills are up to snuff.

    [8:48]Blazing fast performance is set up off Performance Enhancements. But the big one is ColumnStore Index. #3 is Rapid Data Exploration, which is some of the stuff from yesterday. #4 is Managed Self-Service BI, in short PowerView and PowerPivot. But there’s also expanded management from Sharepoint and they’re adding Reporting Alerts, which is really huge.

    [8:51]#5 is Credible and Consistent Data. They’re working off the BI Semantic Model, cloud data through shared information, mentioned yesterday. They’re also expanding out Data Quality Services and Master Data Services.

    [8:53]Lara Rubbelke is showing how to manage the data. She got a quick dig in at using Excel for everything, which was funny if a little subtle. She’s using SharePoint and TSQL. Better still, she used Zoomit and large fonts so not only did we get a great demo, but we could see it. I’m excited about this stuff in ways that I wasn’t before the demo, which is the purpose of these demos. Well done Lara. Thanks.

    And yes, there is more cloud stuff in here. We’re going to the cloud people.

    This is some snazzy stuff and Lara is making it look great. I’m excited about the concept of business people being able to set up alerts on reports that they work from.

    [9:02] #6 is Organizational Compliance. They’ve expaned Audit and added user-defined auditing and filtering. They’ve added User-Defined Server Roles (already wrote that up in my new SQL Server In a Month of Lunches book).

    #7 is Piece of Mind. That breaks down to production-simulated Application Testing (WHOOP!). They’re expanding System Center Advisor & Management Packs for SCOM. They’re expanding critical support to a Premier Mission Critical level.

    [9:06] #8 is Scalable Data Warehousing. #9 is Fast Time to Solution, in short, appliances. They are releasing an Optimized and pre-tuned appliance. They’re working with vendors to have hardware & software ready for plugging in instantly to your system. Better still, they’re providing your choice of hardware.

    They’re working at with, based on the logos, HP & Dell, to create these things. These are very nice ways to get yourself a major system in pretty much no time.

    [9:19]#10 is Extend Any Data, Anywhere. They’[re working with PHP, Java & Hadoop as mentioned yesterday. But they’re announcing a LINUX driver in order to convert from “something” to SQL Server (huh, wonder what “something” is, not). Finally they’re expanding file table, 2d spatial and semantic search.

    And a demo from Michael Rys. The room just got smarter. Although now, we’re looking at a screen that is unreadable. Same thing with the new app, which sounds cool. Then he zoomed, with a comment “For the Zoomit Fans” and we got to actually see his app working. Huge applause from the bad kids at the bloggers table.

    This is pretty neat stuff. I love seeing new code at work. This is slick and very powerful. We’ll put that stuff to work. EXCELLENT demo. Exciting stuff.

    [9:28]#11 is Optimized Productivity. IN short, Juneau, or the new SQL Server Data Tools. It’s still Visual Studio… enough said. they’re talking about changes that make things unified across Database & BI. They’re creating a deployment & Targeting Freedom too. The main thing is a new embedded express version that doesn’t require an actual install to have a database ready to go. That sounds great!

    #12 is Scale on Demand. AlwaysOn, of course. Deployment across Public & Private combined deployments, and Deployment & Targeting Freedom.

    The demo is going to be good. We started with an example of adaptive learning when the presenter came out on stage and said he’d been reading the tweets and immediately used zoomit so we could see the screen. Yay!

    We got to see the dacpac at work with Azure and the bacpac (god I hate that name) at work as well. It’s really cool. Oh, and now they’re setting up a method to connect from SSMS to the Azure Storage so you can restore your bacpac files locally. Nice work.

    [9:40] And a second Demo! Cool. Cihan Biyikoglu is showing Elastic Scale.

    Oooh, just saw an execution plan in Azure. I’m excited.

    [9:50] That was actually a really decent key note. I wasn’t bored. I got enough technical information that I’m leaving feeling a bit excited about what’s coming out. Well done guys!

  10. PASS Summit So Far

    This is Day 2 of the Summit proper. But for me, this is the fifth day of the Summit and my sixth in Seattle. Sunday was the opening of registration and it was like a high school reunion with people that you really love. Registration itself only takes about three minutes, but I was there for almost two hours talking to people, friends from previous PASS Summits, SQL Saturday’s, SQL Cruise, and SQL In The City.

    Monday I put on a pre-conference seminar with Gail Shaw. We had 120 attendees. Despite our worries and multiple contingency plans, we had more than enough material for the time (you try coordinating 7 hours of material with someone from South Africa who has less band width than my phone). It went off wonderfully. Gail and I had a blast.

    Tuesday I did some learning of  my own and then attended the new First Timer’s session at the conference. It was fantastic. Much better than last year. If you were part of the First Timer’s program this year, I’ll bet you felt the love. Everyone involved with putting this on, from the Big Sisters/Brothers to the PASS Staff, well done. Don Gabor’s lightening networking was amazing to see. The Quiz Bowl was wonderful, thanks Tim Ford & Louis Davidson. After the reception it was time to go down to the SQL Server Central Party that’s put on every year by Steve Jones and Red Gate Software. It was a little different to work the party instead of attend it.

    Wednesday I live-blogged and tweeted the keynote sitting next to one of my favorite people Jen McCown, 1/2 of the Midnight DBA team, and another of my favorite people, Denny Cherry. We heard the official name for Denali will be SQL Server 2012 and we heard a bunch of marketing talk (except for a very short demo from Denny Lee of Microsoft that was interesting).

    The big moment for me came at 10:05AM Seattle time when Red Gate Software announced the contest to actually, literally, send a DBA (or data professional) on a rocket, into space. Go to DBAInSpace.com to check it out.

    Then I got my learn on. I went to Buck Woody’s cloud session and got some good insights from Buck (and a couple of laughs). Next, I went to see another of my favorite people, Jes Borland, present for the first time ever at the Pass Summit. She absolutely rocked. Then it was my turn to present to 120 people on Dynamic Management Objects. I thought it went well, including dropping in a joke slide for Paul Randal, 55 Ways to Safely Shrink Your Database, which will be on the final recording (lordy I hope they let me talk again at the Summit).

    It’s been a great Summit so far.

  1. 1
  2. Next ›
  3. Last »