DBPedias

Your Database Knowledge Community

Patrick Scwhanke

  1. Talking SQL to NoSQL data stores (Part 3)

    Use cases and samples

    If you worked through the last step of this series, you probably ended up with some HBase, SimpleDB or other NoSQL store, attached the Toad Data Hub to it, mapped some data structures into tables and thereby opened it up to the SQL world.

    Use case 1: Report data

    Now we can just run a query like this:

    SELECT p.*
    FROM myHBase.persons p
    WHERE p.surname LIKE 'A%'

    myHBase is the MySQL database in the Data Hub, pointing to an HBase instance. “persons” is the name of the shadow table, pointing to an HBase table. So we can easily apply a well-known SQL construct to find all persons in our HBase store with a surname starting with “A”.

    This query would work on any MySQL client. In addition to that, when using Toad for Cloud Databases, we could demonstrate our laziness by using its graphical Query Builder and drag/drop together queries like this (the table and columns in the screenshots are in German, my apologies for that…):

    Using query builder for querying an HBase table

    Using query builder for querying an HBase table

    With the Toad client, all result sets can be exported to standard formats like Excel, CSV, XML, HTTP, PDF or as SQL insert script. It can even generate an Excel sheet with an auto refresh button in it that pulls in current data when clicking on it (so-called Excel Linked Queries).

    Use case 2: Move data into NoSQL

    We can also import data most of these formats, i.e. let Toad create INSERT commands that the Data Hub automatically translates into the corresponding NoSQL store’s API calls.

    If the data is already in another database, it’s even easier, e.g. if we attached an HBase store as well as a standard Oracle database to the Data Hub, we could just run this:

    INSERT INTO myHBase.persons
    SELECT * FROM myOracle.persons

    Use case 3: Get the best from SQL and NoSQL world

    NoSQL systems are strong when it comes to massive parallel scaling and map-reduce tasks, but they are often missing adequate reporting or data mining capabilities. Typically, there aren’t even indexes! Relational systems on the other hand with their sophisticated data indexing, joining and grouping capabilities are much better suited for these tasks. Therefore, it we want to report on the data, we need some ETL processes to load it over from the NoSQL store into a relational warehouse or reporting system. And it couldn’t be easier because all we need to do is:

    INSERT INTO myOracle.persons
    SELECT * FROM myHBase.persons

    There is also some wizardry for all these data export, import, moving around functions, e.g. this screenshot shows the two essential steps when moving data from an HBase into an Oracle table. If we need this regularly, we can save the whole wizard process into a template and easily recall it next time.

    Loading data from HBase into an Oracle table

    Loading data from HBase into an Oracle table

    Use case 4: Feel free to join

    Maybe you won’t need this on a daily basis, but it is possible and very straight-forward to even join together data from different data stores. This query joins a person table in HBase to their addresses which are located in Oracle and reports on the number of addresses per person:

    SELECT p.id, COUNT(*)
      FROM myHBase.persons p INNER JOIN myOracle.addresses a
      ON (p.id = a.persid)
    GROUP BY p.id

    Again, if you prefer the drag/drop style, use Toad’s query builder like this to get the same result:

    SQL joining an HBase table with an Oracle table

    SQL joining an HBase table with an Oracle table

    So it all boils down to writing well-known SQL queries or DML commands. The data hub cares for the whole translation into the corresponding NoSQL API. Toad for Cloud Database and especially its Data Hub are like brokers between SQL and NoSQL world.

    If this makes any sense to you, feel free to check it out here.

    There are also some very useful demo videos floating around here.

    And if you need some more detailed instructions on how to map HBase and Cassandra tables into relational tables, check these out:

    Feel free to comment!


  2. Talking SQL to NoSQL data stores (Part 2)

    Talking SQL to NoSQL – Setting it up

    In Part 1, we looked at what Cloud databases and NoSQL data stores are and why they are out there. The actual topic for this part is really Quest Software’s Toad for Cloud Databases which acts as a translator, making it possible to talk SQL to databases like Hadoop, Cassandra, SimpleDB, etc. that have no SQL engine at all.

    Toad for Cloud Databases actually consists of two parts (you may also have a look at the architecture diagram):

    • A “Data Hub” that does the translation job, i.e. it maps data structures from the NoSQL world to relational tables and translates SQL syntax to the NoSQL store’s specific API calls and vice versa.
    • A graphical UI acting as client, connecting to the Data Hub, configuring the data hub and issuing SQL (depicted as a toad in the diagram ;-) )

    Toad for Cloud Databases Architecture
    Toad for Cloud Databases Architecture

     

    As you can see, the data hub plays a central role in the whole concept. Technically spoken, it is a MySQL server. Why that? Because MySQL offers an easy way to plug in your own “storage engine”. Well-known and popular storage engines are e.g. MyISAM and InnoDB. Toad’s data hub uses its own, proprietary storage engine named “HUB”.

    In this case, the name “storage engine” is a bit misleading because it stores no data at all, but only translates, i.e. it presents the NoSQL data structures as MySQL tables. I tend to call these “shadow tables” because they don’t actually contain any data, but just point to the data.

    Getting SQLized

    Querying a shadow table by an SQL SELECT makes the data hub pull the data from the NoSQL store, pipe it through the storage engine, thereby putting it into relational format and give back an SQL result set.

    Issuing an SQL INSERT against a shadow table makes the data hub grab the values you want to insert, translate them into the NoSQL store’s format and execute the appropriate API calls. Similar for Updates and Deletes.

    The data store can be connected to an arbitrary number of different cloud, NoSQL or even conventional databases (for conventional databases it uses a standard ODBC driver). The actual mapping is as follows:

    • Each connected data store is mapped to one MySQL database in the data hub MySQL server.
    • Each NoSQL data structure can be mapped to a shadow table in the corresponding MySQL database. The mapping can be influenced by the user, e.g. which MySQL table column maps to which HBase column/column family.

    Thereby, the whole MySQL syntax is available for querying or manipulating data in the attached data stores (HBase, Cassandra etc.)

    The data hub can run on Windows, Linux or inside an Amazon EC2 instance, given of course that it has connectivity to the data stores you want to connect to. If the data stores are in different network segments with no connectivity in between, different data hubs are needed.

    If the data stores in question reside in Amazon EC2, e.g. SimpleDBs, it may be a good idea to have the data hub run in EC2 as well as this will keep network traffic and cost to a minimum (inter-instance network traffic is typically faster than between an EC2 instance and the internet, and Amazon doesn’t account for inter-instance network traffic inside the same EC2 region).

    Registering the data stores and mapping the data structures into relational “shadow tables” can be done either with the Windows based graphical client (“Toad for Cloud Databases”) or with the Toad for Cloud Databases Eclipse plugin. Here is a screenshot how an HBase data store can be registered:

    Registering an HBase data store to the data hub

    Registering an HBase data store to the data hub

    After finishing the registration and mapping of tables, any MySQL client can be used to access the shadow tables, and thereby the NoSQL data. So you could just use the standard MySQL command line. The Toad for Cloud Database graphical client offers a couple of useful features beyond pure connectvity and editing SQL, e.g. a graphical query builder, wizards for exporting, importing, comparing and syncing data as well as reporting tools.

    In the next part, I will give some samples and use cases on how we can use this to build bridges between SQL databases and NoSQL stores.


  3. Talking SQL to NoSQL data stores (Part 1)

    Cloud databases and NoSQL data stores – what’s it about?

    What is a cloud database, and why is it supposed to be useful? Actually, the idea behind it is quite similar to what is behind the rest of the cloud computing story: Providing databases (and hopefully an approriate SLA specifying availability and service quality) on a self-service basis, provisioning a whole database server in minutes instead of days, paying per actual usage. The user (let’s call him consumer) doesn’t necessarily need to care about how the SLA is achieved, e.g. which hardware is used, things like clustering, replication, database or OS patching and upgrading.

    Sometimes this is described as DaaS (Database as a Service), and it is a special case of PaaS (Platform as a Service). Just as a side note: this is a lot different from IaaS (Infrastructure as a Service), e.g. running an Oracle database inside an Amazon EC2 instance. In this case, the DBA still has to do all the mentioned stuff like patching, upgrading etc.

    SQL Azure would be an example of a cloud database. It is very similar to SQL Server (actually it is based on SQL Server, accessed through a gateway that abstracts away any infrastructure related stuff, see here for an in-depth article from Kalen Delaney). Another example would be Amazon’s RDS which offers MySQL instances as service and – announced for Q2/2011 – also Oracle instances as service.

    That said, what is NoSQL about? Well, another quality that is expected from a cloud is scalability or “elasticity”, i.e. easy scale-up and scale-down. Any application that needs to scale and that is running on top of a database, will only scale well if the underlying database does so. Imagine a given SLA that can be fulfilled by starting up one standard database instance (Oracle,SQL Server,DB2,Sybase, whatever you like) and one application server instance. Now, if your application gets a real big success story and you have to meet the same SLA with 200 times the load, you will probably need two things:

    • Startup 200 application server instances AND have a good load balancing
    • Startup 200 database instances AND have a good load balancing

    The latter point is where NoSQL comes in. Traditional databases are just not well equipped for scaling this much. Their clustering is more about high availability (failover cluster) or – like Oracle RAC – partial load balancing as it means having several instances, but still one single database.

    NoSQL data stores are not prominent for not talking SQL, this is just a side effect (actually NoSQL is mostly interpreted as “Not Only SQL”). They are prominent because of their virtually unlimited scaling capabilities. You need more power? Start up 100 additional instances. The NoSQL store’s engine will take care of things like replication, redundant storage, load distribution. Under the bottom line, this is a real compute cluster, or a compute cloud.

    Main drivers for NoSQL data stores in the past have been the various social media applications we see today, but also massive parallel online gaming, and other portals with 5-digit up to 9-digit user numbers. Traditional databases just have a very tough time when trying to cover these requirements with an acceptable pricing.

    So, from a developer’s point of view, NoSQL data stores are just the right for the job. They are specialized, lightweight engines with very simple APIs (typically no complex SQL engine), but with extreme scalability. Each NoSQL store is highly optimized for its respective purpose. Administration efforts are supposed to be very low, e.g. some of these systems don’t have a backup concept at all as they provide highly redundant storage anyway.

    Prominent examples from the NoSQL world include systems like HBase, Cassandra, MongoDB or Amazon’s SimpleDB:

    • HBase and Cassandra are typically called “Columns Stores” or “Wide Column Stores” where the data is physically stored column-wise, not row-wise like traditional database systems.
    • Amazon’s SimpleDB, Microsoft’s Azure Table Storage and the well-known BerkeleyDB are so-called “Key Value Stores”, i.e. they store simple key-values pairs, similar to Windows registry entries.
    • MongoDB or CouchDB have specialized on storing documents, specifically those in JSON format, so they are commonly called “Document Stores”.
    • “Graph Databases” like Neo4J are not yet that widely adopted, but are optimized for storing and processing graph related data, e.g. for GPS systems.

    This site gives a great overview on the NoSQL universe.

    In the next part, I will show how to use Quest’s Toad for Cloud Databases to talk SQL to NoSQL data stores. As SQL depends on at least some relational data structure, this is easier for Column Stores and Key-Value Stores than for Document Stores. Therefore the former two types of systems will be our focus.


  4. Talking SQL to NoSQL data stores

    This post serves as a starting point for a short blog series where I will show you how to talk SQL to NoSQL data stores.

    Which sounds like a contradiction in itself, because most of the NoSQL stores don’t have an SQL interface at all (remember: NoSQL stands for “Not only SQL”, so this is true for most, not necessarily all NoSQL stuff).

    While having no SQL interface may be quite ok or even positive from a developer’s perspective, I (with my DBA affinity) still like and prefer good old SQL, e.g. for quick checks what’s going on in the database (sorry: it’s called a “data store” when it comes to NoSQL…). Even in the NoSQL world you may want to join or compare data from different sources or move them around, from NoSQL to a standard SQL database or vice versa. In these cases, SQL comes handy as it allows you to do these things with SQL joins, INSERT commands and other well-known syntax.

    A great tool that works like a translator between SQL and the correponding NoSQL data store’s API is the Toad for Cloud Databases from Quest Software (my employer, actually). You can find it here, and during the next posts I will give an overview of NoSQL and how Toad can connect SQL to NoSQL.

    Part 1: Cloud databases and NoSQL data stores – what’s it about?

    Part 2: Talking SQL to NoSQL – Setting up

    Part 3: Talking SQL to NoSQL – Use cases and samples

    Enjoy and feel free to comment!


  5. New Category “NoSQL”

    This post just initiates another category for this blog called “NoSQL”.

    As you might expect, it will cover topics around NoSQL databases, like Hadoop, Cassandra, SimpleDB etc.