DBPedias

Your Database Knowledge Community

The Basho Blog

  1. Remembering Steve Jobs

    October 5, 2011

    Our worlds have been forever changed by a man who refused to accept the status quo and believed that he could change the world. Not only did he change the world, he made others believe that they could do the same. Basho exists because of Steve Jobs and his vision to change the world. Tonight we are all saddened and mourning the loss of a hero and visionary that we all considered great. Our hearts go out to his family and friends.

    The Employees of Basho Technologies

  2. Riak 1.0 Is Officially Released!

    September 30, 2011

    We are absolutely thrilled to report that as of today, Riak 1.0 is officially released and ready for your production applications!

    Riak 1.0 packages are available. Go download one. And then go read the release notes, because they are extensive and full of useful information highlighting the great work Basho has done since the last release.

    There is already a lot of literature out there on the release, so here are the essentials to get you started.

    The High Level Awesome

    For those of you who need a refresher on the release, this 1.0 Slide Deck will give you a quick overview of why you should be excited about it. The big-ticket features are as follows:

    Secondary Indexing

    In 1.0 we added the ability to build secondary indexes on your data stored in Riak. We developed this functionality because, quite frankly, people needed a more powerful way to query their data.

    Riak Pipe And Revamped MapReduce

    Riak’s MapReduce functionality isn’t anything new, but we did a lot of work in this release to make the system more robust, performant, and resistant to failures. Riak Pipe is the new underlying processing layer that powers MapReduce, and you’ll be seeing a lot of cool features and functionality made possible as a result of it in the near future.

    Lager

    Usability is a huge focus for us right now, and logging is something that’s less-than-simple to understand in Erlang applications. To that end, we wrote a new logging framework for Erlang/OTP called Lager that is shipping with 1.0 and drastically reduces the headaches traditionally associated with Erlang logging and debugging.

    Search Integration

    Riak Search has been a supported Basho product for several releases now, but until 1.0 you were required to build it as a separate package. In 1.0 we’ve merged the search functionality into Riak proper. Enabling it is a simple one-line change in a configuration file. Do this and you’ve got distributed, full-text search capabilities on top of Riak.
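    One way to make that one-line change from the command line is sketched below. It assumes the stock 1.0 app.config has a riak_search section containing {enabled, false} (our reading of the default file, not something this post shows). enable_riak_search is a hypothetical helper name. It flips only the flag inside the riak_search section and leaves any other enabled settings alone.

```shell
# Hypothetical helper: turn on Riak Search in an app.config file.
# ASSUMES the riak_search section is laid out like:
#   {riak_search, [
#       {enabled, false}
#   ]},
enable_riak_search() {
  local conf="$1"
  local out="${conf}.new"
  # Substitute only between the "{riak_search, [" line and the next "]}" line,
  # so {enabled, false} in other sections is left untouched.
  sed -e '/{riak_search, \[/,/]}/ s/{enabled, false}/{enabled, true}/' \
      "$conf" > "$out" && mv "$out" "$conf"
}

# Usage (the path depends on how Riak was installed); restart the node afterwards:
#   enable_riak_search /etc/riak/app.config
```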

    Support for LevelDB

    Riak provides for pluggable storage backends, and we are constantly trying to improve the options we offer to our users. Google released LevelDB some months back, and we started to investigate it as a possible addition to our suite of supported backends. After some rigorous testing, what we found is that LevelDB had some attractive functionality and performance characteristics compared to our existing offerings (mainly Innostore), and it will be shipping in 1.0. Bitcask is still the default storage engine, but LevelDB, aside from being an alternative for key/value storage, is being used as the backend behind the new Secondary Indexing functionality.

    Cluster Membership

    One of the most powerful components of Riak is riak_core, the distributed systems framework that, among many other things, enables Riak to scale horizontally. Riak’s scalability and operational simplicity are of paramount importance to us, and we are constantly looking to make this code and system even better. With that in mind, we did some major work in 1.0 to improve upon our cluster membership system and are happy to report that it’s now more stable and scalable than ever.

    And So Much More …

    Riak 1.0 is a massive accomplishment, and the features and code listed above are just the beginning of what this release has to offer. Take some time to read the lengthy release notes and you’ll see what we mean.

    These improvements are many months in the making, and the bug fixes, new features, and added functionality make Riak (in our humble opinion) the best open source database available today.

    Thank You, Community!

    We did our best to ensure that the community was as big a part of this release as possible, and there’s no way the code and features would be this rock-solid without your help. Thanks for your usage, support, testing, debugging, and help with spreading the word about Riak and 1.0.

    And 1.0 is just the beginning. We’ll continue to refine and build Riak over the coming months, and we would love for you to be a part of it if you’re not already. Some ways to get involved:

    Thanks for being a part of Riak!

    The Basho Team

  3. Riak Pipe - the New MapReduce Power

    September 19, 2011

    A few months ago, I announced the opening of Riak Pipe, as well as two goals for the project. With the upcoming 1.0 release of Riak, we have achieved the first goal: new clusters will use Riak Pipe to power their MapReduce processing. Existing clusters will also be able to migrate to using Riak Pipe, with no changes needed from the client perspective.

    There are a few reasons you should be excited about running your MapReduce queries on Riak Pipe. First and foremost, Riak Pipe is designed as a work distribution system, and as such, it is better able to take advantage of the parallel resources available in the cluster. One small example of how Riak Pipe achieves this is simply by splitting the “map” phase processing into two steps: fetching the object from Riak KV, and transforming it. This allows the work of each step to happen in parallel; the next input will be fetched while the transformation of the last one is in progress.

    Riak Pipe also recognizes that a cluster's resources are finite, and that sometimes it's better to delay one pile of work in order to make progress on another. Processing phases in Riak Pipe, called fittings, provide backpressure to fittings upstream from them by means of limiting the sizes of their input queues. The upstream fittings pause their processing when the downstream queues are full, freeing up system resources (or at least not increasing their consumption) to allow those downstream processes a chance to catch up.
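    A Unix shell pipeline is a loose analogy for both ideas above; it is not Riak Pipe code. The two stages run as separate processes at the same time, so one input is transformed while the next is produced, and because the kernel's pipe buffer has a fixed size, a producer that gets ahead simply blocks until the consumer drains it. fetch_stage and transform_stage are hypothetical stand-ins for the KV fetch and the map transform.

```shell
# Stand-in for the fetch step (hypothetical): emit one "key:value" record per key.
fetch_stage() {
  for key in "$@"; do
    printf '%s:value-of-%s\n' "$key" "$key"
  done
}

# Stand-in for the transform step (hypothetical): upper-case each value.
transform_stage() {
  while IFS=: read -r key value; do
    printf '%s=%s\n' "$key" "$(printf '%s' "$value" | tr '[:lower:]' '[:upper:]')"
  done
}

# Both sides of the | run concurrently, so record N+1 can be produced while
# record N is being transformed. The pipe's kernel buffer has a fixed size:
# if transform_stage falls behind, writes from fetch_stage block until it
# catches up, a small-scale form of backpressure.
map_keys() {
  fetch_stage "$@" | transform_stage
}

map_keys a b
# prints:
# a=VALUE-OF-A
# b=VALUE-OF-B
```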

    Input queues are another example of Riak Pipe's parallel resource use. Inter-phase results are delivered directly from a vnode running one stage to the vnode that will process them for the next stage. Since they are not forced through a single, central process, the memory of the entire cluster can be used to move them around, instead of requiring a single machine's memory to handle them.

    The KV-object fetching stage of the new Riak Pipe MapReduce system is also much more of a well-behaved KV user. That is, the requests it makes are much more fairly balanced with respect to regular Riak KV operations (get, put, etc.). This means MapReduce on Riak Pipe should have much less impact on the performance of the rest of your Riak workload.

    Using Riak Pipe MapReduce is simple. Make sure that the setting {mapreduce_system, pipe} is in the riak_kv section of your cluster's app.config, and then … just send MapReduce queries over HTTP or Protocol Buffers as you always have. The results should be the same. There are a few knobs you can tweak, which control batching of reduce phase evaluation, but the goal of this release was a 100% compatible implementation of the existing MapReduce functionality.
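    A quick way to confirm that a node's config selects Riak Pipe: the hypothetical shell helper below succeeds only if an uncommented {mapreduce_system, pipe} entry appears in the given app.config. It is a plain text match, not an Erlang parser, so treat it as a sanity check rather than proof of what the running node uses.

```shell
# Hypothetical helper: succeed if app.config selects Riak Pipe for MapReduce.
# Lines where the setting follows a % (Erlang comment marker) are ignored.
uses_pipe_mapreduce() {
  grep -q '^[^%]*{mapreduce_system, *pipe}' "$1"
}

# Usage (the path varies by install):
#   uses_pipe_mapreduce /etc/riak/app.config && echo "Riak Pipe MapReduce enabled"
```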

    There is much more on the horizon for Riak Pipe, including more efficiency gains and exposing some of the new processing statistics it tracks, not to mention exposing more of its functionality beyond Riak KV's MapReduce. We're very excited about the future.

    If you would like to learn more about Riak Pipe, in general, and get involved, I recommend paging through the README to get an idea of its structure, and then browsing the new Riak KV MapReduce code for some examples.

    -Bryan

  4. Secondary Indexes in Riak

    September 14, 2011

    Developers building an application on Riak typically have a love/hate relationship with Riak's simple key/value-based approach to storing data. It's great that anyone can grok the basics (3 simple operations, get/put/delete) quickly. It's convenient that you can store anything imaginable as an object's value: an integer, a blob of JSON data, an image, an MP3. And the distributed, scalable, failure-tolerant properties that a key/value storage model enables can be a lifesaver depending on your use case.

    But things get much less rosy when faced with the challenge of representing alternate keys, one-to-many relationships, or many-to-many relationships in Riak. Historically, Riak has shifted these responsibilities to the application developer. The developer is forced to either find a way to fit their data into a key/value model, or to adopt a polyglot storage strategy, maintaining data in one system and relationships in another.

    This adds complexity and technical risk, as the developer is burdened with writing additional bookkeeping code and/or learning and maintaining multiple systems.

    That's why we're so happy about Secondary Indexes. Secondary Indexes are the first step toward solving these challenges, lifting the burden from the backs of developers, and enabling more complex data modeling in Riak. And the best part is that it ships in our 1.0 release, just a few weeks from now.

    How Do Secondary Indexes Work?

    From an application developer's perspective, Secondary Indexes allow you to tag a Riak object with some index metadata, and later retrieve the object by querying the index, rather than the object's primary key.

    For example, let's say you want to store a user object, accessible by username, Twitter handle, or email address. You might pick the username as the primary key, while indexing the Twitter handle and email address. Below is a curl command to accomplish this through the HTTP interface of a local Riak node:

    curl -X POST \
    -H 'x-riak-index-twitter_bin: rustyio' \
    -H 'x-riak-index-email_bin: rusty@basho.com' \
    -d '...user data...' \
    http://localhost:8098/buckets/users/keys/rustyk
    

    Previously, there was no simple way to access an object by anything other than the primary key, the username. The developer would be forced to "roll their own indexes." With Secondary Indexes enabled, however, you can easily retrieve the data by querying the user's Twitter handle:

     # Query the twitter handle...
     curl localhost:8098/buckets/users/index/twitter_bin/rustyio
    
     # Response...
     {"keys":["rustyk"]}
    

    Or the user's email address:

     # Query the email address...
     curl localhost:8098/buckets/users/index/email_bin/rusty@basho.com
    
     # Response...
     {"keys":["rustyk"]}
    

    You can change an object's indexes by simply writing the object again with the updated index information. For example, to add an index on GitHub handle:

    curl -X POST \
    -H 'x-riak-index-twitter_bin: rustyio' \
    -H 'x-riak-index-email_bin: rusty@basho.com' \
    -H 'x-riak-index-github_bin: rustyio' \
    -d '...user data...' \
    http://localhost:8098/buckets/users/keys/rustyk
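    Because the example above resends every index along with the new one, it can help to keep the full index list in one place. The shell sketch below turns name=value pairs into x-riak-index-* headers and issues the same POST as the curl command above; store_user is a hypothetical helper, and the users bucket and localhost:8098 come from the examples. It assumes each write carries the object's complete index set, which is our reading of the example rather than something this post spells out.

```shell
# Hypothetical helper: (re)write a user along with its complete set of indexes.
# CURL can be overridden (e.g. CURL=echo) to print the command instead of running it.
store_user() {
  local key="$1" data="$2"
  shift 2
  local n=$# pair
  # Append one -H option per name=value pair, then drop the original pairs.
  # e.g. github_bin=rustyio becomes -H "x-riak-index-github_bin: rustyio"
  for pair in "$@"; do
    set -- "$@" -H "x-riak-index-${pair%%=*}: ${pair#*=}"
  done
  shift "$n"
  ${CURL:-curl} -X POST "$@" -d "$data" \
    "${RIAK_URL:-http://localhost:8098}/buckets/users/keys/$key"
}

# Usage, equivalent to the curl command above:
#   store_user rustyk '...user data...' \
#     twitter_bin=rustyio email_bin=rusty@basho.com github_bin=rustyio
```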
    

    That's all there is to it, but that's enough to represent a variety of different relationships within Riak.

    Above is an example of assigning an alternate key to an object. But imagine that instead of a twitter_bin field, our object had an employer_bin field that matched the primary key for an object in our employers bucket. We can now look up users by their employer.

    Or imagine a role_bin field that matched the primary key for an object in our security_roles bucket. This allows us to look up all users that are assigned to a specific security role in the system.

    Design Decisions

    Secondary Indexes maintain Riak's distributed, scalable, and failure-tolerant nature by avoiding the need for a predefined schema, which would be shared state. Indexes are declared on a per-object basis, and the index type is determined by the field's suffix: _bin for binary, _int for integer.
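    The suffix rule is simple enough to mirror in a few lines. index_type is a hypothetical shell helper and age_int is a made-up integer index; rejecting names with neither suffix is our assumption about what the write-time validation does, not behavior confirmed here.

```shell
# Hypothetical helper mirroring the naming rule: the suffix picks the index type.
index_type() {
  case "$1" in
    *_bin) echo binary ;;
    *_int) echo integer ;;
    *)     echo "unknown index type for '$1'" >&2; return 1 ;;
  esac
}

index_type twitter_bin   # prints: binary
index_type age_int       # prints: integer
```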

    Indexing is real-time and atomic: the results show up in queries immediately after the write operation completes, and all indexing occurs on the partition where the object lives, so the object and its indexes stay in sync. Indexes can be stored and queried via the HTTP interface or the Protocol Buffers interface. Additionally, index results can feed directly into a MapReduce operation. And our Enterprise customers will be happy to know that Secondary Indexing plays well with multi data center replication.
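    To feed index results into MapReduce, the job's inputs name an index query instead of listing bucket/key pairs. The sketch below assumes the 1.0 input form {"bucket": ..., "index": ..., "key": ...} for an exact match (our understanding of the format; check it against the 1.0 MapReduce documentation), with a trivial JavaScript map phase that returns each matching key. index_mapred_job is a hypothetical helper.

```shell
# Hypothetical helper: build a MapReduce job whose inputs come from an
# exact-match index query. Values are not JSON-escaped, so keep them free of
# quotes and backslashes.
index_mapred_job() {
  local bucket="$1" index="$2" value="$3"
  printf '{"inputs":{"bucket":"%s","index":"%s","key":"%s"},' "$bucket" "$index" "$value"
  printf '"query":[{"map":{"language":"javascript","source":"function(v){ return [v.key]; }"}}]}\n'
}

# Usage: send the job to the node's MapReduce endpoint.
#   index_mapred_job users email_bin rusty@basho.com |
#     curl -X POST -H 'Content-Type: application/json' -d @- http://localhost:8098/mapred
```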

    Indexes are declared as metadata, rather than as part of an object's value, in order to preserve Riak's view that the value of your object is an opaque document. An object can have an unlimited number of index fields of any size (dependent upon system resources, of course). We have stress tested with 1,000 index fields, though we expect most applications won't need nearly that many. Indexes do contribute to the base size of the object, and they also take up their own disk space, but the overhead for each additional index entry is minimal: the vector clock information (and other metadata) is stored in the object, not in the index entry. Additionally, the LevelDB backend (and, likely, most index-capable backends) supports prefix compression, further shrinking index size.

    This initial release does have some important limitations. Only single index queries are supported, and only for exact matches or range queries. The result order is undefined, and pagination is not supported. While this offers less in the way of ad-hoc querying than other datastores, it is a solid 80% solution that allows us to focus future energy where users and customers need it most. (Trust me, we have many plans and prototypes of potential features. Building something is easy, building the right thing is harder.)
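    Over HTTP, an exact-match query ends with one value and a range query with a start and an end value. The hypothetical shell helper below builds either URL for the users bucket used earlier; age_int is a made-up integer index, and values are inserted as-is, so anything containing a slash or spaces would need URL-encoding first.

```shell
# Hypothetical helper: build an index query URL.
#   3 args -> exact match:  bucket index value
#   4 args -> range query:  bucket index start end
riak_index_url() {
  local bucket="$1" index="$2" start="$3" end="$4"
  local url="${RIAK_URL:-http://localhost:8098}/buckets/$bucket/index/$index/$start"
  [ -n "$end" ] && url="$url/$end"
  printf '%s\n' "$url"
}

# Usage:
#   curl "$(riak_index_url users email_bin rusty@basho.com)"
#   curl "$(riak_index_url users age_int 20 30)"
```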

    Behind The Scenes

    What is happening behind the scenes? A lot, actually.

    At write time, the system pulls the index fields from the incoming object, parses and validates the fields, updates the object with the newly parsed fields, and then continues with the write operation. The replicas of the object are sent to virtual nodes where the object and its indexes are persisted to disk.

    At query time, the system first calculates what we call a "covering" set of partitions. The system looks at how many replicas of our data are stored and determines the minimum number of partitions that it must examine to retrieve a full set of results, accounting for any offline nodes. By default, Riak is configured to store 3 replicas of all objects, so the system can generate a full result set if it reads from one-third of the system's partitions, as long as it chooses the right set of partitions. The query is then broadcast to the selected partitions, which read the index data, generate a list of keys, and send them back to the coordinating node.
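    The "one-third" figure can be checked with a little arithmetic. Assuming Riak's default ring of 64 partitions and 3 replicas, with each key stored on 3 consecutive partitions, the fewest partitions that together see every key is ceil(64 / 3) = 22, roughly one-third of the ring. This only reproduces the arithmetic; the real covering-set code also has to route around offline nodes and deal with overlapping replicas. min_covering_partitions is a hypothetical name.

```shell
# Lower bound on the size of a covering set when all nodes are up:
# each partition holds keys for n_val consecutive positions on the ring,
# so at least ceil(ring_size / n_val) partitions are needed to see every key.
min_covering_partitions() {
  local ring_size="$1" n_val="$2"
  echo $(( (ring_size + n_val - 1) / n_val ))
}

min_covering_partitions 64 3   # prints 22
```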

    Storing index data is very different from storing key/value data: in general, any database that stores indexes on disk would prefer to store the index in a contiguous block and in the desired order, getting as near to the final result set as possible. This minimizes disk movement and other work during a query, and provides faster read operations. The challenge is that index values rarely enter the system in the right order, so the database must do some shuffling at write time. Most databases delay this shuffling: they write to disk in a slightly sub-optimal format, then go back and "fix things up" at a later point in time.
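    A toy version of the ordered-index idea, using sort and awk rather than anything from Riak or LevelDB: once the entries are sorted, a range query can skip values below the start and stop at the first value past the end, instead of examining every entry. The entries are "<value> <key>" pairs for one integer index, and the keys alice, bob, carol, and dave are made up.

```shell
# Put entries for one integer index into value order ("<value> <key>" per line).
sorted_index() {
  sort -n -k1,1
}

# Range scan over already-sorted entries: skip values below lo and stop at the
# first value above hi. Stopping early is only correct because the input is ordered.
scan_range() {
  awk -v lo="$1" -v hi="$2" '$1 + 0 > hi + 0 { exit } $1 + 0 >= lo + 0 { print $2 }'
}

# Entries arrive out of order; after sorting, the scan reads a contiguous run.
printf '34 carol\n27 alice\n41 dave\n30 bob\n' | sorted_index | scan_range 28 35
# prints:
# bob
# carol
```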

    None of Riak's existing key/value-oriented backends were a good fit for index data; they all focused on fast key/value access. During the development of Secondary Indexes we explored other options. Coincidentally, the Basho team had already begun work to adapt LevelDB, a low-level storage library from Google, as a storage engine for Riak KV. LevelDB stores data in a defined order, exactly what Secondary Indexes needed, and it is versatile enough to manage both the index data AND the object's value. Plus, it is very RAM-friendly. You can learn more about LevelDB from this page on Google Code.

    Want To Know More?

    If you want to learn more about Secondary Indexes, you can read the slides from my talk at OSCON Data 2011: Querying Riak Just Got Easier. Alternatively, you can watch the video.

    You can grab a pre-release version of Riak Version 1.0 on the Basho downloads site to try the examples above. Remember to change the storage backend to riak_kv_eleveldb_backend!
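    On a single test node, the backend switch can be scripted. The sketch assumes the stock app.config line {storage_backend, riak_kv_bitcask_backend} in the riak_kv section (an assumption about the default file); use_eleveldb_backend is a hypothetical helper. Restart the node afterwards, and note that changing backends does not migrate any data already stored in Bitcask.

```shell
# Hypothetical helper: point riak_kv at the LevelDB backend.
# ASSUMES the stock line {storage_backend, riak_kv_bitcask_backend} is present.
use_eleveldb_backend() {
  local conf="$1"
  local out="${conf}.new"
  sed -e 's/{storage_backend, *riak_kv_bitcask_backend}/{storage_backend, riak_kv_eleveldb_backend}/' \
      "$conf" > "$out" && mv "$out" "$conf"
}

# Usage (path varies by install), then restart the node:
#   use_eleveldb_backend /etc/riak/app.config
```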

    Finally, keep an eye out for documentation that will land on the newly reorganized Basho Wiki within the next two weeks.

    -- Rusty

  5. A Preview Of Cluster Membership In Riak 1.0

    September 09, 2011

    Being a distributed company, we make a lot of videos at Basho that are intended for internal consumption and used to educate everyone on new features, functionality, etc. Every once in a while someone makes a video that’s so valuable it’s hard not to share it with the greater community. This is one of those.

    This screencast is a bit on the long side, but it’s entirely worth it. Basho Software Engineer Joe Blomstedt put it together to educate all of Basho on the new cluster membership code, features, and functionality coming in the Riak 1.0 release (due out at the end of the month). We aim to make Riak as simple as possible to operate at scale, and the choices we make and code we write around cluster membership form the crux of this simplicity.

    At the end of this you’ll have a better idea of what Riak’s cluster membership is all about, its major components, how it works in production, new commands that are present in Riak 1.0, and much, much more.

    And, if you want to dig deeper into what Riak and cluster membership are all about, start here:

    (It should be noted again that this was intended for internal consumption at Basho, so Joe’s tone and language reflect that in a few sections.)

    Enjoy, and thanks for being a part of Riak.

    The Basho Team

  6. Follow Up To Riak and Node.js Webinar

    Thanks to all who attended Wednesday's webinar on Riak (Search) and Node.js. If you couldn't make it, you can find a screencast of the webinar below. You can also check out the slides directly.

    We hope we gave you a good idea of what you can use the winning combination of Riak and Node.js for by showing you our little syslog-emulating sample application, Riaktant. We made the source code available, so if you feel like running your own syslog replacement, go right ahead and let us know how things go. Of course you can just dig into the code and see how nicely Node.js and Riak play together too.

    If you want some practical ideas of how we used Riak's MapReduce to analyze the log data, have a look at the functions used by the web interface. You can throw these right into the Node.js console and try them out yourself: riak-js, the Node.js client for Riak, accepts JavaScript functions, so you don't have to serialize them into a string yourself.

    Thanks to Joyent for providing us with SmartMachines running Riak, and for offering No.de, their great hosting service for Node.js applications, where we deployed our little app with great ease.

    Sean and Mathias

  7. Free Webinar - Riak with Node.js - March 15 @ 2PM Eastern

    JavaScript is the lingua franca of the web, and many developers are starting to use node.js to power their server-side applications. Riak is a flexible, scalable database that has a JavaScript-friendly interface, including MapReduce in JavaScript and an awesome client library called riak-js. Put the two together and you have lots of possibilities!

    We invite you to join us for a free webinar on Tuesday, March 15 at 2:00PM Eastern Time (UTC-4) to talk about Riak with node.js. In this webinar, we'll discuss:

    • Getting riak-js, the Riak client for node.js, into your application
    • Storing, retrieving, manipulating key-value data
    • Issuing MapReduce queries
    • Finding data with Riak Search
    • Testing your code with the TestServer

    We'll address the above topics in addition to looking at a sample application. The presentation will last 30 to 45 minutes, with time for questions at the end. Fill in the form below if you want to get started building node.js applications on top of Riak!

  8. KillDashNine March Happening on Wednesday

    In February we kicked off the KillDashNine drinkup. It was a huge success (turns out we aren’t the only ones who care about durability) and, as promised, we’ll be having another drinkup this month. On Wednesday, 3/9, we will be clinking glasses and sharing data loss horror stories at Bloodhound, located at 1145 Folsom Street here in San Francisco.

    This month’s chosen cocktail is the Data Eraser, and it’s simple to make: 2 oz Vodka, 2 oz Coffee Liqueur, 2 oz Tonic, and a dash of bitter frustration, anguish, and confusion (which is more or less how one feels when their data just disappears). And if you can’t make it, be sure to pour yourself a Data Eraser on 3/9 to take part in the festivities from wherever you happen to find yourself (or you can run your own local KillDashNine like Marten Gustafson did in Stockholm last month).

    Registration details for the event are here, so be sure to RSVP if you’re planning to join us. In the meantime, spin up a few nodes of your favorite database and try your hand at terminating some processes with the help of our favorite command: kill -9.

    Long Live Durability!

    Basho

  9. Creating a Local Riak Cluster with Vagrant and Chef

    The Riak Fast Track has been around for at least nine months now, and lots of developers have gotten to know Riak that way, building their own local clusters from the Riak source. But there’s always been something that has bothered me about that process, namely, that the developer has to build Riak herself. Basho provides pre-built packages on downloads.basho.com for several Linux distributions, Solaris, and Mac OS X, but these have the limitation of only letting you run one node on a machine.

    I’ve been a long-time fan of Chef, the systems and configuration management tool by Opscode, especially for the wealth of community recipes and vibrant participation. It’s also incredibly easy to get started with small Chef deployments with Opscode’s Platform, which is free for up to 5 managed machines.

    Anyway, as part of updating Riak’s Chef recipe last month to work with the 0.14.0 release, I discovered the easiest way to test the recipe — without incurring the costs of Amazon EC2 — was to deploy local virtual machines with Vagrant. So this blog post will be a tutorial on how to create your own local 3-node Riak cluster with Chef and Vagrant, suitable for doing the rest of the Fast Track.

    Before we start, I’d like to thank Joshua Timberman and Seth Chisamore from Opscode who helped me immensely in preparing this.

    Step 1: Install VirtualBox

    Under the covers, Vagrant uses VirtualBox, which is a free virtualization product, originally created at Sun. Go ahead and download and install the version appropriate for your platform from the VirtualBox downloads page.

    Step 2: Install Vagrant and Chef

    Now that we have VirtualBox installed, let’s get Vagrant and Chef. You’ll need Ruby and RubyGems installed for this. Mac OS X comes with these pre-installed, but they’re easy to get on most platforms.

    Now that you’ve got them both installed, you need to get a virtual machine image to run Riak from. Luckily, Opscode has provided some images for us that have the 0.9.12 Chef gems preinstalled. Download the Ubuntu 10.04 image and add it to your local collection:

    Step 3: Configure Local Chef

    Head on over to Opscode and sign up for a free Platform account if you haven’t already. This gives you access to the cookbooks site as well as the Chef admin UI. Make sure to collect your “knife config” and “validation key” from the “Organizations” page of the admin UI, and your personal “private key” from your profile page. These help you connect your local working space to the server.

    Now let’s get our Chef workspace set up. You need a directory that has specific files and subdirectories in it, also known as a “Chef repository”. Again, Opscode has made this easy for us: we can just clone their skeleton repository:

    Now let’s put the canonical Opscode cookbooks (including the Riak one) in our repository:

    Finally, put the Platform credentials we downloaded above inside the repository (the .pem files will be named differently for you):

    Step 4: Configure Chef Server

    Now we’re going to prep the Chef Server (provided by Opscode Platform) to serve out the recipes needed by our local cluster nodes. The first step is to upload the two cookbooks we need using the knife command-line tool, shown in the snippet below the next paragraph. I’ve left out the output since it can get long.

    Then we’ll create a “role” — essentially a collection of recipes and attributes — that will represent our local cluster nodes, and call it “riak-vagrant”. Using knife role create will open your configured EDITOR (mine happens to be emacs) with the JSON representation of the role. The role will be posted to the Chef server when you save and close your editor.

    The key things to note about what we’re editing in the role below are the “run list” and the “override attributes” sections. The “run list” tells what recipes to execute on a machine that receives the role. We configure iptables to run with Riak, and of course the relevant Riak recipes. The “override attributes” change default settings that come with the cookbooks. I’ve put explanations inline, but to summarize, we want to bind Riak to all network interfaces, and put it in a cluster named “vagrant” which will be used by the “riak::autoconf” recipe to automatically join our nodes together.

    Step 5: Set Up Vagrant VM

    Now that we’re ready on the Chef side of things, let’s get Vagrant going. Make three directories inside your Chef repository called dev1, dev2, and dev3, just like from the Fast Track. Change directory inside dev1 and run vagrant init. This will create a Vagrantfile which you should edit to look like this one (explanations inline again):

    Remember: change any place where it says ORGNAME to match your Opscode Platform organization.

    Step 6: Start up dev1

    Now we’re ready to see if all our preparation has paid off:

    If you see lines at the end of the output like the ones above, it worked! If it doesn’t work the first time, try running vagrant provision from the command line to invoke Chef again. Let’s see if our Riak node is functional:

    Awesome!

    Step 7: Repeat with dev2, dev3

    Now let’s get the other nodes set up. Since we’ve done the hard parts already, we just need to copy the Vagrantfile from dev1/ into the other two directories and modify them slightly.

    The easiest way to describe the modifications is in a table:

    Vagrantfile line   dev2                 dev3                 Explanation
    7                  "33.33.33.12"        "33.33.33.13"        Unique IP addresses
    11 (last number)   8092                 8093                 HTTP port forwarding
    12 (last number)   8082                 8083                 PBC (Protocol Buffers) port forwarding
    40                 "riak-fast-track-2"  "riak-fast-track-3"  Unique Chef node name
    48                 "riak@33.33.33.12"   "riak@33.33.33.13"   Unique Riak node name

    With those modified, start up dev2 (run vagrant up inside dev2/) and watch it connect to the cluster automatically. Then repeat with dev3 and enjoy your local Riak cluster!
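    If you'd rather script Step 7 than edit by hand, the shell sketch below copies dev1/Vagrantfile and rewrites the per-node values with sed. The dev1 values it searches for (IP 33.33.33.11, forwarded host ports 8091 and 8081, Chef node name riak-fast-track-1) are our inference from the pattern in the table, not something the post shows, so check them against your own file first. make_node_vagrantfile is a hypothetical helper name.

```shell
# Hypothetical helper: derive devN/Vagrantfile from dev1/Vagrantfile.
# ASSUMED dev1 values (inferred from the table; verify in your own file):
#   IP 33.33.33.11 (also inside the Riak node name riak@33.33.33.11),
#   forwarded host ports 8091 (HTTP) and 8081 (PBC),
#   Chef node name riak-fast-track-1.
# Works for single-digit node numbers only.
make_node_vagrantfile() {
  local n="$1"   # node number, e.g. 2 or 3
  mkdir -p "dev$n"
  sed -e "s/33\.33\.33\.11/33.33.33.1$n/g" \
      -e "s/8091/809$n/g" \
      -e "s/8081/808$n/g" \
      -e "s/riak-fast-track-1/riak-fast-track-$n/g" \
      dev1/Vagrantfile > "dev$n/Vagrantfile"
}

# Usage, from the Chef repository that contains dev1/:
#   make_node_vagrantfile 2 && (cd dev2 && vagrant up)
#   make_node_vagrantfile 3 && (cd dev3 && vagrant up)
```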

    Conclusions

    Beyond just being a demonstration of cool technology like Chef and Vagrant, you’ve now got a developer setup that is isolated and reproducible. If one of the VMs gets too messed up, you can easily recreate the whole cluster. It’s also easy to get new developers in your organization started using Riak since all they have to do is boot up some virtual machines that automatically configure themselves. This Chef configuration, slightly modified, could later be used to launch staging and production clusters on other hardware (including cloud providers). All in all, it’s a great tool to have in your toolbelt.

    Sean

  10. Webinar Recap - Schema Design for Riak

    Thank you to all who attended the webinar yesterday. The turnout was great, and the questions at the end were very thoughtful. Since I didn't get to answer very many, I've answered all of them below, in no particular order. If you're looking for slides or video of the presentation and other resources, we've prepared a page especially for you.

    Q: Can you touch on upcoming filtering of keys prior to map reduce? Will it essentially replace the need for one to explicitly name the bucket/key in a M/R job? Does it require a bucket list-keys operation?

    Key filters, in the upcoming 0.14 release, will allow you to logically select a population of keys from a bucket before running them through MapReduce. This will be faster than a full-bucket map since it only loads the objects you're really interested in (the ones that pass the filter). It's a great way to make use of meaningful keys that have structure to them. So yes, it does require a list-keys operation, but it doesn't replace explicitly naming bucket/key pairs as inputs; there are still many useful queries that can be done when the keys are known ahead of time.
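    As an illustration only, a key-filtered job might look like the JSON built below. It assumes keys shaped like "<customer>-<date>" in a hypothetical invoices bucket and follows our understanding of the 0.14 key-filter syntax (a tokenize step that keeps the first "-"-separated token, then an eq comparison); verify both against the 0.14 documentation. key_filter_job is a hypothetical helper.

```shell
# Hypothetical helper: a MapReduce job whose inputs are selected by key filters
# instead of being listed explicitly. Keeps keys whose first "-"-separated
# token equals the customer name. Values are not JSON-escaped.
key_filter_job() {
  local bucket="$1" customer="$2"
  printf '{"inputs":{"bucket":"%s","key_filters":[["tokenize","-",1],["eq","%s"]]},' "$bucket" "$customer"
  printf '"query":[{"map":{"language":"javascript","source":"function(v){ return [v.key]; }"}}]}\n'
}

# Usage:
#   key_filter_job invoices basho |
#     curl -X POST -H 'Content-Type: application/json' -d @- http://localhost:8098/mapred
```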

    For more information on key-filters, see Kevin's presentation on the upcoming MapReduce enhancements.

    Q: How can you validate that you've reached a good/valid KV model when migrating a relational model?

    The best way is to try out some models. The thing about schema design for Riak that turns your process on its head is that you design for optimizing queries, not for optimizing the data model. If your queries are efficient (single-key lookup as much as possible), you've probably reached a good model, but also weigh things like payload size, cost of updating, and difficulty manipulating the data in your application. If your design makes it substantially harder to build your application than a relational design, Riak may not be the right fit.

    Q: Are there any "gotchas" when thinking of a bucket as we are used to thinking of a table?

    Like tables, buckets can be used to group similar data together. However, buckets don't automatically enforce data structure (columns with specified types, referential integrity) like relational tables do; that part is still up to your application. You can, however, add precommit hooks to buckets to perform any data validation that your application shouldn't handle.

    Q: How would you create a 'manual index' in Riak? Doesn't that need to always find unique keys?

    One basic way to structure a manually-created index in Riak is to have a bucket specifically for the index. Keys in this bucket correspond to the exact value you are indexing (for fuzzy or incomplete values, use Riak Search). The objects stored at those keys have links or lists of keys that refer to the original object(s). Then you can find the original simply by following the link or using MapReduce to extract and find the related keys.

    The example I gave in the webinar Q&A was indexing users by email. To create the index, I would use a bucket named users_by_email. If I wanted to look up my own user object by email, I'd try to fetch the object at users_by_email/sean@basho.com, then follow the link in it (something like </riak/users/237438-28374384-128>; riaktag="indexed") to find the actual data.

    Whether those index values need to be unique is up to your application to design and enforce. For example, the index could be storing links to blog posts that have specific tags, in which case the index need not be unique.

    To create the index, you'll either have to perform multiple writes from your application (one for the data, one for the index), or add a commit hook to create and modify it for you.
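    The two-write approach might look like this over HTTP, following the users_by_email example above. This is a sketch: index_user_by_email is a hypothetical helper, it uses the /riak/<bucket>/<key> URL form that the link in that example refers to, and the two writes are independent, so a failure between them leaves the index out of date (one reason to consider a commit hook instead).

```shell
# Hypothetical helper for a hand-rolled email index: one write for the user
# object and one for an index object in users_by_email that links back to it.
# CURL can be overridden (e.g. CURL=echo) for a dry run.
index_user_by_email() {
  local user_key="$1" email="$2" data="$3"
  local base="${RIAK_URL:-http://localhost:8098}/riak"
  # Write 1: the user object itself.
  ${CURL:-curl} -X PUT -H 'Content-Type: application/json' \
    -d "$data" "$base/users/$user_key"
  # Write 2: the index entry, keyed by email, carrying a tagged link to the user.
  ${CURL:-curl} -X PUT -H 'Content-Type: text/plain' \
    -H "Link: </riak/users/$user_key>; riaktag=\"indexed\"" \
    -d "$user_key" "$base/users_by_email/$email"
}

# Lookup by link walking: fetch users_by_email/<email>, follow links tagged
# "indexed" into the users bucket, and return the linked objects.
#   curl "http://localhost:8098/riak/users_by_email/sean@basho.com/users,indexed,1"
```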

    Q: Can you compare/contrast buckets w/ Cassandra column families?

    Cassandra has a very different data model from Riak, and you'll want to consult with their experts to get a second opinion, but here's what I know. Column families are a way to group together related columns that you will always want to retrieve together, and they are something that you design up front (changes require restarting the cluster to take effect). They are the closest thing to a relational table that Cassandra has.

    Although you do use buckets to group similar data items, in contrast, Riak's buckets:

    1. Don't understand or enforce any internal structure of the values,
    2. Don't need to be created or designed ahead of time, but pop into existence when you first use them, and
    3. Don't require a restart to be used.

    Q: How would part sharing be achieved? (this is a reference to the example given in the webinar, Radiant CMS)

    Radiant shares content parts only when specified by the template language, and always by inheritance from ancestor pages. So if the layout contained <r:content part="sidebar" inherit="true" />, then if the currently rendering page doesn't have that content part, it will look up the hierarchy until it finds it. This is one example of why it's so important to have an efficient way to traverse the site hierarchy, and why I presented so many options.

    Q: What is the max number of links an object can have for Link Walking?

    There's no cut-and-dried answer for this. Theoretically, you are limited only by storage space (disk and RAM) and the ability to retrieve the object from the desired interface. In a practical sense this means that the default HTTP interface limits you to around 100,000 links on a single object (based on previous discussions of the limits of HTTP packets and header lengths). Still, this is not going to be reasonable to deal with in your application. In some applications we've seen links on the order of hundreds per object negatively impact link-walking performance. If you need to have that many, you'll be better off exploring other designs.

    Again, thanks for attending! Look for our next webinar coming in about a month.

    Sean, Developer Advocate
