Sharding with SQL Azure – Introducing Federation in SQL Azure

Scalability often mean scaling out, and sharding the DB.
Until now, the sharding has always been managed at the application level, with a lot of manual work by developers with big scalability requirements.

Most of the NOSQL dbs like MongoDB, Raven etc, already support sharding, since it’s much easier to shard a non relational database than a relational database.

Finally MS is trying to add this concept of sharding built in into Sql, specifically into SQL Azure.
A single instance of SQL Azure can be up to 50GB, I believe this limit is due to the fact that each instance is replicated in 2 other locations, so it is be quite difficult to keep in sync a single massive instance bigger than that. For this reason Azure users where invited to shard their data across multiple SQL instances, adding a lot of work  to the developers implementing the solution.

MS released a whitepaper on sharding few days ago [1]. It’s a long and interesting document about the best practices of data sharding, and some details of the implementation of SQL Azure Federations and how we’ll be able in future to do it at the database level instead of doing it at the application level.

The following schematic outlines the concepts in SQL Azure with Federations:

6131.F1.bmp-550x0

 

Cihan Biyikoglu (Program Manager at Microsoft – SQL Azure) has the most interesting blog [2] about SQL Azure and has recently posted about this new Federation concept.

He has a very useful introduction to the new “keywords” (or key concepts if you want) [3] that will be useful to understand how everything works.

  • Federations represent all data being partitioned. It defined the distribution method as well as the domain of valid values for the federation key. In the picture below, you can see that the customer_federation is part of sales_db.
  • Federation Key is the key used for partitioning the data.
  • Atomic Unit (AU) represent a single instance value of the federation key. Atomic units cannot be separated thus all rows that contain the same instance value of the federation key always stay together.
  • Federation Member (aka Shard) is the physical container for a range of atomic units.
  • Federation Root is the database that houses federations and federation directory.
  • Federated Tables refer to tables in federation members that contain partitioned data, as opposed to
  • Reference Tables refer to table that contain data that is repeated in federation members for lookup purposes.

2627.image_thumb_4451A556

It’s all very exiting to see such a technology being implemented at the database level, keeping the ACID (http://en.wikipedia.org/wiki/ACID) properties inside a single Shard, and at the same time allowing the horizontal scalability needed in most modern systems.

1 – Sharding with SQL Azure
http://social.technet.microsoft.com/wiki/contents/articles/sharding-with-sql-azure.aspx

2 – SQL Azure – Your Data in the Cloud
http://blogs.msdn.com/b/cbiyikoglu/

3 – Building Scalable Database Solution with SQL Azure – Introducing Federation in SQL Azure
http://blogs.msdn.com/b/cbiyikoglu/archive/2010/10/30/building-scalable-database-solution-in-sql-azure-introducing-federation-in-sql-azure.aspx