Modern Serverless Development Part 2 – Data Repository with Cosmos DB and Azure Storage

This is Part Two of an eight-part blog series looking into a serverless application built using Azure components. The first part of the series is available here: Part One 

The first decision we made with the new application was to go with a document database and, for performance and scalability reasons, Azure Cosmos DB was the natural choice. The application would have few related objects, so it lent itself to the document DB model, although it was unlikely to scale to the size where that choice would be a no-brainer.

What is Azure Cosmos DB?

Azure Cosmos DB is a “globally distributed, multi-model database service” that is highly performant and designed for global scaling – you can read a great summary from Microsoft here. The benefits of document databases (or NoSQL) are outlined well by MongoDB: they allow for greater scaling and more flexibility in the structure of your model. For our case, it let us create a model in C# that could be directly replicated in the database, without an additional layer such as Entity Framework, while still being performant. There are many considerations around when to use NoSQL versus a relational database that I am not planning to cover in this post, which assumes you have already made the decision to go NoSQL and want to know more about Cosmos DB.


If you already have a NoSQL database (such as MongoDB), one aspect of Cosmos that is particularly enticing is that you can use multiple interfaces; if you have an existing MongoDB database, you can retain the same data access API when you migrate to Cosmos. If you are more familiar with SQL, there is a SQL interface so that you can write familiar queries, slightly confusing the NoSQL notion! However, we found this a little distracting, as it is best not to treat Cosmos entirely as you would a SQL database – more on this later.

One thing to make clear is that while Cosmos DB is a document database, it is not designed to hold files directly. It is possible to store attachments in the database, but we encountered some challenges with this, outlined in the Challenges section below.

How data is stored in Cosmos depends on the interface you are using, but documents are usually JSON. Cosmos also supports key-value pairs with the Table API and a graph format with Gremlin (allowing developers to build connected objects easily). An online Data Explorer fills the role of SQL Server Management Studio, letting you query data, set options on the database such as scale and indexing, and write stored procedures in JavaScript.

How is Azure Cosmos DB priced?

As with many things in Azure, there is no simple, straightforward answer, and the details change often. For the latest prices and details, see the Azure Cosmos DB Pricing page, but I will summarise the key elements.

Storage

The simplest pricing is for the storage used itself. Each Cosmos document held will use an amount of storage and will incur a cost per GB.

Request Units

The more complex area is the charge for handling the data, not just storing it. Azure uses Request Units (or RUs), which reflect the CPU, memory and IO utilisation of making read and write requests to the database. Each call to the database incurs a certain amount of RU usage, and this is shown in the response for each call. Larger objects use more RUs, as do more complex queries.

Each collection has a maximum number of RUs per second, which therefore caps the cost at any point. However, if you consistently hit the limit, requests to the database will be throttled and start to fail. You can estimate the number of RUs that your application will require using the RU calculator.
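To illustrate, the .NET SDK reports the charge on every response. This is a minimal sketch using the DocumentClient API; the database and collection names are placeholders:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

public static class RuCharges
{
    public static async Task ShowChargesAsync(DocumentClient client)
    {
        // "mydb" and "mycollection" are placeholder names.
        var collectionUri = UriFactory.CreateDocumentCollectionUri("mydb", "mycollection");

        // Each write returns the RUs it consumed.
        var created = await client.CreateDocumentAsync(
            collectionUri, new { id = "1", name = "Head office" });
        Console.WriteLine($"Write cost: {created.RequestCharge} RUs");

        // Query pages report their charge too.
        var query = client.CreateDocumentQuery<dynamic>(
                collectionUri, "SELECT * FROM c",
                new FeedOptions { MaxItemCount = 10 })
            .AsDocumentQuery();
        while (query.HasMoreResults)
        {
            var page = await query.ExecuteNextAsync<dynamic>();
            Console.WriteLine($"Query page cost: {page.RequestCharge} RUs");
        }
    }
}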

Global Scaling

If you enable multiple regions for your containers to improve global performance, you will be charged based on the storage consumed for each region and the data transfer between each region in addition to the two items above.

The benefit of these more complex pricing strategies is that you can design your application to be more cost efficient and scale up or down as required with no downtime. To determine the overall cost of your application, the best way is still trial and error as well as testing with known sets of users.

How do you develop with Cosmos?

Databases can be easily created in Azure with a few clicks (see the getting started links in the introduction above), but there is a cost for each database and even each collection within a database (a collection being a set of documents). Therefore, for all our development, we made use of the local Azure Cosmos DB Emulator. This creates a service on your desktop that can be queried in the same way as the cloud version, but at no cost. By default, it runs on port 8081 and provides a local version of the Data Explorer so you can query your data. It also requires the Azure Storage Emulator to be installed and running to host the jobs that run.
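Connecting to the emulator from C# looks the same as connecting to the cloud service; only the endpoint and key differ. A minimal snippet (the key below is the fixed, publicly documented emulator key, not a secret):

using System;
using Microsoft.Azure.Documents.Client;

// The emulator endpoint and key are fixed and publicly documented,
// so this connection code is identical on every developer machine.
var client = new DocumentClient(
    new Uri("https://localhost:8081"),
    "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy66STC48ACDb72Ctc5RP7eyRBcrGTzSZjW/ImTxAArVUImUsUkMGzLCDQ==");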

There is no schema inherent to Cosmos itself, but we used C# classes to define our objects, combined with a repository pattern based on the sample application at https://docs.microsoft.com/en-us/azure/cosmos-db/sql-api-dotnet-application. This meant we could focus entirely on our application architecture without having to constantly consider differences between the database and the application for performance. This would be different in an application with many connected objects, as pulling linked items would introduce problems and hamper performance.
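To give a flavour of that pattern, here is a heavily simplified sketch along the lines of the Microsoft sample (the database and collection names are placeholders, and the client is assumed to be initialised elsewhere):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

// Simplified from the Microsoft sample application: a generic
// repository that maps C# objects straight to documents.
public static class DocumentDBRepository<T> where T : class
{
    private const string DatabaseId = "mydb";            // placeholder
    private const string CollectionId = "mycollection";  // placeholder
    private static DocumentClient client;                // initialised at startup

    public static async Task<IEnumerable<T>> GetItemsAsync(
        Expression<Func<T, bool>> predicate)
    {
        var query = client.CreateDocumentQuery<T>(
                UriFactory.CreateDocumentCollectionUri(DatabaseId, CollectionId),
                new FeedOptions { MaxItemCount = -1 })
            .Where(predicate)
            .AsDocumentQuery();

        // Page through the results until the query is exhausted.
        var results = new List<T>();
        while (query.HasMoreResults)
        {
            results.AddRange(await query.ExecuteNextAsync<T>());
        }
        return results;
    }
}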

The local emulator has its own connection string that is the same for all local instances (so no complaints of “well, it worked on my machine”). Once you are ready to deploy, you deploy your code to whichever platform it will run on (e.g. Azure App Service or Azure Functions) and update the connection string. The repository pattern above handles the creation of the database and collection if it finds they do not exist. The Cosmos DB instance can also be created using Azure Resource Manager, which we will cover in a later post.
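That creation-if-missing step is just a couple of SDK calls. A rough sketch of what the repository's initialisation does (the throughput value is illustrative):

using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Sketch of the repository's initialise step: create the database
// and collection only if they do not already exist.
await client.CreateDatabaseIfNotExistsAsync(new Database { Id = "mydb" });
await client.CreateDocumentCollectionIfNotExistsAsync(
    UriFactory.CreateDatabaseUri("mydb"),
    new DocumentCollection { Id = "mycollection" },
    new RequestOptions { OfferThroughput = 400 }); // illustrative RU/s cap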

For populating initial data, we built code that sets up key values that should always exist, first checking whether they are already present; this ensures they work in all environments. A separate function created a set of test data for automated testing. We found this also helped in development: with many changes in flight, it was often easier to drop all the data and recreate it. That is not so easy in a live environment, but during development it became very useful.
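A minimal sketch of that check-then-create seeding approach, using hypothetical document IDs:

using System.Net;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Sketch: create the seed document only if it is not already there,
// so the routine is safe to run repeatedly in any environment.
try
{
    await client.ReadDocumentAsync(
        UriFactory.CreateDocumentUri("mydb", "mycollection", "office-1"));
}
catch (DocumentClientException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
{
    await client.CreateDocumentAsync(
        UriFactory.CreateDocumentCollectionUri("mydb", "mycollection"),
        new { id = "office-1", name = "Head office", town = "East Grinstead" });
}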

So there are no relations between objects at all?

There are virtually no applications without any relations at all between different objects and ours was no exception. So how does Cosmos handle this? There are two key patterns to do this:

Handle relations in your application data layer

Your application can make separate calls to the database for each set of objects, filtering that set by a key. For example, the parent object may be an office:

{
    "name": "Head office",
    "town": "East Grinstead",
    "id": 1
}

And the child employee objects would hold the ID of that office:

[
    {
        "name": "Kevin McDonnell",
        "id": 1,
        "role": "Senior Technical Architect",
        "officeId": 1
    },
    {
        "name": "Geoff Ballard",
        "id": 2,
        "role": "Chief Technical Officer",
        "officeId": 1
    },
    {
        "name": "Andrew Chalmers",
        "id": 3,
        "role": "Managing Director",
        "officeId": 1
    }
]

When you return the office, you make a separate call to return employees filtered by the office ID. This works for smaller scenarios but when there are a lot of different relationships, this can require many different calls.
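With a repository like the sketch above, the second call is just a filtered query. The Employee class below is an assumed POCO matching the JSON, and office is the parent object loaded earlier:

using Newtonsoft.Json;

// Assumed POCO matching the employee JSON above.
public class Employee
{
    [JsonProperty("name")] public string Name { get; set; }
    [JsonProperty("id")] public int Id { get; set; }
    [JsonProperty("role")] public string Role { get; set; }
    [JsonProperty("officeId")] public int OfficeId { get; set; }
}

// After loading the office, a second, filtered call returns its employees.
var employees = await DocumentDBRepository<Employee>.GetItemsAsync(
    e => e.OfficeId == office.Id);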

Create child objects in the document and use Triggers to keep up to date

The option that we chose was to copy the required details from the child object into the same document and then use triggers to keep any of those details up to date. This stores the objects exactly as you need to use them in your application, which lightens the data layer logic further while ensuring the data stays correct. Only the properties that are required are saved, to reduce unnecessary storage.

Following the example above:

{
    "name": "Head office",
    "town": "East Grinstead",
    "id": 1,
    "employees": [
        {
            "name": "Kevin McDonnell",
            "id": 1
        },
        {
            "name": "Geoff Ballard",
            "id": 2
        },
        {
            "name": "Andrew Chalmers",
            "id": 3
        }
    ]
}

There is no need for the role in this scenario, so it is not included. If the name of an employee changes, a trigger fires that updates the name in any office with that employee. Triggers are also written in JavaScript, and you can read more on setting them up in the Microsoft documentation. In our application, we actually set up the trigger logic in an Azure Function so that all the logic lived consistently in one place – this will be covered in the next blog post.
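The actual Azure Function will be covered in the next post, but to illustrate the idea, here is a hedged C# sketch of the fan-out update itself; the Office and EmployeeRef classes are assumptions based on the JSON above, and the collection name is a placeholder:

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;
using Newtonsoft.Json;

// Assumed POCOs matching the office document above.
public class Office
{
    [JsonProperty("id")] public int Id { get; set; }
    [JsonProperty("name")] public string Name { get; set; }
    [JsonProperty("town")] public string Town { get; set; }
    [JsonProperty("employees")] public List<EmployeeRef> Employees { get; set; }
}

public class EmployeeRef
{
    [JsonProperty("id")] public int Id { get; set; }
    [JsonProperty("name")] public string Name { get; set; }
}

// Sketch: when an employee is renamed, find every office embedding
// them (via a self-join, see Challenges below) and rewrite the copy.
public static async Task PropagateNameAsync(
    DocumentClient client, int employeeId, string newName)
{
    var collectionUri = UriFactory.CreateDocumentCollectionUri("mydb", "offices");
    var query = client.CreateDocumentQuery<Office>(
            collectionUri,
            $"SELECT VALUE o FROM o JOIN e IN o.employees WHERE e.id = {employeeId}")
        .AsDocumentQuery();

    while (query.HasMoreResults)
    {
        foreach (var office in await query.ExecuteNextAsync<Office>())
        {
            foreach (var e in office.Employees)
            {
                if (e.Id == employeeId) { e.Name = newName; }
            }
            await client.ReplaceDocumentAsync(
                UriFactory.CreateDocumentUri("mydb", "offices", office.Id.ToString()),
                office);
        }
    }
}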

Challenges

One of the first challenges we encountered is that, even though Cosmos DB exposes a REST interface, you cannot connect to it directly from web applications, as it requires a key that would be clearly visible in your client-side code. Therefore, you will need some form of server-side code to connect to Cosmos, which in our case was Azure Functions. This was far from a showstopper, but it would be great to allow OAuth access to the REST interface directly.

While you can write SQL queries, do not get too excited: there are no joins between documents. The documentation caused some confusion when it mentioned joins, but it does outline that these are purely self-joins, which let you return specific deeply nested child objects – very useful, but not what SQL developers would expect!
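For example, against the office document shown earlier, a self-join can unnest the embedded employees (the aliases are arbitrary):

SELECT o.name AS office, e.name AS employee
FROM o
JOIN e IN o.employees
WHERE e.id = 2

This returns one result per matching embedded employee, e.g. { "office": "Head office", "employee": "Geoff Ballard" } – but joining across two different documents is not possible.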

There is also only a limited set of LINQ queries that work (listed here), and this can cause some difficulties. For example, if you want to return a set of objects that have a child collection of ActualDocuments, but you only want the objects whose documents have an expiry date in the past, you have to build a query like the one below:

query = query.SelectMany(ap => ap.ActualDocuments
    .Where(w => w.ExpiryDate <= System.DateTime.Now)
    .Select(w => ap));

The key part is the final Select, which ensures that the parent object is returned and not the documents.

For date queries, you must ensure that a range index is set for the date properties. This only needs to be done once, and we built it into the repository pattern when the collection is initiated.
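As dates in the SQL API are stored as ISO 8601 strings, this means configuring a range index on strings when the collection is created. A sketch of roughly what our repository does (the paths and precision follow the documented pattern; names are placeholders):

using System.Collections.ObjectModel;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Sketch: give string (date) and number properties a range index
// so <= / >= comparisons work in queries.
var collection = new DocumentCollection { Id = "mycollection" };
collection.IndexingPolicy.IncludedPaths.Add(new IncludedPath
{
    Path = "/*",
    Indexes = new Collection<Index>
    {
        Index.Range(DataType.String, -1), // -1 = maximum precision
        Index.Range(DataType.Number, -1)
    }
});
await client.CreateDocumentCollectionIfNotExistsAsync(
    UriFactory.CreateDatabaseUri("mydb"), collection);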

You can also store files in Cosmos DB as attachments which can be a convenient way of having all the content in one place. However, this is not something we would recommend using to any large degree as searching for the files themselves was impossible without code (they would not appear in the data explorer at all) and there is also a limit of 2GB for each instance of Cosmos DB. Instead, we decided to store attachments in Azure Blob Storage and reference the link within the document object itself.
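A sketch of that approach with the WindowsAzure.Storage SDK: upload the file to Blob Storage and keep only its URL on the Cosmos document (the container name is a placeholder):

using System.IO;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// Sketch: store the file in Blob Storage and keep only its URL
// in the Cosmos document.
public static async Task<string> StoreAttachmentAsync(
    string connectionString, string fileName, Stream content)
{
    var account = CloudStorageAccount.Parse(connectionString);
    var container = account.CreateCloudBlobClient()
        .GetContainerReference("attachments"); // placeholder container
    await container.CreateIfNotExistsAsync();

    var blob = container.GetBlockBlobReference(fileName);
    await blob.UploadFromStreamAsync(content);
    return blob.Uri.ToString(); // save this URL on the document
}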

There is also a 10GB limit per partition, so you need to ensure that you partition your data correctly. The limit for an individual document is 2MB (https://docs.microsoft.com/en-gb/azure/cosmos-db/sql-api-resources#documents), but that is a lot of JSON and not something we have hit, despite having many child objects.

Summary

Azure Cosmos DB is lauded for its global scale, but its performance and its nature as a document DB make it ideal for a range of solutions. Microsoft has done plenty of work to make the development environment easier and to support developers who are used to MongoDB and Cassandra, giving a natural path to migrate to the Azure cloud. It will not suit every need, as plenty of applications will work better with a relational database (with Azure SQL Database available for those), but it has offered a relatively pain-free approach with scalable benefits.

By Kevin McDonnell, Senior Technical Architect at Ballard Chalmers

UPDATE: The next in the series is available here: Modern Serverless Development Part 3 – Business Logic with Azure Functions


About the author

Kevin McDonnell is a respected Senior Technical Architect at Ballard Chalmers. With a Master of Engineering (MEng), Engineering Science degree from the University of Oxford he specialises in .NET & Azure development and has a broad understanding of the wider Microsoft stack. He listens to what clients are looking to achieve and helps identify the best platform and solution to deliver on that. Kevin regularly blogs on Digital Workplace topics and is a regular contributor to the monthly #CollabTalk discussions on Twitter.
