In my previous post I spoke about key layers of distributed applications. Now we will go through the most crucial layer of any distributed application, the data layer. In this part, you will be introduced to various database technologies, along with .NET-related technologies.
Data can be stored in a wide range of data sources such as relational databases, files on the local filesystems, on the distributed filesystems, in a caching system, in storage located on the cloud, and in memory.
- Relational databases (SQL server): This is the traditional data source that is designed to store and retrieve data. Queries are written in languages such as T-SQL-utilized Create, Retrieve, Update, and Delete (CRUD) operations model.
- The filesystem: The filesystem is used to store and retrieve unstructured data on the local disk system in the files. One of the simplest options to store and retrieve data, it has many functional limits and is not distributed by its nature.
- The Distributed File System (DFS): The DFS is the next level of file system that solves the size and other limitations introduced by local disks. In a nutshell, DFS is a pool of networked computers that store data files.
- NoSQL databases: NoSQL databases are a new way of storing data in a non-relational fashion. Often, NoSQL databases are used to store large or very large volumes of data, and the biggest difference between these databases and relational database is that NoSQL data stores are schema-free. However, data can be organized by one or more different models, such as key-value stores and document stores, among others.
- Cloud storage: Any infrastructure located on the cloud solves many issues, such as security, reliability, resilience, and maintenance. Cloud offerings such as Microsoft Azure Storage provide many ways of storing the data in different formats, which can be structured or unstructured. As with many other cloud storage offerings, Microsoft Azure Storage exposes the HTTP REST API, used by any application and client running on any platform that supports HTTP.
- In-memory stores: In-memory stores are the fastest data stores that are limited in size, not persistent, and cumbersome to use in a distributed multi-server environment. In-memory stores are used to store temporary and volatile data.
ADO.NET and ADO.NET Entity Framework
.NET Framework has several database access options, and the foundation of most of them is ADO.NET. ADO.NET can be called a foundation for every other data access technology on Microsoft stacks. In a nutshell, ActiveX Data Objects .NET (ADO.NET) is a collection of classes that implement program interfaces to simplify the process of connecting to data stores without depending on the structure and implementation of a concrete data store and its location. The challenge that it offers is that most developers must write complex data access code (between the application and the database) that requires them to have a good understanding of the database itself, of raw tables, views, stored procedures, the database schema, table definitions and parameters, results, and so on.
This is mostly solved by the Object-relational mapping (ORM) approach. Programmers create a conceptual model of the data and write their data access code against that model, while an additional layer provides a bridge between the entity-relationship model and the actual data store. Entity Framework generates database entities according to database tables and provides the mechanism for basic CRUD operations, managing 1-to-1, 1-to-many, and many-to-many relationships, and the ability to have inheritance relationships between entities among others.
Basically, you have the ability to "talk" about your model not with the database but with the class model you wrote or generated from a database using Entity Framework. This is achieved by the creation of a combination of XML schema files, code generation, and the ADO.NET Entity Framework APIs. The schema files are used to define a conceptual layer, to be used as a map between the data store and the application. The ADO.NET Entity Framework allows you to write the application that uses classes that are generated from the conceptual schema. Entity Framework then takes care of the rest.
Another important component of Entity Framework that is often used by developers is Language Integrated Query (LINQ). It adds data querying capabilities to .NET languages and extends the language with SQL-like query expressions.
There are three approaches to working with Entity Framework in the project:
- Database-first: This approach is used when you already have a database that is going to be used as a data source.
- Model-first: This approach is used when you have no database. First, you draw the model in the Visual Designer and then instruct it to create the database for you with all the tables.
- Code-first: This approach is used often as it provides a way to write your model in code as classes and instruct Entity Framework to generate the database with objects described in the code.