What do ORMs, lazy loading and web applications have in common?
They are frequently used together. Is that a good idea though? No, not at all, and that’s what I’m going to convince you of in this blog post.
But first let's talk a little bit about what an ORM is, what it's for, and what lazy loading is in the context of ORMs.
ORM is an acronym for “Object Relational Mapper”. Two popular examples of ORMs are Entity Framework and Hibernate.
In relational databases (like SQL Server, MySQL, etc.) the data is represented using tables, columns and rows. There are also constraints (primary and foreign keys, for example), indexes and many other things that have no direct equivalent in an object-oriented programming language.
Before ORMs were popular, if you needed to read data from a database you would write a SQL query, and that SQL code would live somewhere alongside your source code.
ORMs changed that by providing us with a way to declaratively specify how the data in a database maps to object-oriented constructs. For example, how a table maps to a class and how a column maps to a property.
With this intermediate layer in place it became possible to write code that looks like it's only manipulating objects but that, under the covers, is converted to SQL queries that are transparently sent to the database.
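To make that mapping a bit more concrete, here's a minimal sketch using the Table and Column attributes from System.ComponentModel.DataAnnotations.Schema, which Entity Framework can read. The table and column names (Customers, customer_id, full_name) and the MappingDemo helper are made up for illustration. The helper only reads the attributes back with reflection, which is a heavily simplified version of what an ORM does when it works out its mapping.

```csharp
using System;
using System.ComponentModel.DataAnnotations.Schema;
using System.Reflection;

// Declarative mapping: this class maps to the (hypothetical) "Customers" table.
[Table("Customers")]
public class Customer
{
    // This property maps to the (hypothetical) "customer_id" column.
    [Column("customer_id")]
    public int CustomerId { get; set; }

    // This property maps to the (hypothetical) "full_name" column.
    [Column("full_name")]
    public string Name { get; set; } = "";
}

// Hypothetical helper, not part of any ORM: reads the mapping via reflection.
public static class MappingDemo
{
    // Returns the table name from [Table], or the class name if there is none.
    public static string TableNameFor(Type entityType)
    {
        var table = entityType.GetCustomAttribute<TableAttribute>();
        return table != null ? table.Name : entityType.Name;
    }

    // Returns the column name from [Column], or the property name if there is none.
    public static string ColumnNameFor(Type entityType, string propertyName)
    {
        var property = entityType.GetProperty(propertyName);
        var column = property?.GetCustomAttribute<ColumnAttribute>();
        return column?.Name ?? propertyName;
    }

    public static void Main()
    {
        Console.WriteLine(TableNameFor(typeof(Customer)));                // Customers
        Console.WriteLine(ColumnNameFor(typeof(Customer), "CustomerId")); // customer_id
        Console.WriteLine(ColumnNameFor(typeof(Customer), "Name"));       // full_name
    }
}
```

Attributes are only one way of declaring the mapping. Many ORMs also accept it as configuration code or as naming conventions.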
This is great because it simplifies how we interact with data. Previously we'd have to write SQL statements, which were effectively strings in our code. Concatenating SQL with user input was very common, and that is basically what enables SQL injection. SQL injection happens when a user types SQL statements into an input field instead of normal input, and those statements end up being executed in your database.
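Here's a small sketch of why concatenation is dangerous. It never touches a database, it only builds the SQL text so you can see what the server would receive. The Customers table, the Name column and the SqlInjectionDemo class are assumptions for the sake of the example.

```csharp
using System;

// Hypothetical demo class: builds SQL text only, nothing is executed.
public static class SqlInjectionDemo
{
    // The unsafe way: user input is pasted straight into the SQL string.
    public static string BuildUnsafeQuery(string customerName)
    {
        return "SELECT * FROM Customers WHERE Name = '" + customerName + "'";
    }

    public static void Main()
    {
        // Normal input gives the query the developer had in mind:
        // SELECT * FROM Customers WHERE Name = 'John Doe'
        Console.WriteLine(BuildUnsafeQuery("John Doe"));

        // Malicious input closes the string literal and adds its own statement:
        // SELECT * FROM Customers WHERE Name = 'x'; DROP TABLE Customers; --'
        Console.WriteLine(BuildUnsafeQuery("x'; DROP TABLE Customers; --"));
    }
}
```

ORMs typically send user-supplied values to the database as query parameters instead of splicing them into the SQL text, which is why this class of bug mostly goes away when you use one.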
So that is roughly what an ORM is about.
What about lazy loading?
To really understand lazy loading we need to talk a little bit more about a concept in ORMs called Context (or Session). Whenever data is loaded and converted to objects in memory by an ORM, those objects are stored in a Context.
Usually an interaction that involves fetching data from the database starts by creating a Context and then performing the operation that triggers the fetching of the data. Here's an example using Entity Framework that fetches all the customers from the database:
var myContext = new MyContext();
var allCustomers = myContext.Customers.ToList();
//...
After accessing the Customers property in the context and calling .ToList() on it (which is the operation that triggers the fetching of the data from the database), the customer data is stored in the context itself (for example in Entity Framework you would be able to access it in myContext.Customers.Local). This is so that if you make changes to a customer the ORM can figure out what changed and generate the appropriate SQL statements.
The context also has the ability to give you a slightly altered version of a Customer, and this is particularly relevant when lazy loading is enabled. Imagine that in the database a customer can have many orders. The corresponding Customer class could look like this:
public class Customer
{
    public int CustomerId { get; set; }
    //...
    public virtual ICollection<Order> Orders { get; set; }
}
Notice the virtual keyword on the Orders collection. It just means that you can create a subclass of Customer and override what that property does. And that is precisely what the context will do if you enable lazy loading.
With lazy loading enabled the context will keep track of whether that property has been accessed. When it is accessed for the first time the context will transparently fetch the associated data (in this case the customer's orders).
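If you're curious what such a subclass might look like, here's a hand-written sketch. This is not the code Entity Framework generates. CustomerProxy, FakeLoadOrders and the query counter are all made up, and the "database" is just a method that returns two orders. It only illustrates the idea of overriding a virtual property so that the first access triggers a load.

```csharp
using System;
using System.Collections.Generic;

public class Order
{
    public int OrderId { get; set; }
}

public class Customer
{
    public int CustomerId { get; set; }
    public virtual ICollection<Order> Orders { get; set; } = new List<Order>();
}

// Hypothetical stand-in for the subclass an ORM would generate at runtime.
public class CustomerProxy : Customer
{
    private readonly Func<int, ICollection<Order>> _loadOrders;
    private bool _ordersLoaded;

    public CustomerProxy(Func<int, ICollection<Order>> loadOrders)
    {
        _loadOrders = loadOrders;
    }

    public override ICollection<Order> Orders
    {
        get
        {
            // First access: go and fetch the orders. Later accesses reuse them.
            if (!_ordersLoaded)
            {
                base.Orders = _loadOrders(CustomerId);
                _ordersLoaded = true;
            }
            return base.Orders;
        }
        set
        {
            base.Orders = value;
            _ordersLoaded = true;
        }
    }
}

public static class LazyProxyDemo
{
    // Counts how many times the pretend database was hit.
    public static int QueryCount;

    // Pretend database call: always returns two orders.
    public static ICollection<Order> FakeLoadOrders(int customerId)
    {
        QueryCount++;
        return new List<Order> { new Order { OrderId = 1 }, new Order { OrderId = 2 } };
    }

    public static void Main()
    {
        Customer customer = new CustomerProxy(FakeLoadOrders) { CustomerId = 42 };
        Console.WriteLine(QueryCount);            // 0, nothing loaded yet
        Console.WriteLine(customer.Orders.Count); // 2, first access triggers the load
        Console.WriteLine(customer.Orders.Count); // 2, served from memory
        Console.WriteLine(QueryCount);            // 1
    }
}
```

The real proxy is created by the ORM behind the scenes, so none of this plumbing shows up in your own code. All you see is a Customer with an Orders property.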
This allows us to write code like this:
var myContext = new MyContext();
var johnDoe = myContext.Customers.Single(customer => customer.Name == "John Doe");
foreach (var order in johnDoe.Orders)
{
    //do something with John Doe's order
}
This all looks fine. It is easy to read and understand, but it could be made more efficient, and it is not obvious why unless you are familiar with how the particular ORM you are using works.
In this case, with Entity Framework, the data is fetched once when .Single(…) is called and again when the Orders are enumerated. So in this example there are two database calls.
It is very easy to make this much worse and create what is called the N+1 problem. Here’s an example:
var myContext = new MyContext();
var thisYearsCustomers = myContext.Customers.Where(customer => customer.JoinYear == 2017);
foreach (var customer in thisYearsCustomers)
{
    foreach (var order in customer.Orders)
    {
        //do something with the customer's order
    }
}
This will trigger one database request when the customers are enumerated (the foreach over the customers) and then one more for each customer's orders. With N customers that is N+1 requests, hence the name (or 1+N if you prefer).
In case you're thinking that lazy loading is terrible and never makes sense, in this situation you're probably right. A situation that leads to an N+1 problem is almost always a mistake. However, that doesn't mean that lazy loading is never useful.
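To see the arithmetic play out, here's a sketch that swaps the real database for an in-memory fake that counts round trips. FakeDatabase, its three customers and their orders are all invented for this example, and no ORM is involved. It just compares "one query for the customers plus one per customer" with "one query for everything".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a database that counts round trips.
public class FakeDatabase
{
    private readonly Dictionary<int, List<string>> _ordersByCustomerId =
        new Dictionary<int, List<string>>
        {
            { 1, new List<string> { "order-a", "order-b" } },
            { 2, new List<string> { "order-c" } },
            { 3, new List<string>() },
        };

    public int QueryCount { get; private set; }

    // One round trip: fetch the customers only.
    public List<int> GetCustomerIds()
    {
        QueryCount++;
        return _ordersByCustomerId.Keys.ToList();
    }

    // One round trip per call: fetch the orders of a single customer.
    public List<string> GetOrders(int customerId)
    {
        QueryCount++;
        return _ordersByCustomerId[customerId];
    }

    // One round trip in total: fetch customers and orders together.
    public Dictionary<int, List<string>> GetCustomersWithOrders()
    {
        QueryCount++;
        return new Dictionary<int, List<string>>(_ordersByCustomerId);
    }
}

public static class NPlusOneDemo
{
    // Mimics lazy loading: 1 query for the customers, then 1 per customer.
    public static int CountOrdersOneCustomerAtATime(FakeDatabase db)
    {
        var ordersSeen = 0;
        foreach (var customerId in db.GetCustomerIds())
        {
            ordersSeen += db.GetOrders(customerId).Count;
        }
        return ordersSeen;
    }

    // Fetches everything in a single query.
    public static int CountOrdersInOneGo(FakeDatabase db)
    {
        return db.GetCustomersWithOrders().Values.Sum(orders => orders.Count);
    }

    public static void Main()
    {
        var lazyDb = new FakeDatabase();
        Console.WriteLine(CountOrdersOneCustomerAtATime(lazyDb)); // 3 orders
        Console.WriteLine(lazyDb.QueryCount);                     // 4 queries (1 + 3 customers)

        var singleQueryDb = new FakeDatabase();
        Console.WriteLine(CountOrdersInOneGo(singleQueryDb));     // 3 orders
        Console.WriteLine(singleQueryDb.QueryCount);              // 1 query
    }
}
```

Both approaches see the same three orders. The only difference is the number of round trips, and that number grows with the number of customers in the first version.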
Imagine a desktop application where the context is created and lives through several user interactions. For example the user opens a customer screen, looks at it and then decides to look at that customer’s orders.
In this scenario lazy loading is very convenient and makes sense.
Where it does not make sense is in a web application. That's because the context will not live for more than one user interaction. It's just not possible.
In a web application the user’s actions result in an HTTP request being sent from the user’s browser to the server. The server then does all the required processing for that request and sends a response back to the user. And then this process repeats for every user action.
Between requests the server forgets about the user, so if a context is created in response to a user’s action it will be gone after the response is sent. That is just the stateless nature of the web.
The only thing you can achieve using lazy loading in a web application is extra database calls that you could have avoided. That's because if a user asks for the orders of a particular customer, the code that runs on the server has to load the customer and the orders while handling that one request. It can do that in a single database call or in two. Lazy loading just makes it really easy to end up in the two-call scenario without realizing it.
For completeness I'll just mention what happens if you have lazy loading disabled in most ORMs. In the example above the Orders property would be null. You would have to instruct the ORM to fetch the orders together with the customer, all in one go. This is called eager loading:
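// The lambda overload of Include needs: using System.Data.Entity; (EF6)
// or: using Microsoft.EntityFrameworkCore; (EF Core)
var myContext = new MyContext();
var johnDoe = myContext.Customers
    .Include(customer => customer.Orders)
    .Single(customer => customer.Name == "John Doe");
foreach (var order in johnDoe.Orders) //no extra db access
{
    //do something with John Doe's order
}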
There's nothing to be gained by using lazy loading in web applications, yet in "the wild" it is far more common to see it enabled than not. Even if we use the argument that it might be more convenient at times, the likelihood of serious performance problems (like the N+1) outweighs any possible benefit.