Using Views to avoid node_load()

Submitted by Gergely Lekli on Tue, 03/01/2011 - 5:40pm
Gergely Lekli's picture

One of the most versatile Drupal modules, Views, comes with an impressive user interface that allows you to capitalize on all its functionality – or does it? Although this interface is the primary means of using Views, the module has plenty to offer on the code level too.

Recently I was working on a task that involved selecting nodes of a certain type from a huge list of node IDs. When it comes to data manipulation on a complex web site, a web developer needs to take due precaution. An inefficient SQL query can bring the site to its knees by creating a performance bottleneck. So I looked into several ways to make the queries more efficient; Views was a good candidate.

In this case, I had a list that contained node IDs only – there was no way to join it to the node table directly to get the type in one SQL query. Therefore I had to determine the node types separately. The first thing that comes to mind is using node_load() to retrieve the node object, which contains the type property. This would be a perfectly good solution for loading a couple of nodes, but if you need to load hundreds, that could have a negative impact on performance.

Not only does node_load() run a heavy SQL query with three table joins, it also fires the nodeapi hooks in all modules. This is the hook where modules load all node related data and attach it to the node object, so this can add up to quite a few queries with just some basic modules enabled. Using the Devel module, I summarized the execution times of queries that run during a node_load() call.

The figure below shows the SQL query times for functions in commonly used modules for a single node_load() call; the values are averages of three runs. On a complex site, of course, this is a rough underestimation.

 

Function that executes the SQL query Query execution time (ms)
node_load 0.56
comment_nodeapi 0.43
drupal_lookup_path (called by path_nodeapi) 0.66
taxonomy_node_get_terms (called by taxonomy_nodeapi) 5.97
page_title_load_title (called by page_title_nodeapi) 0.34
Total 7.96

 

As one can infer, node_load() retrieves far more data than was needed for the task. In Views, on the other hand, I can select the data that I want loaded. When the view is executed, it retrieves only the data that I selected, eliminating unnecessary queries. Fortunately enough, Views offers a function, named views_get_view_result(), to get the data a view retrieves, without having to render the view. Returning an array of the selected nodes, it can act as an abstraction layer above database functions.  The following figure shows the execution times of queries run during a views_get_view_result() call – assuming the view is already cached.

 

Function that executes the SQL query Query execution time (ms)
cache_get 0.51
title_query 0.45
execute 0.26
Total 1.22

 

That’s an attractive six-fold performance increase. In terms of page load time – a pivotal point of busy sites, this gain could lead to a time-savings in the hundreds of milliseconds.

There might be occasions where you simply want to run an SQL query against the node table to retrieve only a few pieces of data. Although this indeed can be even faster than using a view, on the long run it might be worthwhile to use an abstraction layer like Views to avoid problems in case the underlying data structure changes.

UPDATE 3/2/2011:

When measuring the overall performance, however, one should expect Views to inflict some additional load on other resources. As a result, you can profit from this approach only if the database connection is the weakest point of the server.

4 comments

by Anonymous (not verified) on Tue, 03/01/2011 - 7:29pm

1.) You're comparing apples with oranges. You're comparing a cache_get with node_load. You can easily create your own caching mechanism to cache the node_load you're doing, and that would be *significantly* faster than a view. The View bootstap and overhead is WAY more than the entire node_load function

2.) You're only comparing SQL queries only. The Views module overhead is significantly more than the SQL queries it runs, not to mention the gobs of RAM it requires.

3.) Implementing a reverse proxy such as Varnish, or a poor-man's caching mechanism such as Boost will mean 0 SQL queries run.

Gergely Lekli's picture

by Gergely Lekli on Wed, 03/02/2011 - 6:52pm

Dear Anonymous,

Thank you for your note. I agree that the view entails extra burden. I should have highlighted that the post compares the SQL queries that are executed during node and view loading. I have edited it accordingly.

As for caching, in that particular case, we needed up-to-date results, and so did not wish to cache the data node_load acquired. In the second table, cache_get is the call that loads the view, and not data. Since the view is loaded from cache in real-life page loads, it might be relevant to compare the time of a cached view to that of a node_load – which does not cache to the database.

by j (not verified) on Fri, 10/21/2011 - 2:05am

I seem to be constantly running into devs who refuse to use Views because it's "not as good as coding SQL" because it's "inefficient" or "takes too long". They would rather hand-code SQL to generate a result-set, citing speed as the main reason and then iterate over the result-set with node loads because retrieving all the relevant node data in the result-set SQL is too time-consuming to code!

This is stupid. Apart from the fact that people's SQL skills differ so sometimes the result-set SQL is slow, node_loads add plenty of overhead. You can't always rely on Varnish to save you as it only applies to logged-in users, so if you have a high-volume site with thousands of users, you're screwed unless you use Views and decent internal caching.

A lot of these wheel-re-inventors think they are doing a better job than Views whilst forgetting about adding the ability to cache results. If they are really stubborn and decide to remove the node_load by making the SQL more comprehensive to retrieve CCK fields, you will then have to explain that it will break if anyone shares CCK fields across content types as the table structure is different under those circumstances.

Also, regarding the comment about Views bootstrap vs node_load; you are comparing oranges to fruit-baskets. Any overhead of views is largely relating to creating a set of results, whereas over-use of node_load is overhead on each result. If you've got a page of fifty results, you need to compare fifty node_loads to one view.

Another reason why you should use Views in a dev team is that it levels the playing field. There are plenty of good drupal devs out there with little or no SQL skills. If you use Views, it's a standard. You know what you're getting and how it behaves, regardless of individual skill. Any half-decent Drupal dev can join your team and immediately be able to work on a Views-based site, rather than picking over hundreds of lines of potentially rubbish code. If if it's good code, one dodgy join can scupper it.

I've seen some innocuous-looking custom SQL that replaced an API call cause server spikes that repeatedly knocked over a site.

by j (not verified) on Wed, 10/26/2011 - 1:55am

^^ Varnish applies to logged-out users obviously. Must re-read posts before submitting... ;-)

Post new comment