Gareth Lennox

SQL Compact 3.5 Performance and NHibernate

We’re using SQL Compact 3.5 in a smart client desktop application, talking to it with NHibernate and are experiencing performance problems in a couple of places.

In general, we’re very happy with the database, it is most of the time pretty fast, and easy to work with. The fact that it runs within our process also helps, as its easy to install and use. This is especially useful, as we are going to be running on relatively low-spec machines (laptops with 500MB ram).

We are however seeing a performance problem in SELECT statements with multiple joins (> 5).

This article warns:

In general, if a significant number of your queries require joins of more than five or six tables, you should consider denormalization.

We are seeing queries that take around 4 minutes, whereas running the same query on a proper sql server takes less than a second.

When dumping out the query plans, and reviewing it, we are seeing in most cases, a full table scan taking place, and joining of all rows before filtering. Running the same query on the full sql server (with the same schema), it uses the indexes, and filters as expected. I’m going to demonstrate this issue, and a possible solution to the problem (tests were done using SQL Compact 3.5 SP1).

I’ve created a simple schema, as defined in the diagram below (should be pretty familiar):

table diagram

(all tables had 1000 records in them, with the exception of the order_line table, which had ~16 000 records)

I’m going to use the following query, which is a bit contrived, but it is complex enough to display the issue:

SELECT   product.name,
         COUNT([order].id) AS order_count
FROM     order_line
         LEFT OUTER JOIN [order]
           ON [order].id = order_line.order_id
         LEFT OUTER JOIN product
           ON product.id = order_line.product_id
WHERE    [order].id IN ('{list of ~100 integers}')
         AND product.quantity_in_stock < order_line.quantity
GROUP BY product.name
ORDER BY order_count DESC,
         product.name

In SQL Compact, this generates the following query plan (click the image for the full size):

query plan

On the right of the query plan, you can see it joining the order and order_line tables (takes ~97% of the time). The problem is that it does this join before it filters out the orders, thereby reading all the records in the order_line table (which is obviously big!). This query runes noticeably slower with more data. Running the query on the full SQL Server results in it using the indexes as expected.

The solution to this, is to manually optimize the query yourself, instead of relying on the database engine to do it for you, by removing the LEFT JOINs and inverting the query, using INNER JOINs rather, to force the engine to filter rows out as early as possible (you’d usually write the query like this anyway, if you were doing it by hand, but remember, we’re using NHibernate to generate the query).

Here is the above query, optimized:

SELECT   product.name,
         COUNT([order].id) AS order_count
FROM     [order]
         INNER JOIN order_line
           ON order_line.order_id = [order].id
         INNER JOIN product
           ON product.id = order_line.product_id
WHERE    [order].id IN ('{list of ~100 integers}')
         AND product.quantity_in_stock < order_line.quantity
GROUP BY product.name
ORDER BY order_count DESC,
         product.name

and the corresponding query plan:

optimized query plan

If you examine the query plan, you’ll notice that it’s filtering the rows up front, rather than joining and then filtering.

So, in conclusion, it doesn’t seem that SQL Compact optimizes LEFT JOINs properly, and you have to optimize the query yourself to avoid table scans and poor performance.

One important fact to note is that we’re using NHibernate to perform the querying, and its generating sql with lots of LEFT JOINs in it (nothing wrong with that, most databases optimize this quite well). The number of joins are determined by what base object you choose to query on, and how many values from other objects you’re pulling back.

We’ve since replaced the generated queries that show this performance problem with hard-coded named queries, written with inner joins as described above.

Comments are closed.