Case study: database optimization

Sitecore item is stored in 4 tables:

  1. Items: has item ID, name, parentId and the templateID item is based on
  2. SharedFields: has itemId, fieldId, and value itself
  3. UnversionedFields: has language for the value, itemId, fieldId, value
  4. VersionedFields: has version number, language, itemId, fieldId, value

The item data is read by a query that unions all the tables and uses ItemID condition:

A caching layer ensures SQL to be executed only in case data was not found in cache. There are 3 main scenarios to load item data:

  1. By item id: database.GetItem(ID) is called
  2. Children: GetChildren is called
  3. By template: during application start, initial items prefetch

Key points

  1. Individual fields are not selected by fieldId as all fields selected for item at once
  2. Items are commonly requested by ID (dominant workload)
  3. Query unions 4 tables via ItemID condition
  4. Query performs sort on database side
  5. None of the tables has primary key defined

How does the SQL Server execute query?

The default query execution plan highlights many steps to be taken to read one item:

Stock query execution plan has many nodes

Unfortunately, item-related tables do not have primary key defined so that every request does RID lookup. Since the volume of reads is far greater than the number of modifications in web and core databases, read workload optimization could be applied:

  1. Defining a primary key (non-unique) for fields table by itemID so that fields belonging to same item are stored next to each other
  2. Offloading sort operation from database to client code
  3. Use view to avoid sending long query
  4. Simplifying ItemID condition – moving away from where ID in SELECT
  5. Reduce the volume of SQL requests

Measuring the impact

Schema-change decision must be driven by data/statistics analysis, hence we’ll measure the outcome via SQL Server Profiler for default VS optimized versions:

  • Duplicate the tables with suggested improvements
  • Ensure SQL Indexes are healthy
  • Restart SQL Server
  • Request N items from database

Clustered VS Non-Clustered

Over 3 times faster thanks to clustered indexes:

Avoid ORDER BY

SQL Server sort can be moved into the application logic to get ~50% speed up:

Not only MemortGrantInfo is 0, but also the Estimated Subtree Cost is ~47% less:

Creating SQL view

Although view does not give any boost, it hides the impl. detail on how item data is built:

CREATE VIEW [dbo].[ItemDataView]
AS
SELECT        ItemId, [Order], Version, Language, Name, Value, TemplateID, MasterID, ParentID, Created
FROM            (SELECT        ID AS ItemId, 0 AS [Order], 0 AS Version, '' AS Language, Name, '' AS Value, TemplateID, MasterID, ParentID, Created
                          FROM            dbo.Items
                          UNION ALL
                          SELECT        ParentID AS ItemId, 1 AS [Order], 0 AS Version, '' AS Language, NULL AS Name, '' AS Expr1, NULL AS Expr2, NULL AS Expr3, ID, NULL
                          FROM            dbo.Items AS Items_Parent
                          UNION ALL
                          SELECT        ItemId, 2 AS [Order], 0 AS Version, '' AS Language, NULL AS Name, Value, FieldId, NULL AS Expr1, NULL AS Expr2, NULL
                          FROM            dbo.SharedFields
                          UNION ALL
                          SELECT        ItemId, 2 AS [Order], 0 AS Version, Language, NULL AS Name, Value, FieldId, NULL AS Expr1, NULL AS Expr2, NULL
                          FROM            dbo.UnversionedFields
                          UNION ALL
                          SELECT        ItemId, 2 AS [Order], Version, Language, NULL AS Name, Value, FieldId, NULL AS Expr1, NULL AS Expr2, NULL
                          FROM            dbo.VersionedFields) AS derivedtbl_1

Simplifying the condition to select items

The stock query would return item fields only in case item definition exists:

Query can be optimized for a mainstream scenario (item data exists) and directly stream the content of the field tables. Application may filter out rows without definitions later on:

Theoretical: Stock vs Optimized

The optimized query is 7.3 times faster than the stock:

Reduce the volume of SQL Queries

The final query streams data from tables in a fastest possible way turning request overhead (like network latency) to be top wall clock time consumer. The volume of requests could be reduced by loading not only item by ID, but also its children:

  SELECT * FROM 
	[ItemDataView] d
  JOIN 
	[Items] cond
  ON [d].ItemId = [cond].ID
  WHERE (cond.ID = @ID OR cond.ParentID=@ID)

Practice: Testing variations

We will load all the items from Sitecore database:

    var item = db.GetItem(Sitecore.ItemIDs.RootID);
    System.GC.Collect(System.GC.MaxGeneration,System.GCCollectionMode.Forced, true, true);    
    Sitecore.Caching.CacheManager.ClearAllCaches();
    var ticksBefore = Sitecore.Diagnostics.HighResTimer.GetTick();    
    var items = item.Axes.GetDescendants();        
    var msTaken = Sitecore.Diagnostics.HighResTimer.GetMillisecondsSince(ticksBefore);

Results would be measured by SQL Server Profiler and aggregated to get AVG values:

View top metrics

Test combinations

  • Stock query as a base line
  • Clustered index only
  • NS: Clustered index without sort
  • +KIDS: Clustered index without sort + loading children
  • InMemory tables for all item-related tables
  • Symbiosis: InMemory for items + clustered for fields table

Results: Over 30% speedup

Results highlight clustered indexes without sort (NS) is only 10% faster

Loading children with item itself is the winner:

  • 30% faster on a local machine; even a greater win in distributed environment
  • 18% reduce number of SQL queries
  • 25% less CPU spent
  • 35% less reads

The item fetch was improved thanks to understanding how the system operates with data, thus SQL Server can handle a bigger load with no additional cost.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: