Is docker file system slow?

Let’s measure how fast file operations are performed in docker:

The diskspd.exe Microsoft tool will generate load:

  • Operate with 2GB file size
  • A few seconds for warming-up and slowing-down
  • Block size of 128K
  • 4 threads
  • Random reads
  • Run test for 2 minutes

diskspd.exe -b128K -d120 -o16 -t4 -r -c2G -w60 -W2 -C2 filepath

No other resource-consuming activities were performed in the system during test.

Test specs

Docker Desktop v.2.2:

Tested docker version

The host Windows version is 1909:

Even though same hardware used inside/outside container, I’ll mention year-old SSD & 8 core CPU hardware used in benchmark.

Results

Surprisingly, local file system is 10 times faster than one sitting inside container:

Results, summarized

Disclaimer: At the moment of the article initial publish, I do not have a solid technical explanation why that happens. Detailed results could be found below.

Baseline: local file system

Outside container, local results

Inside container: exposed volume

Exposing volume outside

Inside container: do not expose volume

Do not expose file system outside container

Performance crime: hidden culture-specific parsing

Paying respect to user culture while consuming user input is totally correct and legal. While hiding the fact of value being parsed with specific user culture may cost a dime.

How expensive that could be?

Make a few predictions to begin with:

  • What is the most time-consuming Sitecore hook?
  • How much time to get initialized might it take?
  • What is the most time-consuming operation in that hook?

Did you predict ~ 3/4 second spent just to transform 00.00:05:00 to TimeSpan?

Flame graph scoped to HookManager callees

Sitecore.Configuration.Settings.GetTimeSpanSetting(string name, string defaultValue)would secretly parse defaultValue using context user culture that is stored in user profile and quite expensive to get on first run:

3/4 second is spent on fetching UserProfile culture

The neighboring API uses CultureInfo.InvariantCulture for reading config-based value which perfectly makes sense for the task:

Only a few consumers rely on the performance criminal ParseTimeSpan API in 9.3:

Summary

Should you need to introduce parse method that pays respect to user culture, please make dependency explicit. Otherwise your intent might be misused with a price tag.

In our example 3/4 seconds of startup time could be saved by:

  • Having default value set as TimeSpan.FromMinutes(5)
  • Document DateUtil.ParseTimeSpan to use context user culture

Reviewing docker performance profile

We have profiled application start in the previous post, and will carry on wall clock time investigation; let’s start by looking at Thread Time report:

Selecting the Thread Time report

First thing to notice – a set of processes running next to w3wp:

  • CSC is a compilation effort, every entry takes ~4.6 seconds at least (delays application start as well)
  • Robocopy util that deals with files sync
Processes inside container

Selecting a flame graph for w3wp seems quite sparse so far with little of use:

Initial view of the w3wp

That happens as all threads are displayed now without filtering. Let’s include only application_start -related entries by expanding IncPats filter:

Showing only threads that have Application_Start in call stack

Let’s hide/fold tiny bits that are <4% of time to get a high-level understanding:

See nodes with at least 4% of overall time taken

Folding method wrappers to get a cleaner output:

  • dynamicclass – a runtime wrapper around Sitecore config processor
  • CorePipeline – a runtime wrapper around Sitecore pipelines
  • Reflection – .NET reflection to invoke methods/props
Cleaner output with lower flames

The top-right method belongs to Sitecore.CES, let’s focus on it by changing IncPats:

Three threads mention Sitecore.CES

There are 3 threads on the picture (the second frame from bottom divided into three):

  • 2796: Sleeps all the time, does not add any value; can be skipped
  • 2780: Application_Start thread is blocked until initialization is done
  • 1612: Performs initialization

Removing 2796 from view by adding it to ExcPats:

How CES blocks Sitecore startup

It seem that Sitecore attempts to load device database to file system whenever container is created, which takes ~23 seconds of startup time.

The solution is simple – persist device detection folder via docker.compose:

  - .\data\cm\DeviceDetection:C:\inetpub\wwwroot\App_Data\DeviceDetection

Secondly, the DeviceDetection initialization wait timeout could be reduced to a single second via setting change in a local/dev sandbox:

Setting to control how long to wait for device detection database to be loaded

Peeling off already resolved

After Sitecore.CES impact on startup was eliminated, further application_Start investigation (excluding Sitecore.CES from view via ExcPats) could be carried on:

Remaining culprits

Further steps

Over 87% time in await likely indicates some room for improvement:

Almost 90% of time is spent awaiting

Every remaining flame bar is to be inspected (in a similar manner as we did just now) to understand if there is a potential to do better.

Performance in Docker

Have you ever wondered how to make system run faster in container?

PerfView is the tool to measure performance, right. Being a UI-based application running on top of Windows Event Tracing sounds to be irrelevant for containers, or no?

How could it work?

Operating system has to virtualize the events and forward them to the ETW session in the appropriate container. Since containers and the underlying host share a single kernel, the container’s base image OS version must match that of the host. If the versions are different, the container may start, but full functionally isn’t guaranteed:

Local /Host Windows version
Windows version running inside container

Step one: Copy PerfView into container

Copy PerfView files via volume mechanism; I’ll re-use logs folder for sake of speed:

Already existing volume to output log files
The content of the PerfView folder on host machine

Step two: Connect to container with cmd

List running containers via docker container ls:

Launching command line inside container

And connect to the Sitecore instance to profile (CM in our case) via docker exec -i -t [container_id] cmd command.

Switch to exposed folder and verify PerfView is inside container:

Inspecting the content inside container

Step three: Trigger collection

Prepare the scenario you want to profile, and trigger profiling via command:

PerfView.exe /AcceptEULA /threadTime /logFile=log.txt /LowPriority /maxCollectSec=80 collect -Providers:AFF081FE-0247-4275-9C4E-021F3DC1DA35,DD5EF90A-6398-47A4-AD34-4DCECDEF795F,7B6BC78C-898B-4170-BBF8-1A469EA43FC5

ETW providers installed inside container: logman query providers

Listing existing providers

Step four: Is collection running?

Even though it might look like nothing has happened upon invoking the PerfView, it is collecting stuff that is confirmed by log.txt:

PerfView logging started at 9/2/2020 3:55:31 PM
[EXECUTING: PerfView /AcceptEULA /threadTime /logFile=log.txt /LowPriority /maxCollectSec=80 collect -Providers:AFF081FE-0247-4275-9C4E-021F3DC1DA35,DD5EF90A-6398-47A4-AD34-4DCECDEF795F,7B6BC78C-898B-4170-BBF8-1A469EA43FC5]
Circular buffer size = 0, setting to 500.
[Kernel Log: C:\inetpub\wwwroot\App_Data\logs\PerfView\PerfViewData.kernel.etl]
Kernel keywords enabled: ThreadTime
[User mode Log: C:\inetpub\wwwroot\App_Data\logs\PerfView\PerfViewData.etl]
Enabling Provider:PerfViewLogger Level:Verbose Keywords:0xffffffffffffffff Stacks:0 Values: Guid:a8a71ac1-040f-54a2-07ca-00a89b5ab761
Turning on more CLR GC, JScript and ASP.NET Events.

What it was doing is launching the GUI, which you don’t see, and detaching from the current console. The profile becomes visible in PerfView folder once collection is done:

The result of the collection

Result confirmation

A quick look into flame graph shows some potential slow points to be addressed:

Sample flame graph
  • The leftmost bar is DeviceDetection costing 10 % (or ~5.5 seconds) of time
  • Middle bar is Nexus licensing check taking ~30 % (or ~16 seconds) of time
  • The first bar is Analytics subsystem start taking ~57% (or ~30 seconds) of time

Definitely, the profile needs a deeper look in a separate article.

Summary

PerfView collection inside container runs in the same manner as in old good days, except lack of user interface. The only difference is usage of command line arguments.

All you need to do is copy PerfView inside container & launch it in correct point in time (slow operation).

Performance crime: Params

You might heard params are evil. Have you ever seen the performance bill for that?

How to find out where params are used?

Memory allocation report shows all allocations, we’ll start from here:

Method with params creates an array behind the scenes, so hunting for object[]:

IL code proves sad truth – array is always created:

Even though there is a faster Concat with 4 arguments, it is not picked due to type mismatch. The similar issue takes place for Item.GetUniqueId():

How faster could it be without params?

2.7 times faster by simply calling .ToString prior combining values. Impressive, huh?

        [Benchmark]
        public void Stock()
        {
            var data = _item.InnerData;
            for (int i = 0; i < N; i++)
            {
                var clone = new Item(_id, data, database);
                var uniqueId = clone.GetUniqueId();
            }
        }

        [Benchmark]
        public void NoBoxing()
        {
            var data = _item.InnerData;
            for (int i = 0; i < N; i++)
            {
                var clone = new Item(_id, data, database);
                var uniqueId = string.Concat(clone.Database.Name, clone.ID.ToString(), clone.Language.Name, clone.Version.ToString());                
            }
        }

2.4 times less memory used (omitting justCopyItem from both runs: 27.5 vs 11.47 MB)

Conclusion

Params could make your program slower out of nothing. Luckily, there are automated solutions (like Roslyn Clr Heap Allocation Analyzer) that help detecting these kind of issues in early development stages.

Although modern compilers produce smarter code that does not involve array creation in this case, new language features (string interpolation) brings new challenges.

Reckless interlocked operations

Let’s start with a guess – what is the most CPU-heavy operation in Sitecore?

Answer: Performance counters initialization:

Yes, that’s right, those performance counters which are powered by OS and super fast.

Although Sitecore counter init implementation does not look offensive (volatile read & interlocked), counters are everywhere (cache lookup / access / object creation):

Considering server concurrently runs N threads & each updates memory == huge impact.

Benchmark time: VTune time

Intel VTune shows performance counters init code is translated into lock cmpxchg:

if (initialized != 1 && Thread.VolatileRead(ref initialized) != 1 && Interlocked.CompareExchange(ref initializingCounter, 1, 0) != 1 && AllowConnect)

Sync between all the cores & writes on each go to the memory leads to almost five times lower processor efficiency [CPI] on the surface (theoretical x5 slow down):

In simple words: Intel I7 with 8 cores performs as 2014 Intel 2-core I3.

How big is the win without interlocked?

Changing the conditions order (counters are never initialized if disabled in config) improves processor efficiency 15 times (4,6 vs 0.46):

                if (AllowConnect == false)
                {
                    return;
                }

                if (initialized != 1 && Thread.VolatileRead(ref initialized) != 1 && Interlocked.CompareExchange(ref initializingCounter, 1, 0) != 1)
                {....

No memory writes == no memory-bound bottleneck:

Conclusions

The execution time dropped 22 times (5.8 -> 0.27 sec) just by preventing reckless interlocked operations on each go.

Why should an upper limit exist for every saved bit

Case study: polluted reports shows how system can be polluted with dummy data.

Saving data (even HTTP referer) without validation can contaminate system as well:

SELECT TOP (10) 
	[ContactId]
	,[LastModified]
        ,[FacetData]
	,JSON_QUERY(FacetData,'$.Referrers') as [Referrers]
	, DATALENGTH(JSON_QUERY(FacetData,'$.Referrers')) as [ReferrerSize]
  FROM 
	[xdb_collection].[ContactFacets]
  WHERE 
	[FacetKey]='InteractionsCache'
	AND CHARINDEX('"Referrers":["', FacetData) > 0
  ORDER BY [ReferrerSize] DESC

The results show astonishing 28KB for storing single value:

Next time you see Analytics shards worth 600 GB – recall this post.

Can configuration be trusted?

Configuration can go nuts: CMS 8.1 setup needs 200+ files to be edited by hand:

Neither connection strings, nor custom features are in included yet; things get more complicated in reality.

Lifebuoy: configuration roles

A few enthusiasts simplified configuration task solely to in 9+ series:

<settings role:require="Standalone" environment:require="Production">    
    <setting name="X" set:value="Y"/>
</settings>

Next era: containers and configuration

Docker eliminates the need in 14+ steps to run the Sitecore. All you have to do is to cook an image of your changes on top of vendor image (like layered cake) & play it anywhere.

Dilemma: the same image in different environments = same files; but still have different connection strings / features …. How?

Environment variables

Latest Sitecore is capable of fetching configuration values from environment variables:

Amazingly, there is no way out-of-the-box to see running values!

Showconfig shows only sitecore node, while role:define, search:define live inside web.config. Moreover, web.config will have an outdated value when environment variable was picked.

Baking solution to see running values

Let’s bake the page to show all roles/definitions, including custom ones:

  • Sitecore role (Standalone, CD, CM..)
  • Indexing engine used (SOLR, AzureSearch..)
  • Any custom layer (f.e. security:define or myproject.environment:define)
    [AllowDependencyInjection]
    public class ShowRoles : AdminPage
    {
        private readonly IConfigurationRulesContext _configRulesContext;

        public ShowRoles() { /* Dummy to make ASP.NET happy. */ }

        public ShowRoles(IConfigurationRulesContext configRulesContext) => _configRulesContext = configRulesContext;

        protected void Page_Load(object sender, EventArgs e)
        {
            CheckSecurity();
            Response.Write("<html><body>");
                Response.Write("<h3>Sitecore rule definitions</h3>");
                var names = _configRulesContext.GetRuleDefinitionNames();
                foreach (var name in names)
                {
                    var values = _configRulesContext.GetRuleDefinitionValue(name);
                    Write(name, values);
                }
        }    

        private void Write(string group, IReadOnlyCollection<string> elements)
        {
            Response.Write($"<h4>{group}</h4>");
            elements = (elements ?? Array.Empty<string>()).Where(element => !string.IsNullOrWhiteSpace(element))
                .ToArray();
            if (elements.Any())
            {
                foreach (var element in elements)
                {
                    Response.Write($"<p>{element}</p>");
                }
            }
            else
            {
                Response.Write("<p>No elements in the group.</p>");
            }
        }

Demo compose file looks as:

environment:
  SITECORE_APPSETTINGS_ROLE:DEFINE: Standalone
  SITECORE_APPSETTINGS_SEARCH:DEFINE: Solr
  SITECORE_APPSETTINGS_MYPROJECT.ENVIRONMENT:DEFINE: Development

Gotcha: Not all settings were applied

myproject.environment is not applied; environment-specific config is always ON:

Container variables are there, though:

Are environment variables visible to Sitecore process?

                Response.Write("<h3>Environment variables</h3>");
                var environmentVariables = Environment.GetEnvironmentVariables();
                foreach (string variable in environmentVariables.Keys)
                {
                    var values = new[] { environmentVariables[variable] as string };
                    Write(variable, values);
                }

Yes, the missing setting is exposed as environment variable:

How come value is not picked by Sitecore?

Short answer – that is life. Adding the dummy into web.config makes variable be picked:

Adding a key with dummy value
Using file explorer to upload a modified web.config
Page output shows custom:define now

The current implementation requires app:key to present in config to be replaced in runtime. And yes, there is no way to check if it is picked 😉

Summary

A lack of configuration assemble traceability leads to huge investigation effort when things do not work well. File-based configuration can no longer be trusted in containerized deployments; at the same time there is no source of truth to verify its correctness.

How often the site is visited from specific place?

Can “site visit frequency from specific place (or better, certain company office)” be just a query away? The needed analytics data is already collected by Sitecore, hence data mining could roughly be:

  1. Figure out area postal code (or reverse it by IP using any reverse IP lookup)
  2. Find all contacts that have the same details postal code (or by other field criteria)
  3. Locate visits done by the contacts
  4. Aggregate the number of pages in each visit to understand the length of their journey

The IP address in our demo belongs to Dnipro with 49000 postal code. It is recorded by Sitecore Analytics in following manner:

The GeoIP data is a part of InteractionsCache facet that belongs to contact; we could find all the contacts from postal code/city/(any condition from picture above) by query:

DECLARE @location NVARCHAR(20) = '49000';
DECLARE @LocationUtcShift INT = 2; 
DECLARE @MinInteractionsThreshold INT = 6;
DECLARE @ShardsCount INT = 2;

WITH [ContactsInTheArea] AS(
	SELECT 
		DISTINCT(cf.ContactId) AS [ContactId]
	FROM [xdb_collection].[ContactFacets] cf
		CROSS APPLY OPENJSON([FacetData], '$.PostalCodes')
			WITH ([City] NVARCHAR(100) '$')
		WHERE
	FacetKey='InteractionsCache'
	AND ISJSON(FacetData) = 1 
	AND [City] = @location)
SELECT COUNT(1) AS [Unique browser sessions] FROM [ContactsInTheArea]

The next step is to locate all the interactions recorded in system:

[InteractionsFromTheArea] AS(
SELECT 
	i.InteractionId,
	DATEADD (HOUR, @LocationUtcShift, i.StartDateTime) AS [StartDateTime],
	DATEADD (HOUR, @LocationUtcShift, i.EndDateTime) AS [EndDateTime],
	i.[Events],
	Pages = (
		SELECT COUNT(1) 
			FROM OPENJSON([Events])
				WITH ([Event] NVARCHAR(100) '$."@odata.type"') 
			WHERE [Event] = '#Sitecore.XConnect.Collection.Model.PageViewEvent')
	FROM [xdb_collection].Interactions i
		INNER JOIN [ContactsInTheArea] d 
		ON d.[ContactId] = i.ContactId)
SELECT * FROM [InteractionsFromTheArea]

We found all the recorded interactions performed from the location we originally set. The last step is to aggregate statistics per day:

SELECT 
	CAST (i.StartDateTime AS DATE) AS [Session Time],
	COUNT(1) AS [Test Sessions],	
	CAST(ROUND(AVG(CAST(Pages AS FLOAT)), 2) AS NUMERIC(36,2)) AS [Avg. pages viewed]
FROM [InteractionsFromTheArea] i
	GROUP BY CAST (i.StartDateTime AS DATE)
	HAVING COUNT(1) > (@MinInteractionsThreshold / @ShardsCount)
	ORDER BY [Session Time] DESC

The last query answers how often our site was visited in the area that belongs to the postal code/(company name owing the IP address):

Summary

A daily statistics of interactions (and their quality) originated from area is a query away, impressive? Since we operated on one shard out of N, the results are to be multiplied by N to get complete picture.

The report is built by burning CPU to parse raw JSON on each go (the more data = the more CPU spent). A lack of data normalization is a price to pay for flexibility (possibility to track/store custom info) that introduces a need of reducing/extracting/aggregating data (constantly adjust report data to reflect data change) and storing into query-friendly format.

Case study: polluted reports

Agenda

Analytics reports have suspicious statistics with lower conversion rates compared to other systems. Can we find out why?

It seem that healthy data is diluted with junk/empty interactions with no value. We assume robot/crawlers activity gets recorded. Is there any OOB protection in Sitecore?

Filter out robots by user agents

Sitecore blacklists robots via a list of user agents defined in the config:

App_Config\Sitecore\Marketing.Tracking\Sitecore.Analytics.ExcludeRobots.config

User agents that should be excluded rain or shine

Theoretically, zero interactions with these user agents should be recorded, right? Well, I do not blog about straightforward tasks. We could check the actual number of robots via huge SQL composed by replacing line break character with ',':

But still, 20% of all contacts (that have interactions) report to have robot user agents. Could it be due to the fact Sitecore uses case-sensitive match for robots, while SQL column has case-insensitive SQL_Latin1_General_CP1_CI_AS collation? While that can easily happen in your case, a cause is different in our case study:

Let’s leave this question unanswered (for now) and focus on what robot is?

How to identify robot by behavior?

Page crawler requests pages one by one without persisting any cookies – it will not have interactions with more than one page. We could try to find user agents that do not have more than one page recorded:

WITH PagesInInteractions AS(
	SELECT Pages = (
		SELECT COUNT(1) 
			FROM OPENJSON([Events])
				WITH ([Event] NVARCHAR(100) '$."@odata.type"') 
			WHERE [Event] = '#Sitecore.XConnect.Collection.Model.PageViewEvent'),
	Created,
	LastModified,
	UserAgent,
	InteractionId,
	ContactId
FROM [xdb_collection].[Interactions])
  
SELECT 
	COUNT(1) AS Hits,
	[UserAgent], 
	DATEDIFF(DAY, MIN(Created), MAX(LastModified)) AS [DaysBetween],  
	MIN(Created) AS [Earliest],
	MAX(LastModified) AS [Latest]
FROM PagesInInteractions 
GROUP BY [UserAgent] 
HAVING 
	MAX(Pages) <=1 
	AND COUNT(1) > 500
ORDER BY COUNT(1) DESC

This query finds unique user agents that have viewed single-page only and have over 500 interactions:

20% of total interactions recorded system-wide belong to user agents that do not have over 1 page in visit across 500 visits. These user agents are most likely to be added into the blacklist to stop them from being tracked.

Could contacts without interactions exist?

Although that should not happen in theory… you got it:

SELECT COUNT(DISTINCT(c.ContactId)) FROM [xdb_collection].Contacts c
LEFT JOIN  [xdb_collection].Interactions i
ON i.ContactID = c.ContactId
WHERE i.ContactId IS NULL

Our case study has 7.5 % of contacts without interactions that was caused by bug.

Summary

The query we developed to locate suspicious user agents allows us to identify robots with better accuracy in future. Unfortunately, the previously collected robot sessions would still remain in system and pollute analytics reports. Needless to say that you pay for storing useless data to your hosting vendor.

In next articles we’ll try to remove the useless data from system to recover the reports.