The "Case study: polluted reports" post shows how a system can be polluted with dummy data.
Saving data (even the HTTP Referer header) without validation can contaminate the system as well:
SELECT TOP (10)
    [ContactId],
    [LastModified],
    [FacetData],
    JSON_QUERY(FacetData, '$.Referrers') AS [Referrers],
    DATALENGTH(JSON_QUERY(FacetData, '$.Referrers')) AS [ReferrerSize]
FROM [xdb_collection].[ContactFacets]
WHERE [FacetKey] = 'InteractionsCache'
    AND CHARINDEX('"Referrers":["', FacetData) > 0
ORDER BY [ReferrerSize] DESC
The results show an astonishing 28 KB for storing a single value.
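One way to keep such values out of storage is to validate the referrer before saving it. Below is a minimal sketch in Python, not the platform's own mechanism: the function name `sanitize_referrer` and the 2048-character cap are hypothetical and chosen only for illustration. It drops referrers that are empty, oversized, or not plain http/https URLs.

```python
from urllib.parse import urlsplit

# Hypothetical cap; pick a limit that fits your own storage budget.
MAX_REFERRER_LENGTH = 2048


def sanitize_referrer(raw, max_length=MAX_REFERRER_LENGTH):
    """Return a referrer that is reasonable to store, or None to drop it."""
    if not raw:
        return None

    value = raw.strip()
    # Reject empty and oversized values before doing any parsing.
    if not value or len(value) > max_length:
        return None

    try:
        parts = urlsplit(value)
    except ValueError:
        # urlsplit raises ValueError on malformed input such as a bad IPv6 host.
        return None

    # Keep only absolute http/https URLs that actually name a host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None

    return value
```

With a check like this in front of the write, a 28 KB referrer would be dropped instead of being persisted with the contact.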
The next time you see Analytics shards worth 600 GB, recall this post.