Cloud Data Viz and Analytics Health Check
Uncover the fitness of your Cloud Data Viz & Analytics
Get my free scoreHow Astrato prevents users from SQL traps
Astrato’s unique approach to SQL queries prevents common traps like fan and chasm traps, ensuring accurate data results without the need for complex ETL processes. By automatically implementing the Unified Star Schema (USS), Astrato separates different data grains into sub-queries, combining them accurately. This not only enhances data integrity but also simplifies SQL validation for users, offering a transparent and reliable BI tool that supports direct database queries. This leads to more precise business insights and better decision-making.
The core problem? Tables usually have different granularities (or grains), which leads to the risk of incorrect results.
In Astrato, we display data relationships in a straightforward left-to-right manner. The most detailed table (fact) is on the left, connecting to less detailed tables (dimensions) on the right. When multiple fact tables are involved, we use a Directed Acyclic Graph (DAG) structure to ensure everything connects correctly without loops, forming a clear, hierarchical model.
See references:
https://en.wikipedia.org/wiki/Directed_acyclic_graph
https://en.wikipedia.org/wiki/Polytree
The following data set contains 4 fact tables (Sales, Shipments, Returns, and Purchases) and 3 dimension tables (Clients, Products, and Suppliers). In terms of Graph Theory, this is a DAG Polytree. Please notice that the tree has 3 roots: Shipments, Returns, and Purchases.
When a user drags and drops fields in Astrato, it automatically creates a query. Astrato’s Data Analytics Engine (DA Engine) then checks the query for dimensions, measures, and filters. During this process, the DA Engine detects any potential SQL traps to ensure the query is accurate and reliable.
A SQL fan trap happens when you join two tables that have a many-to-one relationship and use a numeric column from the “one” side, aggregating it with functions like SUM or AVG. This leads to incorrect results due to duplication issues. Other problematic aggregations include NRTD (Non Resilient To Duplication) aggregations: COUNT (non-distinct), MEDIAN, MODE, LISTAGG, STDDEV, and VARIANCE.
Let’s illustrate a fan trap example using two tables from the data view above, where the query contains two measures and one dimension:
SALES (one):
The total amount of SOLD products is 10.
SHIPMENTS (many):
The total amount of SHIPPED products is 6.
A fan trap occurs when you join two tables with a many-to-one relationship and aggregate a numeric column from the “one” side, like using SUM on the SOLD column from the Sales table. This can lead to incorrect totals due to data duplication – this is a fan trap.
Also, please note that the 4 items of SA04 are not shipped yet.
Now the user wants to know “how much was ordered and shipped, by product”. This is a very common business question.
Let’s do the query with a SELECT statement and join the two tables to sum up the both quantities by product:
The query result shows that 5 items of PR01 were sold, and only 1 was shipped. This is true. It also shows that the products PR02 and PR03 were partially shipped. But this is NOT TRUE!
We saw earlier that the total of SOLD is 10. But now it appears to be 15. How is it possible?
I think you guessed it: the fan trap caught us!
The reason is very easy to understand by watching the image below: the sales SA02 and SA03 are duplicated.
This result is clearly incorrect.
Now, let’s ask Astrato. This is the result:
The result in Astrato is correct. How is this possible?
Because Astrato followed and implemented the principles of the Unified Star Schema (USS) invented by Francesco Puppini. (https://technicspub.com/uss/)
The USS methodology requires modeling and loading a table called the “Puppini Bridge”. Astrato went a step further and does this automatically for you, out of the box! So there is no need for any ETL in your data source!
Astrato separates the different grains into sub-queries, which then are combined by the set-operation UNION ALL before the aggregation:
Astrato also adds comments automatically to the SQL statement nicely (here in gray color) so it makes it fully transparent and readable to the user. Plus, the benefit is that the user can copy the SQL statement and can run it for validation directly in the database or in any other analytical tool. This is not possible in most of the BI tools available on the market today, because they perform part of the calculation in their internal in-memory engine. Astrato, instead, is 100% SQL!
If a query contains three or more tables joined together, this may lead to a chasm trap. Chasm traps are solved exactly the same way in Astrato: by separating the grains into sub-queries.