Querying 6.35 Billion Records - a TPC-DS Iceberg Performance and Cost Comparison between Starburst Galaxy and Snowflake

Data Analytics continues to be a critical area for most modern applications and there are many solutions available, therefore it’s important for application owners to be aware of their options. Given that I focus on Amazon Web Services, I’m interested in solutions that can be launched on compute infrastructure in AWS and that access data stored in the cloud.

For this article, I focused on Data Analytics platforms Starburst Galaxy and Snowflake with compute and data storage infrastructure deployed on AWS. Similar to other articles, I will focus on areas such as performance, infrastructure setup and cost, with the following angle: “What do I need to do in order to run this workload, how fast will it be and how much will I pay for it?”

Starburst Galaxy is a managed data analytics platform built on top of the Trino query engine (formerly known as PrestoSQL). It is available on multiple cloud providers, such as AWS, GCP or Azure. It can also query data stored in a wide range of options, including these cloud providers and other data sources.
Snowflake is a managed data analytics platform that also supports launching compute infrastructure in the cloud (e.g. AWS, GCP and Azure) and a number of data sources, including storage in these mentioned cloud providers.

For the tests I executed, I focused on compute infrastructure deployed on Amazon Web Services, in the N. Virginia region and data stored in AWS S3 in the same region.

Data and Infrastructure Setup

Below are the infrastructure configuration details in order to execute these tests on both Starburst Galaxy and Snowflake.

Data

The data used in this set of tests corresponds to a 1TB TPC-DS dataset, in Apache Iceberg table format.
Files used in the Iceberg tables were stored in an S3 bucket owned by Concurrency Labs, in the N. Virginia region. The dataset resulted in 12,696 files, for a total of 234 GB. File size is expected, given they are stored in compressed format.
Both Starburst Galaxy and Snowflake accessed data through a catalog managed by the AWS Glue service.
Iceberg tables were created using DDL statements following this pattern: CREATE TABLE glue.<table-name> (<column-definition) WITH (format=PARQUET, location=<s3-location>, type=ICEBERG;)
Data was inserted into S3 using as a source data stored in Parquet format available in the tpcds.sf1000 schema managed by Starburst Galaxy. This was done using INSERT SQL statements for each TPC-DS table, with the following pattern: INSERT INTO glue.<schema>.<table-name> SELECT <columns> FROM tpcds.sf1000.<table-name>;
Row count for each table was confirmed to comply with the TPC-DS standard (approximately 6.35 billion records in total, stored in 24 tables).
The 99 TPC-DS queries used in these tests correspond to the TPC-DS specification.

Compute Infrastructure

Compute infrastructure was launched using the Starburst Galaxy and Snowflake GUI. Both platforms were configured to launch compute resources on Amazon Web Services in the N. Virginia AWS region (us-east-1) and were granted cross-account permissions (IAM Roles) to access data stored in an S3 bucket owned by Concurrency Labs and an AWS Glue catalog in the same AWS account as the S3 bucket.

Starburst Galaxy

Size: Large (16 credits/hour).
Cloud provider: AWS - N. Virginia
Cluster type: Accelerated (Warp Speed)
Number of workers: 16
Query result caching: off.

Snowflake

Size: X-Large Warehouse (16 credits/hour).
Cloud provider: AWS - N. Virginia
Type: Standard
Number of clusters:
- 1 scenario with multi-cluster disabled (16 credits)
- 1 scenario with scaling enabled to provision between 1 and 3 clusters (16-48 credits). This warehouse had 3 clusters provisioned throughout the whole query execution, given the load induced into the system.
Query acceleration: off
Query result caching: off (ran statement ALTER SESSION SET USE_CACHED_RESULT=FALSE)

Both Starburst Galaxy and Snowflake deliver a well documented way to launch compute infrastructure. However, none of these platforms expose more details about the underlying compute capacity available in each deployment type (i.e. vCPUs, memory, etc.), which would allow users to have more context regarding other cloud-based solutions, such as AWS EC2 or AWS EMR.

Test Scenarios

All scenarios were executed with 5 concurrent users sequentially executing the 99 TPC-DS queries, iterating through the whole set of queries a number of 5 times per user (i.e. each user executes queries 1 through 99 and repeats the sequence 5 times). Below are the three platform scenarios tested:

Scenario	Description
TPC-DS 1 TB Iceberg - Starburst Galaxy Large	5 concurrent users executed 5 iterations of 99 TPC-DS queries sequentially, using Starburst Galaxy on a Large cluster of type Warp Speed.
TPC-DS 1 TB Iceberg - Snowflake X-Large - 1 cluster	Same as above, but using a Snowflake X-Large Standard Warehouse with multi-cluster disabled.
TPC-DS 1 TB Iceberg - Snowflake X-Large - 1-3 clusters	Same as above, but using a Snowflake X-Large Standard Warehouse with multi-cluster scaling enabled between 1 and 3 clusters.

Tests were executed using Apache JMeter, a load testing tool designed to trigger and measure performance of concurrent requests. Tests were triggered from an EC2 instance launched in the N. Virginia region and the results below were calculated using the elapsed time reported for each transaction by JMeter and recorded in the EC2 instance.

Test Results

TPC-DS 1 TB Iceberg

The table below shows the average, maximum and minimum execution time in seconds, for the 25 executions triggered for each query (5 concurrent users executing 5 sequential sets of TPC-DS queries).

	Starburst Galaxy - Large - WarpSpeed				Snowflake - X-Large Standard Warehouse - 1 cluster				Snowflake - X-Large Standard Warehouse - 3 clusters
	avg	max	min		avg	max	min		avg	max	min
q01	7	18	3	q01	12	47	7	q01	5	11	3
q02	3	7	2	q02	16	40	5	q02	4	8	3
q03	2	5	1	q03	11	64	4	q03	3	6	1
q04	29	47	12	q04	68	143	39	q04	42	51	32
q05	12	18	6	q05	25	63	5	q05	5	11	3
q06	4	7	3	q06	16	56	6	q06	4	6	3
q07	6	10	3	q07	13	69	3	q07	4	6	2
q08	5	7	4	q08	14	62	3	q08	3	4	2
q09	7	13	4	q09	38	60	22	q09	21	28	15
q10	5	9	4	q10	22	62	5	q10	4	8	3
q11	17	23	8	q11	42	75	21	q11	25	31	19
q12	3	5	2	q12	65	1257	3	q12	2	4	1
q13	10	21	6	q13	66	1258	5	q13	4	6	3
q14	35	50	14	q14	69	808	26	q14	22	29	17
q15	4	8	3	q15	47	795	4	q15	5	9	4
q16	9	15	5	q16	12	43	3	q16	4	8	2
q17	7	13	4	q17	19	48	6	q17	5	8	4
q18	9	16	5	q18	15	44	5	q18	5	8	4
q19	4	9	3	q19	38	532	6	q19	5	8	3
q20	3	6	2	q20	248	1305	3	q20	2	4	1
q21	3	4	2	q21	110	838	3	q21	2	4	1
q22	10	14	6	q22	122	842	3	q22	4	6	3
q23	46	69	20	q23	220	2660	27	q23	29	40	21
q24	21	29	10	q24	2651	4362	2132	q24	2223	4344	2123
q25	6	16	4	q25	195	1122	7	q25	4	7	3
q26	4	7	3	q26	143	1297	5	q26	3	5	2
q27	4	8	3	q27	129	782	3	q27	3	6	2
q28	8	13	3	q28	88	872	13	q28	14	18	12
q29	8	25	5	q29	145	1299	7	q29	5	7	4
q30	5	8	4	q30	373	2523	3	q30	5	10	3
q31	6	9	4	q31	100	818	5	q31	6	12	4
q32	2	3	2	q32	9	50	2	q32	1	3	1
q33	8	17	4	q33	78	1293	3	q33	4	10	2
q34	3	4	3	q34	11	34	1	q34	2	5	1
q35	6	12	4	q35	48	841	7	q35	8	12	5
q36	4	7	3	q36	64	1291	4	q36	2	4	1
q37	2	4	2	q37	9	30	3	q37	2	4	1
q38	8	17	4	q38	58	850	16	q38	14	21	10
q39	7	8	5	q39	8	19	3	q39	3	5	2
q40	3	4	2	q40	10	28	3	q40	3	5	2
q41	1	1	1	q41	10	60	1	q41	1	3	0
q42	2	4	1	q42	6	38	1	q42	1	4	1
q43	3	5	2	q43	10	39	2	q43	2	2	1
q44	3	5	2	q44	63	1224	7	q44	7	11	5
q45	5	8	4	q45	13	75	5	q45	4	7	3
q46	5	11	3	q46	44	760	5	q46	5	7	4
q47	32	48	15	q47	16	31	9	q47	8	14	7
q48	9	34	5	q48	11	20	4	q48	3	6	3
q49	6	10	4	q49	11	22	5	q49	6	12	3
q50	17	31	9	q50	11	18	6	q50	5	7	4
q51	7	14	3	q51	12	26	5	q51	5	7	4
q52	2	3	1	q52	7	39	1	q52	1	3	1
q53	3	7	2	q53	7	31	2	q53	2	4	1
q54	5	8	4	q54	9	27	3	q54	3	6	2
q55	2	2	1	q55	8	21	2	q55	1	3	1
q56	6	12	3	q56	9	24	3	q56	3	8	2
q57	19	30	9	q57	12	32	5	q57	5	9	3
q58	9	26	4	q58	18	118	3	q58	3	6	2
q59	6	14	2	q59	20	216	6	q59	6	9	4
q60	7	13	4	q60	21	163	5	q60	5	7	3
q61	6	12	4	q61	18	123	4	q61	5	7	4
q62	5	13	3	q62	37	491	2	q62	2	4	1
q63	3	7	2	q63	26	149	2	q63	2	4	1
q64	17	23	13	q64	102	569	22	q64	24	29	20
q65	11	16	5	q65	22	85	10	q65	8	11	6
q66	5	8	3	q66	27	169	2	q66	4	8	2
q67	69	92	36	q67	183	324	149	q67	158	176	141
q68	5	10	3	q68	26	154	6	q68	4	6	3
q69	5	7	4	q69	36	153	6	q69	4	8	3
q70	7	14	5	q70	23	55	4	q70	4	7	3
q71	4	8	3	q71	20	145	3	q71	2	5	1
q72	28	46	15	q72	49	184	19	q72	19	24	14
q73	3	5	3	q73	42	306	2	q73	2	3	1
q74	9	18	5	q74	45	169	20	q74	20	27	15
q75	16	29	9	q75	32	62	16	q75	16	23	11
q76	9	21	4	q76	37	297	6	q76	7	9	4
q77	6	9	4	q77	13	53	3	q77	4	6	2
q78	29	43	12	q78	121	151	93	q78	73	106	53
q79	5	11	4	q79	62	1241	4	q79	5	10	4
q80	7	10	5	q80	75	1306	10	q80	8	12	6
q81	6	14	4	q81	55	793	4	q81	4	6	3
q82	3	6	2	q82	52	838	2	q82	2	4	2
q83	6	19	4	q83	12	29	2	q83	3	7	1
q84	5	15	3	q84	20	109	3	q84	3	6	2
q85	10	20	7	q85	19	101	5	q85	6	11	4
q86	5	8	3	q86	9	22	1	q86	2	4	1
q87	7	15	3	q87	33	52	18	q87	22	30	17
q88	8	13	5	q88	78	1286	14	q88	16	21	13
q89	3	6	2	q89	10	28	2	q89	2	5	1
q90	3	6	2	q90	41	815	1	q90	3	8	1
q91	5	16	4	q91	63	1293	1	q91	2	4	1
q92	2	9	2	q92	8	19	2	q92	3	5	2
q93	22	39	11	q93	44	835	5	q93	6	9	5
q94	6	15	4	q94	16	75	2	q94	3	7	2
q95	13	31	6	q95	12	30	3	q95	4	7	2
q96	3	8	2	q96	7	11	2	q96	2	5	2
q97	10	16	6	q97	18	48	7	q97	9	14	6
q98	3	10	2	q98	17	77	2	q98	2	3	1
q99	7	14	4	q99	10	53	2	q99	3	6	2
Avg (sec)	9	16	5	Avg (sec)	73	448	30	Avg (sec)	31	56	28
Total (sec)	861	1,549	482	Total (sec)	7,261	44,358	2,967	Total (sec)	3,043	5,555	2,741

	Average execution time (sec)			Comparisons
	Starburst Galaxy	Snowflake - 1 cluster	Snowflake - 3 clusters	Galaxy vs Snowflake 1 cluster	Galaxy vs Snowflake 3 clusters	Snowflake 3 clusters vs. 1 cluster
q01	7	12	5	0.55	1.33	0.42
q02	3	16	4	0.22	0.81	0.27
q03	2	11	3	0.22	0.92	0.24
q04	29	68	42	0.43	0.69	0.61
q05	12	25	5	0.47	2.56	0.18
q06	4	16	4	0.24	0.95	0.25
q07	6	13	4	0.45	1.66	0.27
q08	5	14	3	0.36	1.93	0.19
q09	7	38	21	0.18	0.32	0.55
q10	5	22	4	0.25	1.32	0.19
q11	17	42	25	0.4	0.66	0.6
q12	3	65	2	0.04	1.45	0.03
q13	10	66	4	0.15	2.26	0.07
q14	35	69	22	0.5	1.6	0.31
q15	4	47	5	0.08	0.8	0.11
q16	9	12	4	0.74	2.49	0.3
q17	7	19	5	0.36	1.26	0.29
q18	9	15	5	0.59	1.83	0.32
q19	4	38	5	0.11	0.98	0.12
q20	3	248	2	0.01	1.42	0.01
q21	3	110	2	0.02	1.8	0.01
q22	10	122	4	0.08	2.4	0.03
q23	46	220	29	0.21	1.62	0.13
q24	N/A	N/A	N/A	N/A	N/A	N/A
q25	6	195	4	0.03	1.3	0.02
q26	4	143	3	0.03	1.52	0.02
q27	4	129	3	0.03	1.54	0.02
q28	8	88	14	0.09	0.58	0.16
q29	8	145	5	0.06	1.75	0.03
q30	5	373	5	0.01	1.17	0.01
q31	6	100	6	0.06	1.05	0.06
q32	2	9	1	0.24	1.62	0.15
q33	8	78	4	0.1	1.83	0.05
q34	3	11	2	0.28	1.48	0.19
q35	6	48	8	0.12	0.77	0.16
q36	4	64	2	0.06	1.9	0.03
q37	2	9	2	0.24	1.21	0.2
q38	8	58	14	0.13	0.56	0.24
q39	7	8	3	0.86	2.1	0.41
q40	3	10	3	0.31	1.11	0.28
q41	1	10	1	0.09	1.29	0.07
q42	2	6	1	0.27	1.7	0.16
q43	3	10	2	0.28	1.87	0.15
q44	3	63	7	0.05	0.41	0.11
q45	5	13	4	0.39	1.29	0.3
q46	5	44	5	0.11	1.06	0.11
q47	32	16	8	1.99	3.88	0.51
q48	9	11	3	0.81	2.56	0.31
q49	6	11	6	0.55	1.05	0.52
q50	17	11	5	1.63	3.87	0.42
q51	7	12	5	0.55	1.35	0.41
q52	2	7	1	0.26	1.89	0.14
q53	3	7	2	0.37	1.63	0.23
q54	5	9	3	0.59	1.58	0.38
q55	2	8	1	0.2	1.6	0.13
q56	6	9	3	0.65	1.85	0.35
q57	19	12	5	1.56	4.04	0.39
q58	9	18	3	0.5	2.62	0.19
q59	6	20	6	0.3	1.07	0.28
q60	7	21	5	0.32	1.47	0.22
q61	6	18	5	0.33	1.24	0.26
q62	5	37	2	0.14	2.43	0.06
q63	3	26	2	0.11	1.75	0.06
q64	17	102	24	0.17	0.73	0.23
q65	11	22	8	0.49	1.3	0.38
q66	5	27	4	0.19	1.19	0.16
q67	69	183	158	0.38	0.44	0.86
q68	5	26	4	0.17	1.07	0.16
q69	5	36	4	0.14	1.3	0.11
q70	7	23	4	0.31	1.89	0.16
q71	4	20	2	0.18	1.59	0.11
q72	28	49	19	0.57	1.51	0.38
q73	3	42	2	0.08	2	0.04
q74	9	45	20	0.19	0.43	0.45
q75	16	32	16	0.5	1.03	0.49
q76	9	37	7	0.23	1.3	0.18
q77	6	13	4	0.46	1.61	0.28
q78	29	121	73	0.24	0.4	0.61
q79	5	62	5	0.08	1	0.08
q80	7	75	8	0.1	0.94	0.1
q81	6	55	4	0.1	1.28	0.08
q82	3	52	2	0.06	1.32	0.04
q83	6	12	3	0.46	2.07	0.22
q84	5	20	3	0.22	1.5	0.15
q85	10	19	6	0.51	1.67	0.3
q86	5	9	2	0.52	2.76	0.19
q87	7	33	22	0.2	0.3	0.65
q88	8	78	16	0.11	0.51	0.21
q89	3	10	2	0.31	1.63	0.19
q90	3	41	3	0.07	1.07	0.07
q91	5	63	2	0.07	2.09	0.04
q92	2	8	3	0.29	0.81	0.35
q93	22	44	6	0.51	3.66	0.14
q94	6	16	3	0.37	2.19	0.17
q95	13	12	4	1.12	3.5	0.32
q96	3	7	2	0.42	1.22	0.34
q97	10	18	9	0.57	1.11	0.51
q98	3	17	2	0.16	1.56	0.11
q99	7	10	3	0.69	2.4	0.29
Avg	9	47	8	0.18	1.02	0.18
Median	6	22	4	0.24	1.46	0.19
Max	69	373	158	1.99	4.04	0.86
Min	1	6	1	0.01	0.3	0.01
Total (avg)	840	4,610	820

q24 took significantly longer to execute in Snowflake scenarios compared to Galaxy (>100x on average). Therefore this query was removed from the comparison tables, in order to avoid data distortion and to make the total comparisons more accurate. Even though this would require more troubleshooting, data from similar tests using Snowflake interacting with other catalogs show better performance for this query, therefore this could be related to the Snowflake integration with AWS Glue as a data catalog.

The table below shows the average query execution time relative to each scenario, as a percentage (excluding q24):

		relative to:
	Starburst Galaxy Large - WarpSpeed	Snowflake X-Large Standard Warehouse 1 cluster	Snowflake X-Large Standard Warehouse 3 clusters
Starburst Galaxy Large - WarpSpeed	N/A	18%	102%
Snowflake X-Large Standard Warehouse 1 cluster	555%	N/A	562%
Snowflake X-Large Standard Warehouse 3 clusters	98%	18%	N/A

Observations:

q24 took > 100x to complete in Snowflake scenarios compared to Starburst Galaxy.
All queries executed successfully for both Starburst Galaxy and Snowflake.
Starburst Galaxy was on average 5.55x faster compared to Snowflake X-Large running 1 cluster (excluding q24 from this comparison).
On average, Starburst Galaxy performance was virtually equal compared to Snowflake X-Large running 3 clusters (excluding q24 from this comparison).

Cost Analysis

The following table compares the cost of running a Large Starburst Galaxy cluster (16 credits/hour) and Snowflake Warehouses (1 and 3 clusters) in the AWS N. Virginia region. The cost per credit below is based on the publicly available documentation for both platforms.

Starburst Galaxy Large

Component	Price Dimension	Usage	Hourly Cost	Monthly Cost
Data store	S3 Standard Storage	234 GB (Iceberg format)	$0.01	$5
Starburst Galaxy Large Cluster	Credits per hour ($2.80)	Large cluster: 16 credits per hour	$44.80	$32,256
Data Transfer	S3 to EC2 - Intra-Regional Data Transfer	N/A	$0.00	$0.00
		Total:	$44.81	$32,261

Snowflake X-Large Warehouse - 1 cluster

Component	Price Dimension	Usage	Hourly Cost	Monthly Cost
Data store	S3 Standard Storage	234 GB (Iceberg format)	$0.01	$5
Snowflake X-Large Warehouse (1 cluster)	Credits per hour ($3.00)	X-Large warehouse (1 cluster): 16 credits per hour	$48.00	$34,560
Data Transfer	S3 to EC2 - Intra-Regional Data Transfer	N/A	$0.00	$0.00
		Total:	$48.01	$34,565

Snowflake X-Large Warehouse - 3 clusters

Component	Price Dimension	Usage	Hourly Cost	Monthly Cost
Data store	S3 Standard Storage	234 GB (Iceberg format)	$0.01	$5
Snowflake X-Large Warehouse (3 clusters)	Credits per hour ($3.00)	X-Large warehouse (3 clusters): 48 credits per hour	$144.00	$103,680
Data Transfer	S3 to EC2 - Intra-Regional Data Transfer	N/A	$0.00	$0.00
		Total:	$144.01	$103,685

It is highly recommended to configure the Auto Suspend feature - for both Galaxy and Snowflake, within a range of 3-5 minutes in order to avoid the possibility of accidentally leaving compute resources running when not being utilized. For AWS deployments, it’s also highly recommended to deploy compute resources and storage (in this case, S3) in the same AWS region, in order to avoid Intra-Regional Data Transfer cost and reduce latency as much as possible.

Considering all required components (storage, compute, etc.), owning an always-on Starburst Galaxy Large cluster would cost approximately $32.3K per month, while an equivalent Snowflake X-Large warehouse (1 cluster, 16 credits) would cost $34.6K. The Snowflake warehouse that had equivalent performance to Starburst Galaxy (Large Warehouse with 3 clusters - 48 credits) would cost approximately $104K per month, or 3.2x compared to Starburst Galaxy.

Conclusions

Both platforms are relatively simple to launch using their respective GUIs. When using data stored in AWS, it is required to configure cross-account IAM Roles in order to grant these platforms access to the data that will be analyzed.
Neither of these two platforms exposes more details about the underlying compute capacity available in each deployment type (i.e. vCPUs, memory, etc.). This information would be useful, since it would allow application owners to have more context regarding other cloud-based solutions, such as AWS Redshift, EC2 or EMR. The main comparison that can be made against other cloud-based platforms is based on cost relative to performance.
All TPC-DS queries executed successfully for Starburst Galaxy and Snowflake.
- However, q24 took approximately >100x to complete in Snowflake scenarios compared to Starburst Galaxy, therefore it was removed from comparison calculations.
Excluding q24, Starburst Galaxy Large (16 workers) was on average 5.55x faster compared to Snowflake X-Large running 1 cluster (16 credits).
Excluding q24, Starburst Galaxy Large (16 workers) had on average equal performance compared to Snowflake X-Large running 3 clusters (48 credits).
Considering all required components (storage, compute, etc.), running an always-on Starburst Galaxy Large cluster would cost approximately $32.3K per month, while an equivalent Snowflake X-Large warehouse (1 cluster, 16 credits) would cost $34.6K.
The Snowflake warehouse that had equivalent performance to Starburst Galaxy (Snowflake Large Warehouse with 3 clusters - 48 credits) would cost $104K per month, or approximately 3.2x compared to Starburst Galaxy.

Do you need help evaluating the right Data Analytics platform or optimizing your Big Data workloads in the cloud?

I will help you find the right Data Analytics platform for your business needs and optimize your Big Data workloads in the cloud to make sure they deliver the right balance between performance and cost for your business. Click on the button below to schedule a free consultation or use the contact form.