Introduction to Snowflake Cloud Data Warehouse

Snowflake, a cloud-native analytical data warehouse solution, offers many distinctive features that set it apart from other cloud data warehouse solutions provided by major providers. Its multi-cluster, shared-data architecture allows it to provide a unified and seamless data experience. Snowflake runs on the AWS, Azure, and Google Cloud platforms and combines the capabilities of conventional databases with several fresh and innovative options.


Snowflake is a SaaS offering that comes with a plethora of built-in features. Some of the major Snowflake features are:

  • Cloud-agnostic – supported by all three dominant cloud providers: AWS, Azure, and GCP.
  • Auto-scaling of compute resources.
  • Auto-resume/suspend of clusters.
  • Zero maintenance – updates, patches, and load balancing are handled automatically.
  • Secure and encrypted data – supports RBAC, MFA, and SSO, with HIPAA and SOC 2 compliance.

In this article, we will explore some of the unique Snowflake features that make it an excellent choice for cloud data warehousing.

Performance Management

Snowflake delivers great performance without manual performance optimization. This is an excellent offering, because traditional data warehouses require significant effort to fine-tune databases and objects. Snowflake has all but eliminated that work through the following features:

Workload Distribution

In traditional data warehouses, ETL/ELT processes are usually run during non-business hours, when BI reports or cubes are not being refreshed, to avoid concurrent workloads.

With Snowflake's multi-cluster design, concurrency is no longer an issue. One of the primary advantages of this design is the ability to separate workloads so that each runs on its own compute cluster, referred to as a virtual warehouse. Refer below for its workload distribution architecture:

[Figure: Snowflake workload distribution architecture]

Queries from one virtual warehouse do not affect those from another. Having separate virtual warehouses for users and apps allows ETL/ELT processing, BI, and reporting to execute without competing for resources.
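
For illustration, dedicated virtual warehouses can be created for ETL and BI so that neither competes with the other for compute. Below is a minimal sketch; the warehouse names and sizes are illustrative assumptions, not from the original article:

-- a larger warehouse dedicated to ETL/ELT processing
create warehouse if not exists etl_wh
  warehouse_size = 'LARGE'
  auto_suspend = 300   -- suspend after 5 minutes of inactivity
  auto_resume = true;

-- a separate warehouse for BI and reporting queries
create warehouse if not exists bi_wh
  warehouse_size = 'MEDIUM'
  auto_suspend = 60
  auto_resume = true;

-- each session picks its own warehouse, so ETL and BI do not compete
use warehouse etl_wh;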

Snowflake Micro-partitions

Optimizing tables with indexes, partitions, statistics, etc., requires substantial expertise and effort and is also time-consuming. Snowflake makes this work almost non-existent by automatically optimizing tables through a feature called micro-partitioning. Below is an illustration:

The table comprises 24 rows distributed evenly across four micro-partitions. Within each micro-partition, the data is sorted and stored by column, allowing Snowflake to perform the following actions for table queries:

  • Prune any micro-partitions that are not required for the query.
  • Within the remaining micro-partitions, prune by column.

[Figure: Snowflake micro-partition layout]

Snowflake automatically divides tables by grouping rows into individual micro-partitions of 50–500 MB of data. Snowflake micro-partitions create more uniformly sized partitions that limit data skew and make it easy to prune large tables at an extremely granular level. In other words, queries can use micro-partition metadata to determine which partitions are relevant for a query so that only those are scanned. Snowflake goes one step further by limiting partition scanning to only the columns filtered in a query.
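
For example, a query with a selective filter lets Snowflake skip every micro-partition whose metadata shows it cannot contain matching rows. A small sketch, assuming an illustrative orders table:

-- only micro-partitions whose min/max metadata for order_date covers
-- '2023-06-15' are scanned; all others are pruned
select order_id, amount
from orders
where order_date = '2023-06-15';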

Clustering in Snowflake

Query performance suffers when a table's data is not well sorted. A Snowflake clustering key is a subset of columns that can be used to co-locate and sort the data contained in a table. For each micro-partition created during data loading, Snowflake collects clustering metadata; this metadata is then used to avoid needless micro-partition scanning. If queries against very big tables take longer than expected, Snowflake clustering keys can be defined manually, as sketched below.
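
A minimal sketch of defining a clustering key (the table and column names are illustrative assumptions):

-- define a clustering key when creating the table
create table sales (
  sale_date date,
  region varchar,
  amount number
) cluster by (sale_date, region);

-- add or change the clustering key on an existing table
alter table sales cluster by (sale_date);

-- inspect how well the table is clustered on a given column
select system$clustering_information('sales', '(sale_date)');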

Operational Features

Snowflake offers a few features that greatly reduce the development effort needed for common, often-required operations. These Snowflake features come in handy, especially in cases of urgency. Below are some of the unique features in that regard:

Snowflake Zero Copy Clones

A handy and unique feature of Snowflake is zero-copy cloning. In traditional data warehousing systems, cloning data is very tedious: duplicating an existing database requires standing up an entirely new, distinct environment and loading the data into it, which incurs additional storage cost. That makes frequent cloning impractical for ad hoc analysis, spinning up separate environments, or testing modifications. With Snowflake's zero-copy cloning feature, you can duplicate any database or table almost instantly without making a new physical copy. Snowflake accomplishes this by reusing the existing data files in the backend and tracking changes made to the clone in its metadata store.

Sample code for cloning a table:

create table Orders_Clone clone Orders;

Snowflake SWAP Table Command

Swapping tables is a capability similar to cloning. It is very helpful for promoting development databases or tables into production. There are several ways to accomplish this, such as simply making a copy of the development table and renaming it to the production table. However, Snowflake offers a much simpler method: swapping the tables. In essence, all the swap does is switch the metadata of the two tables; no data is moved.

Sample code for swapping a table:

alter table if exists TAB1 swap with TAB2;

Snowflake UNDROP Command

Dropping the wrong table is a nightmare for many developers. A great Snowflake feature is the UNDROP command, which makes it easy to recover from such mistakes. Without it, you would have to spend considerable time restoring backups and data. With the UNDROP command, Snowflake can restore dropped objects instantly, as long as you are still within the Time Travel retention window.

Sample code for undropping a table:

undrop table TABLE_NAME;

Share and Collaborate

Snowflake offers a unique feature that allows data owners to share their data with partners or other consumers without creating new copies of it. Because no data is moved or duplicated, data consumers pay only for the compute used to query the shared data, and no additional storage is consumed. It also removes the dependency on email or FTP services for exchanging data.
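
A minimal sketch of setting up a secure share (the database, schema, table, share, and account names are illustrative assumptions):

-- create a share and grant it access to the objects being shared
create share sales_share;
grant usage on database sales_db to share sales_share;
grant usage on schema sales_db.public to share sales_share;
grant select on table sales_db.public.orders to share sales_share;

-- make the share visible to a consumer account
alter share sales_share add accounts = consumer_account;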

Snowflake Data Ingestion Features

Snowflake does more than simply support data ingestion: it provides powerful capabilities for quickly building pipelines that would otherwise be very complex. Below are some of these features:

Continuous Data Ingestion with Snowflake Snowpipe

Snowpipe, Snowflake's data pipeline service, is very useful for loading new data in micro-batches as soon as it becomes available in supported external storage. It removes the hurdle of loading data in larger scheduled batches; instead, changes are reflected within a few minutes. Snowpipe works through event-based triggers from external stages, which notify it that new files are available for import. These files are then placed in a queue, from which they are loaded into Snowflake. Snowpipe is also callable through its REST endpoints, making it easy to use in custom applications.
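
A minimal sketch of a Snowpipe definition with auto-ingest enabled (the pipe, table, and stage names are illustrative assumptions; auto-ingest also requires event notifications to be configured on the cloud storage side):

-- load new files from the external stage as they arrive
create pipe orders_pipe
  auto_ingest = true
as
  copy into raw_orders
  from @orders_stage
  file_format = (type = 'JSON');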

Handling Semi-structured/Hierarchical Data

Using a schema-on-read data type called VARIANT, Snowflake's data architecture enables storing structured and semi-structured data in the same location. Snowflake automatically parses the imported data, extracts the attributes, and saves the data in columnar format. Consequently, separate pipelines for data extraction are no longer required.

VARIANT supports data import from JSON, Avro, ORC, and Parquet formats; Snowflake can then build the hierarchy for you and store it in a VARIANT column. You can also construct the hierarchy manually.
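
A minimal sketch of storing and querying JSON in a VARIANT column (the table and field names are illustrative assumptions):

-- a single VARIANT column holds the semi-structured document
create table events (payload variant);

insert into events
  select parse_json('{"user": {"id": 42, "name": "Ada"}, "action": "login"}');

-- path notation traverses the hierarchy; cast extracted values as needed
select payload:user.name::string as user_name,
       payload:action::string as action
from events;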

Snowflake CDC Streams

Snowflake offers a stream object that records DML changes (inserts, updates, and deletes) made to a table, making the changed rows available to query and consume downstream. This capability is also called Snowflake change data capture (CDC).
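
A minimal sketch of change data capture with a stream (the table and column names are illustrative assumptions):

-- create a stream that records changes to the base table
create stream orders_stream on table orders;

-- DML on the base table is captured by the stream
insert into orders (order_id, amount) values (1, 99.50);

-- the stream exposes changed rows plus change-tracking metadata columns
select order_id, amount, metadata$action, metadata$isupdate
from orders_stream;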

Conclusion

Snowflake's cloud-native architecture, automatic scalability, and comprehensive security measures make it a top solution for data warehousing. Its support for multiple data integration tools and languages, including SQL, Python, and R, enables easy data analysis. With Snowflake's powerful analytical capabilities, organizations can derive valuable insights from their data and make informed decisions. Overall, Snowflake's unique features make it a standout choice for businesses of all sizes. Explore Snowflake today to unleash the full potential of your data and stay ahead of the competition.

Explore unique features of Snowflake Cloud Data Warehouse with our Snowflake Service Partners for more insights and expertise. Keep Learning!

Arka Majumdar
Associate Technical Architect, Analytics
