Parquet

A Parquet file is a data file that stores tabular data column by column, which makes large amounts of information efficient to store and easy to work with. It's kind of like a digital filing cabinet for your data.

Parquet files are well suited to large, complex datasets. They are compact, fast to read, and widely supported, which makes them a popular choice for big data and analytics applications.

Some of the key benefits of Parquet files include:

  • Smaller file size: Columnar storage and compression typically make Parquet files much smaller than row-oriented text formats such as CSV or JSON.
  • Faster performance: Data is organized by column, so a reader can fetch only the columns it needs instead of scanning every record.
  • Cross-platform: Many different tools and frameworks can read and write Parquet, so you can share your data between them.
  • Metadata support: Each file stores its own schema and data structure, so readers know the column names and types without a separate description.
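
To make the idea of columnar storage concrete, here is a minimal sketch in plain Python. It is a conceptual illustration only and not Parquet's actual on-disk format; the records, field names, and helper functions are all hypothetical.

```python
# Conceptual illustration only: this is NOT how Parquet lays out bytes on disk.
# The same three (made-up) records, first stored row by row.
rows = [
    {"user": "ana", "country": "US", "amount": 12.5},
    {"user": "bo", "country": "US", "amount": 7.0},
    {"user": "cy", "country": "DE", "amount": 30.25},
]

# Column-oriented layout: one list per field. Similar values sit next to
# each other, which is what tends to help compression in columnar formats.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# A tiny stand-in for schema metadata: the type of each column.
schema = {name: type(values[0]).__name__ for name, values in columns.items()}


def total_from_rows(records):
    # Row layout: every record is visited, including fields we do not need.
    return sum(record["amount"] for record in records)


def total_from_columns(cols):
    # Column layout: only the "amount" column is touched.
    return sum(cols["amount"])


print(schema)
print(total_from_rows(rows), total_from_columns(columns))
```

Both functions return the same total, but the column version never looks at "user" or "country". Real Parquet readers apply the same principle at the file level.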

Parquet parameters

directory is required; compression is optional and falls back to its default.

  • directory, the folder location to write Parquet files to

  • compression, the method used to compress data
    default: GZIP (allowed: GZIP, SNAPPY)

How do I use the Parquet syncer in commands?

cs_tools tools searchable bi-server --syncer parquet://directory=.

- or -

cs_tools tools searchable bi-server --syncer parquet://definition.toml
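
If you launch the command from a script, a small helper can build the argument list for either form. This is only a sketch: the function name syncer_command is hypothetical, and it does nothing more than reassemble the two commands shown above.

```python
import shlex


def syncer_command(target: str) -> list[str]:
    """Build the argument list for the commands shown above.

    `target` is either inline parameters such as "directory=." or the
    path to a definition file such as "definition.toml".
    """
    return [
        "cs_tools", "tools", "searchable", "bi-server",
        "--syncer", f"parquet://{target}",
    ]


inline = syncer_command("directory=.")
from_file = syncer_command("definition.toml")

# Print the commands as they would be typed in a shell.
print(shlex.join(inline))
print(shlex.join(from_file))
```

With cs_tools installed, either list could be passed to subprocess.run(..., check=True) to run the command.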

Definition TOML Example

definition.toml

[configuration]
directory = '...'
compression = 'SNAPPY'