Table Design and Data Distribution Best Practices

Table Design Best Practices

Different storage engines have different characteristics, so you can create a variety of table types to suit different scenario requirements. The following examples are provided for reference.


1. MARS3 Tables

MARS3 tables rely on the matrixts extension. Before creating a table using this storage engine, you must first create the extension within the target database.

Note!
The matrixts extension is at the database level. It only needs to be created once per database; do not attempt to recreate it.

=# CREATE EXTENSION matrixts;

When creating a table, use USING MARS3 to specify the storage engine and ORDER BY to define the sort key. A basic table creation example is as follows:

=# CREATE TABLE mars3(
    time timestamp with time zone,
    tag_id int,
    i4 int4,
    i8 int8
)
USING MARS3 ORDER BY (tag_id, time);

We further illustrate this with a typical IoT time-series scenario example:

=# CREATE TABLE vehicle_basic_data_mars3(
  daq_time timestamp,
  vin varchar(32) COLLATE "C",
  lng float,
  lat float,
  speed float,
  license_template varchar(16),
  flag integer
)
USING MARS3
WITH (compresstype=zstd, compresslevel=3, compress_threshold=1200,
        mars3options='rowstore_size=64')
DISTRIBUTED BY (vin)
ORDER BY (vin, daq_time)
PARTITION BY RANGE (daq_time)
( START ('2022-07-01 00:00:00') INCLUSIVE
  END ('2022-08-01 00:00:00') EXCLUSIVE
  EVERY (INTERVAL '1 day')
,DEFAULT PARTITION OTHERS);

After successfully creating a MARS3 table, you may choose to create a mars3_brin index on the sort key. mars3_brin is a sparse index with minimal impact on disk space and insertion performance; combined with the ordered nature of the sort key, it achieves optimal query efficiency for the sort-key columns.

You can also create mars3_brin indexes on non-sort key columns; however, without the benefit of data ordering, query performance may be slightly reduced. Add these indexes as needed.

=# CREATE INDEX idx_mars3 ON vehicle_basic_data_mars3 USING mars3_brin(vin, daq_time);
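The idea behind a sparse, block-range index like mars3_brin can be sketched in a few lines of Python. This is an illustrative model (function names are ours), not YMatrix's actual implementation: for each block of sorted rows, only the (min, max) key is stored, so the index stays tiny while still letting a scan skip blocks whose range cannot match.

```python
# Illustrative model of a sparse block-range index: keep only (min, max)
# per block of sorted rows, then prune blocks whose range cannot match.
def build_block_index(sorted_keys, block_size):
    index = []
    for start in range(0, len(sorted_keys), block_size):
        block = sorted_keys[start:start + block_size]
        index.append((block[0], block[-1], start))  # (min, max, block offset)
    return index

def blocks_to_scan(index, key):
    # A block is read only if its [min, max] range can contain the key.
    return [off for lo, hi, off in index if lo <= key <= hi]

keys = sorted(["A01"] * 4 + ["B07"] * 4 + ["C02"] * 4)
idx = build_block_index(keys, block_size=4)
print(blocks_to_scan(idx, "B07"))  # → [4]
```

This also shows why sparse indexes work best on the sort key: the pruning only helps when each block's key range is narrow, which is exactly what physical ordering guarantees.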

The following best practices are explained based on the parameters and statements in the example table above:

Note!
We encourage you to adopt our suggested design principles, but we advise against blindly copying them. Time-series scenarios vary significantly; therefore, analyzing each specific case remains essential.

Selecting the Sort Key

Table design must account for the selection of the sort key. The purpose of the sort key is to ensure that data within the same time dimension or with similar attributes is physically co-located, thereby reducing I/O seek operations and improving query efficiency. Consequently, the choice of sort key must align with the primary business query patterns.

  • If the requirement is a point lookup for a single device, the sort key should be the device identifier (in this time-series context, vin).
  • If the requirement involves detail queries or aggregations over a specific time range for a single device, or queries spanning multiple devices within a time range, the sort key should be (vin, daq_time).

We recommend selecting the unique identifier field of the data subject. If a timestamp is available, append it after the identifier field to form the sort key. In the example table, the sort key is (vin, daq_time).
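The co-location effect of this sort key choice can be demonstrated with a few hypothetical sample rows: sorting on (vin, daq_time) places all rows of one device contiguously, in time order, so a per-device range scan touches a minimal number of blocks.

```python
# Hypothetical (vin, daq_time) rows arriving in interleaved order.
rows = [
    ("B07", "2022-07-01 00:00:02"),
    ("A01", "2022-07-01 00:00:01"),
    ("B07", "2022-07-01 00:00:01"),
    ("A01", "2022-07-01 00:00:02"),
]
# Sorting on the sort key (vin, daq_time) groups each device's rows
# contiguously and orders them by time within the group.
rows.sort(key=lambda r: (r[0], r[1]))
print(rows)
# All "A01" rows now precede all "B07" rows, each group ordered by time.
```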

COLLATE "C"

This option should be applied only to the device identifier field; other fields do not require it. It improves sorting and comparison speed for text-type columns.

USING MARS3

This is the fixed syntax required to create a MARS3 table. Do not alter it.

WITH (...) Parameters

Example: WITH (compresstype=zstd, compresslevel=3, compress_threshold=1200, mars3options='rowstore_size=64, nbuckets = 2')

  • compresstype=zstd, compresslevel=3: For this table, we recommend zstd compression with a compression level of 3. This example uses a general-purpose compression algorithm. For custom compression schemes, refer to Using Compression.
  • rowstore_size=64: This sets the L0 Run size to 64 MB.
  • compress_threshold=1200: This sets the compression threshold to 1200 rows.
  • nbuckets = 2: This sets the number of buckets to 2. Valid values for nbuckets range from 1 to 128. The default value is 1, which indicates no bucketing is performed. For detailed usage, refer to MARS3 Bucket Best Practices.

Note!
nbuckets applies to all distribution strategies. Even with random distribution, bucketing can be performed within segments.

Note!
While nbuckets can be modified after creation, changing this parameter triggers a full table rewrite. To prevent accidental data loss, the system by default only warns the user that a rewrite will occur; a specific GUC parameter must be enabled before the data rewrite is actually performed.

DISTRIBUTED BY (vin)

Use this clause to select the distribution key. The distribution key determines how data is distributed across data nodes (Segments), significantly impacting performance. Primary selection criteria include:

  1. Minimize Data Skew: Ensure data is evenly distributed to maximize parallel query execution.
  2. Optimize Joins: When designing joins between multiple tables, align the join keys with the distribution keys during the initial design phase to avoid unnecessary data movement.

In the example, we use the device ID (vin) as the distribution key. This allows queries for data belonging to the same device to be executed locally, avoiding the performance overhead associated with redistributing data between nodes.

PARTITION BY RANGE (daq_time)

This clause specifies the partition key for a partitioned table.

We recommend using time as the partition key for two reasons:

  1. It enables automatic data expiration when used with Automatic Partition Management.
  2. It leverages partition pruning to filter out irrelevant time partitions.

Each partition should ideally hold between one million and ten million rows. Partitions that are too small consume excessive memory for metadata management (because there are many of them), while partitions that are too large reduce the effectiveness of partition pruning and complicate data expiration.

In the example, we use the data acquisition time (daq_time) as the partition key. Most queries involve filtering by this time. For instance, to query data for the last day, you would add the condition WHERE daq_time >= CURRENT_DATE - INTERVAL '1 day'. The database then quickly identifies the relevant sub-tables, locates the data, and executes the query efficiently.

START ... END ... EVERY ... DEFAULT PARTITION OTHERS

  • START ('2022-07-01 00:00:00') INCLUSIVE END ('2022-08-01 00:00:00') EXCLUSIVE: This defines the overall partition range, from July 1, 2022, at midnight (inclusive) up to August 1, 2022, at midnight (exclusive), which the EVERY clause slices into sub-partitions.
  • EVERY (INTERVAL '1 day'): Sub-partitions have a time span of one day. You may also use hour, month, or year intervals depending on the data volume scale.
    • If your server receives 1–10 million records per day, a 1 day interval is optimal.
    • If daily volume is in the hundreds of thousands, a 1 month interval is appropriate.
    • If daily volume is under 10,000, a 1 year interval suffices.
  • DEFAULT PARTITION OTHERS: Defines the default partition. Any data falling outside the explicitly defined ranges will be stored here.
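The sizing guidance above can be condensed into a small helper. The thresholds come directly from the text; the function name is ours, and volumes that fall between the stated tiers are a judgment call (here we simply default to the coarser interval).

```python
def suggest_partition_interval(rows_per_day: int) -> str:
    """Map daily ingest volume to a partition interval, following the
    rule of thumb that each partition should hold roughly 1-10M rows."""
    if rows_per_day >= 1_000_000:   # 1-10M+ records/day -> daily partitions
        return "1 day"
    if rows_per_day >= 100_000:     # hundreds of thousands/day -> monthly
        return "1 month"
    return "1 year"                 # around 10k/day or fewer -> yearly

print(suggest_partition_interval(5_000_000))  # → 1 day
print(suggest_partition_interval(300_000))    # → 1 month
print(suggest_partition_interval(8_000))      # → 1 year
```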

Achieving Optimal Query Performance with VACUUM / VACUUM FULL

  • Functionality: Executing VACUUM converts unordered row-store Runs into ordered column-store Runs. VACUUM FULL performs the actions of VACUUM and additionally merges multiple Runs into a single Run, achieving superior query performance. However, VACUUM FULL requires an exclusive lock on the target table, preventing concurrent access.
  • Timing:
    • During continuous data ingestion into MARS3 tables, cleanup operations are generally not required immediately.
    • Once data becomes stable, execute cleanup operations before running queries.
    • After significant data changes across the database or a specific table, run cleanup operations immediately following the update.
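What VACUUM FULL achieves on a MARS3 table can be modeled as a k-way merge: several independently sorted Runs become a single ordered Run, so a subsequent sort-key scan reads one contiguous ordered sequence. This is an illustrative sketch with made-up (vin, sequence) tuples, not the engine's actual merge code.

```python
import heapq

# Three independently sorted Runs, as they might exist after ingestion.
run1 = [("A01", 1), ("A01", 3), ("B07", 2)]
run2 = [("A01", 2), ("B07", 1), ("C02", 1)]
run3 = [("B07", 3)]

# VACUUM FULL's effect, modeled as a k-way merge into one ordered Run.
merged = list(heapq.merge(run1, run2, run3))
print(merged)
```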

Unique Mode for Batched Data Scenarios

If devices report data in batches at the same timestamp, MARS3 can merge duplicate records (same daq_time and vin).

To enable this, you must manually specify uniquemode=true during table creation, as the default is false.

  • With uniquemode=true: If device 'A01' sends three records at '2022-01-01 00:00:00', the system retains only the last record, overwriting the previous two.
  • With uniquemode=false (default): All three records are retained without modification.

Note!
If Unique Mode is enabled, the first field in the ORDER BY clause must be defined with a NOT NULL constraint. MARS3's Unique Mode currently does not support deletion.
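The merge behavior of Unique Mode can be sketched as follows (an illustrative model with hypothetical records, not the engine's implementation): a dict keyed on the sort key naturally keeps only the last record written per key.

```python
# Three records from device "A01" reported at the same timestamp.
records = [
    ("A01", "2022-01-01 00:00:00", {"speed": 10.0}),
    ("A01", "2022-01-01 00:00:00", {"speed": 12.5}),
    ("A01", "2022-01-01 00:00:00", {"speed": 11.0}),
]

# Unique Mode keeps the last record per (vin, daq_time); later writes
# overwrite earlier ones, mirroring uniquemode=true.
merged = {}
for vin, ts, payload in records:
    merged[(vin, ts)] = payload

print(merged)  # only the last of the three duplicate records remains
```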

Example table creation statement with Unique Mode:

=# CREATE TABLE vehicle_basic_data_mars3_um(
  daq_time timestamp,
  vin varchar(32) COLLATE "C" NOT NULL,
  lng float,
  lat float,
  speed float,
  license_template varchar(16),
  flag integer
)
USING MARS3
WITH (compresstype=zstd, compresslevel=3, compress_threshold=1200, uniquemode=true,
        mars3options='rowstore_size=64')
DISTRIBUTED BY (vin)
ORDER BY (vin, daq_time)
PARTITION BY RANGE (daq_time)
( START ('2022-07-01 00:00:00') INCLUSIVE
  END ('2022-08-01 00:00:00') EXCLUSIVE
  EVERY (INTERVAL '1 day')
,DEFAULT PARTITION OTHERS);

MARS3 Bucket Best Practices

For more technical details, see MARS3 Bucket Technical Principles.

The primary benefit of Bucket mode is that it preserves data distribution characteristics during parallel scanning. By providing bucket-based parallel scanning, the executor can avoid unnecessary data Motion based on data characteristics.

Default Value: 1
Valid Values: 1 to 128
Recommended Values:

  • The maximum value should not exceed (Total CPU Cores per Node / Number of Segments per Node).
  • Ensure that each bucket is not too small (< 2 MB). Consider this in conjunction with the RowStore size.

Usage Examples:

  • Method 1 (Global Setting): Set globally using set mars3.default_storage_options='mars3options="nbuckets=8"';.
  • Method 2 (Table Creation): Use create table foo (c1 int) using mars3 with (mars3options='nbuckets=3');. This overrides the global setting from Method 1.
  • Method 3 (Alter Table): Use alter table foo set (mars3options='nbuckets=5');. Not Recommended, as this triggers a full table rewrite.

Usage Notes:

In YMatrix, table data distribution is determined by the distribution method. For example, in a HASH-distributed table, a hash value is calculated for the data, which is then mapped to a specific Segment location via modulo or consistent hashing before writing. MARS3 supports a secondary bucketing mechanism within the table. When nbuckets > 1, the system calculates hash_value % nbuckets to place data into the corresponding bucket.
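The two-level placement described above can be sketched in Python. This is an illustrative model only: the hash function (CRC32 here) and modulo-based Segment mapping are stand-ins, not YMatrix's actual hashing.

```python
import zlib

def place(vin: str, nsegments: int, nbuckets: int):
    h = zlib.crc32(vin.encode())   # stand-in for the real hash function
    segment = h % nsegments        # level 1: distribution across Segments
    bucket = h % nbuckets          # level 2: hash_value % nbuckets inside
    return segment, bucket

# The same distribution key always lands on the same Segment and bucket.
for vin in ("A01", "B07", "C02"):
    print(vin, place(vin, nsegments=4, nbuckets=2))
```

With nbuckets=1 (the default), every row falls into bucket 0, i.e., no secondary bucketing takes place.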

The default parallel scanning behavior in Bucket mode is illustrated below:

bucket: 0 1 2 3 4

Worker 0 reads: bucket {0, 3}
Worker 1 reads: bucket {1, 4}
Worker 2 reads: bucket {2}
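The round-robin assignment shown above (5 buckets across 3 parallel workers) follows a simple modulo rule, sketched here for illustration (the function name is ours):

```python
def assign_buckets(nbuckets: int, nworkers: int):
    # Worker w reads every bucket b where b % nworkers == w.
    return {w: [b for b in range(nbuckets) if b % nworkers == w]
            for w in range(nworkers)}

print(assign_buckets(5, 3))  # → {0: [0, 3], 1: [1, 4], 2: [2]}
```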

To use the legacy parallel scanning mode on a Bucket table, disable the mx_enable_bucket_parallel_mode option.

Important Warnings:

  • It is recommended to specify nbuckets only for large tables that require parallel scanning.
  • The uniquemode and continuous view features do not support Bucket mode. Specifying nbuckets for these objects will have no effect.
  • For non-HASH distributed tables, Bucket mode has no practical effect; all data will be placed in bucket 0.
  • If bucket-parallel scanning is disabled, the scan no longer provides sort-order information.


2. HEAP Tables

HEAP tables are the default storage engine in YMatrix. If you do not explicitly specify a storage engine during table creation, the system creates a HEAP table by default.

=# CREATE TABLE disk_heap(
    time timestamp with time zone,
    tag_id int,
    read float,
    write float
)
DISTRIBUTED BY (tag_id);


3. AORO Tables

AORO tables are Append-Only Row-oriented tables. AORO supports table-level compression but does not support column-level compression.

=# CREATE TABLE disk_aoro(
    time timestamp with time zone,
    tag_id int,
    read float,
    write float
)
WITH (appendonly=true, orientation=row, compresstype=zlib, compresslevel=4)
DISTRIBUTED BY (tag_id);

Best Practices: Choosing Between HEAP and AO

Best practices for choosing between HEAP and AO tables are as follows:

  • Use the HEAP storage engine for tables and partitions that undergo frequent updates, deletes, or single-row insertions.
  • Use the HEAP storage engine for tables and partitions subject to concurrent updates, deletes, and insertions.
  • Use the AO storage engine for tables and partitions that are loaded initially and rarely updated thereafter, with subsequent inserts performed only in batch operations. Never perform single-row updates, deletes, or inserts on AO tables. Concurrent batch inserts are supported, but concurrent batch updates or deletes are not allowed.
  • Space occupied by rows updated or deleted in AO tables is not reclaimed and reused as effectively as in HEAP tables. Therefore, the AO storage engine is unsuitable for frequently updated tables. Its design goal is to support large tables with one-time loading, infrequent updates, and frequent analytical query processing.

Data Distribution Best Practices

Hash Distribution (HASH)

  • Table Creation Statement
    CREATE TABLE finance_voucher_main (
        subject_id INT NOT NULL PRIMARY KEY,
        company_id INT NOT NULL,
        voucher_date DATE NOT NULL,
        amount DECIMAL(18,2) NOT NULL,
        currency_type VARCHAR(3) DEFAULT 'CNY',
        create_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        update_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
    USING MARS3
    DISTRIBUTED BY (company_id, subject_id);
  • View Table Structure
    postgres=# \d+ finance_voucher_main
                                                      Table "public.finance_voucher_main"
         Column     |            Type             | Collation | Nullable |         Default          | Storage  | Stats target | Description
    ---------------+-----------------------------+-----------+----------+--------------------------+----------+--------------+-------------
      subject_id    | integer                     |           | not null |                          | plain    |              |
      company_id    | integer                     |           | not null |                          | plain    |              |
      voucher_date  | date                        |           | not null |                          | plain    |              |
      amount        | numeric(18,2)               |           | not null |                          | main     |              |
      currency_type | character varying(3)        |           |          | 'CNY'::character varying | extended |              |
      create_time   | timestamp without time zone |           |          | CURRENT_TIMESTAMP        | plain    |              |
      update_time   | timestamp without time zone |           |          | CURRENT_TIMESTAMP        | plain    |              |
    Indexes:
        "finance_voucher_main_pkey" PRIMARY KEY, mars3_btree (subject_id)
    Distributed by: (company_id, subject_id)
    Access method: mars3

Random Distribution (RANDOM)

  • Table Creation Statement
    CREATE TABLE etl_finance_temp (
        subject_id INT NOT NULL PRIMARY KEY,
        company_id INT NOT NULL,
        voucher_date DATE NOT NULL,
        amount DECIMAL(18,2) NOT NULL,
        currency_type VARCHAR(3) DEFAULT 'CNY',
        create_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        update_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
    USING MARS3
    DISTRIBUTED RANDOMLY;
  • View Table Structure
    postgres=# \d+ etl_finance_temp
                                                        Table "public.etl_finance_temp"
         Column     |            Type             | Collation | Nullable |         Default          | Storage  | Stats target | Description
    ---------------+-----------------------------+-----------+----------+--------------------------+----------+--------------+-------------
      subject_id    | integer                     |           | not null |                          | plain    |              |
      company_id    | integer                     |           | not null |                          | plain    |              |
      voucher_date  | date                        |           | not null |                          | plain    |              |
      amount        | numeric(18,2)               |           | not null |                          | main     |              |
      currency_type | character varying(3)        |           |          | 'CNY'::character varying | extended |              |
      create_time   | timestamp without time zone |           |          | CURRENT_TIMESTAMP        | plain    |              |
      update_time   | timestamp without time zone |           |          | CURRENT_TIMESTAMP        | plain    |              |
    Indexes:
        "etl_finance_temp_pkey" PRIMARY KEY, mars3_btree (subject_id)
    Distributed randomly
    Access method: mars3

Replicated Distribution (REPLICATED)

  • Table Creation Statement
    CREATE TABLE dict_subject (
        subject_id INT NOT NULL PRIMARY KEY,
        company_id INT NOT NULL,
        voucher_date DATE NOT NULL,
        amount DECIMAL(18,2) NOT NULL,
        currency_type VARCHAR(3) DEFAULT 'CNY',
        create_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        update_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
    USING MARS3
    DISTRIBUTED REPLICATED;
  • View Table Structure
    postgres=# \d+ dict_subject
                                                          Table "public.dict_subject"
         Column     |            Type             | Collation | Nullable |         Default          | Storage  | Stats target | Description
    ---------------+-----------------------------+-----------+----------+--------------------------+----------+--------------+-------------
      subject_id    | integer                     |           | not null |                          | plain    |              |
      company_id    | integer                     |           | not null |                          | plain    |              |
      voucher_date  | date                        |           | not null |                          | plain    |              |
      amount        | numeric(18,2)               |           | not null |                          | main     |              |
      currency_type | character varying(3)        |           |          | 'CNY'::character varying | extended |              |
      create_time   | timestamp without time zone |           |          | CURRENT_TIMESTAMP        | plain    |              |
      update_time   | timestamp without time zone |           |          | CURRENT_TIMESTAMP        | plain    |              |
    Indexes:
        "dict_subject_pkey" PRIMARY KEY, mars3_btree (subject_id)
    Distributed Replicated
    Access method: mars3

Summary: Table Design and Data Distribution Best Practices

YMatrix's storage engines and data distribution strategies provide flexible and efficient solutions for diverse business scenarios.

Users can make selections based on multiple dimensions such as "Business Type (OLAP/OLTP/Hybrid) + Data Characteristics (Volume/Update Frequency) + Query Requirements (Dimensions/Join Methods)":

  • Storage Engine Selection: Prioritize the MARS3 engine for hybrid scenarios; choose AO tables for pure analytical workloads; and select HEAP for transactional workloads.
  • Distribution Strategy: Use Hash distribution for core business tables, Replicated distribution for small lookup tables, and Random distribution for temporary tables.

Through scientific selection, you can maximize the performance of the YMatrix cluster and achieve efficient synergy between data storage and query processing.