Every compress_threshold rows (default: 1200) constitute a Range. Within a Range, the data for a specific column (containing compress_threshold rows) is called a Stripe. If a column's data within a Stripe is particularly large, the Stripe is split into multiple 1 MB chunks. During reads, the system does not necessarily fetch the entire compress_threshold volume at once.
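The row-to-Range-to-Stripe arithmetic can be sketched as follows (a toy helper for illustration only, not engine code; the function name and return shape are made up):

```python
import math

def storage_layout(total_rows: int, compress_threshold: int = 1200,
                   num_columns: int = 1) -> dict:
    # Every compress_threshold rows form one Range; within a Range,
    # each column's slice of those rows is one Stripe.
    num_ranges = math.ceil(total_rows / compress_threshold)
    return {"ranges": num_ranges, "stripes": num_ranges * num_columns}

print(storage_layout(10_000))                                          # {'ranges': 9, 'stripes': 9}
print(storage_layout(10_000, compress_threshold=3600, num_columns=7))  # {'ranges': 3, 'stripes': 21}
```

With the default threshold, 10,000 rows span 9 Ranges; raising the threshold to 3600 cuts that to 3, which is the knob the benchmarks below vary.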
The compress_threshold parameter also influences the compression ratio:
Compressing a larger batch of data simultaneously often yields better compression ratios. This is because repetitive patterns are more likely to be captured within a single batch. If data is split across multiple batches, these patterns may be fragmented, reducing compression efficiency.
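This batching effect is easy to reproduce outside the database. The sketch below uses Python's standard-library zlib as a stand-in for the engine's ZSTD/LZ4, and the record format and batch sizes are invented for illustration:

```python
import zlib

# A stream of highly repetitive fixed-width records.
records = [b"device_%04d,site_%03d,status_ok;" % (i % 50, i % 10)
           for i in range(12_000)]
payload = b"".join(records)
row_len = len(records[0])

def compressed_size(data: bytes, batch_rows: int) -> int:
    # Compress the stream as a series of independent batches,
    # mimicking a smaller or larger compress_threshold.
    step = batch_rows * row_len
    return sum(len(zlib.compress(data[i:i + step], 6))
               for i in range(0, len(data), step))

small_batches = compressed_size(payload, 100)     # many small batches
large_batches = compressed_size(payload, 10_000)  # few large batches
print(small_batches > large_batches)              # True: larger batches compress better
```

Each independent batch pays its own warm-up and header cost, and cross-record repetition is only exploitable within a batch, so fewer, larger batches yield a smaller total.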
compress_threshold on Read Performance:

| ID | compress_threshold=1200 | 3600 | 10000 | 50000 |
|---|---|---|---|---|
| 1 | 8.8032 | 7.7846 | 6.6471 | 6.6824 |
| 2 | 0.981 | 0.997 | 2.2284 | 2.2322 |
| 3 | 3.3512 | 3.3143 | 3.3184 | 3.3177 |
| 4 | 7.7059 | 7.7326 | 6.6034 | 5.5994 |
| 5 | 5.5732 | 5.5042 | 5.5939 | 6.633 |
| 6 | 0.478 | 0.38 | 0.545 | 0.545 |
| 7 | 1.1908 | 1.1917 | 2.2172 | 2.2222 |
| 8 | 3.3602 | 3.3129 | 3.3018 | 3.3227 |
| 9 | 8.8429 | 7.74 | 7.7751 | 7.7795 |
| 10 | 3.3835 | 4.4239 | 5.5496 | 5.5499 |
| 11 | 0.845 | 0.743 | 2.2307 | 2.2299 |
| 12 | 1.1998 | 1.1887 | 2.2164 | 2.2138 |
| 13 | 6.6514 | 6.6351 | 6.6784 | 6.6424 |
| 14 | 0.837 | 0.776 | 1.1364 | 1.1275 |
| 15 | 1.1436 | 1.1359 | 1.183 | 1.1783 |
| 16 | 1.1768 | 1.1787 | 2.2673 | 2.2757 |
| 17 | 9.9419 | 8.8705 | 8.8922 | 8.8771 |
| 18 | 11.11337 | 11.11522 | 10.10384 | 10.10647 |
| 19 | 4.4792 | 3.3942 | 4.4918 | 4.467 |
| 20 | 2.2668 | 2.2389 | 3.3475 | 3.3346 |
| 21 | 11.11391 | 10.10633 | 11.1119 | 10.10742 |
| 22 | 3.3009 | 2.2864 | 3.372 | 3.3377 |
| Sum | 97.73958 | 92.05015 | 100.81134 | 99.78189 |
compress_threshold on Write Performance:

| Test Scenario | Metric | compress_threshold=1200 | 3600 | 10000 | 50000 |
|---|---|---|---|---|---|
| Write Performance | Partitioned Table (rows/s) | 852,334 | 919,031 | 1,055,371 | 1,057,424 |
| Write Performance | Non-Partitioned Table (rows/s) | 991,463 | 1,033,751 | 1,054,292 | 1,076,714 |
| Query Performance | time_bucket=1h (1-day range) | 357ms | 342ms | 333ms | 324ms |
| Query Performance | time_bucket=1d (1-month range) | 7,384ms | 6,312ms | 6,008ms | 6,025ms |
| Query Performance | time_bucket=30d (1-year range) | 94,278ms | 75,101ms | 68,184ms | 69,006ms |
| Query Performance | Point Lookup | 19.416 ms | 20.043 ms | 23.692 ms | 37.844 ms |
For index compression architecture, refer to the mars3btree section.
CREATE INDEX idx_name ON table_name
USING mars3btree (column_list)
WITH (
compresstype = 'lz4', -- Compression algorithm
compresslevel = 1, -- Compression level
compressctid = true, -- Compress CTID column?
encodechain = '', -- Encoding chain
minmax = true -- Enable min/max optimization
);
Parameters:
- compresstype: Supported algorithms are lz4, zstd, and mxcustom.
  - lz4: Offers fast compression/decompression with moderate ratios.
  - zstd: Provides higher compression ratios but is slightly slower.
  - mxcustom: Must be used in conjunction with encodechain.
- compresslevel: Supports values 1–9 (default: 1).
- compressctid: Determines whether the CTID column is compressed. Default is true. It is recommended to keep this enabled, as CTID columns typically achieve high compression ratios.

Customer Case Study: In a specific customer scenario, enabling index compression yielded the following results:
As previously mentioned, compression (whether using ZSTD, LZ4, or encoding schemes like RLE, Dictionary, and Bitpacking) relies on a core principle: the more regular the data within a block or Stripe, the better the compression.
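This principle can be demonstrated directly: the sketch below (again using stdlib zlib as a stand-in for ZSTD/LZ4, with invented data) compresses the same column values once clustered and once shuffled:

```python
import random
import zlib

random.seed(1)
# A device_id-like column: 10 distinct values, 1000 rows each.
clustered = [i // 1000 for i in range(10_000)]  # long runs, as with a good sort key
shuffled = clustered[:]
random.shuffle(shuffled)                        # same values, no regularity

clustered_size = len(zlib.compress(bytes(clustered), 6))
shuffled_size = len(zlib.compress(bytes(shuffled), 6))
print(clustered_size < shuffled_size)           # True: regular data compresses far better
```

Both inputs contain exactly the same values; only the ordering differs, which is precisely what a sort key controls.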
Example Setup:
-- Good Sort Key: device_id, ts
adw=# CREATE TABLE t_sort_good (
device_id int NOT NULL,
ts timestamptz NOT NULL,
site_id int NOT NULL,
status smallint NOT NULL,
v1 double precision NOT NULL,
v2 double precision NOT NULL,
attrs jsonb NOT NULL
)
USING MARS3
WITH (compresstype=zstd, compresslevel=3, mars3options='prefer_load_mode=bulk')
ORDER BY (device_id, ts);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'device_id' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
-- Bad Sort Key: ts, device_id
adw=# CREATE TABLE t_sort_bad (
device_id int NOT NULL,
ts timestamptz NOT NULL,
site_id int NOT NULL,
status smallint NOT NULL,
v1 double precision NOT NULL,
v2 double precision NOT NULL,
attrs jsonb NOT NULL
)
USING MARS3
WITH (compresstype=zstd, compresslevel=3, mars3options='prefer_load_mode=bulk')
ORDER BY (ts, device_id);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'device_id' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
Data Generation and Insertion:
DROP TABLE IF EXISTS t_src;
CREATE UNLOGGED TABLE t_src AS
WITH
dev AS (
SELECT d AS device_id,
(d % 200) AS site_id,
(1000 + (d % 500))::double precision AS base
FROM generate_series(1, 10000) AS d
),
pts AS (
SELECT device_id, site_id, base,
g AS seq,
(timestamp '2026-01-01' + (g || ' seconds')::interval) AS ts
FROM dev
CROSS JOIN generate_series(1, 1000) AS g
)
SELECT
device_id,
ts,
site_id,
-- status: Changes segment every 300 seconds, forming long runs
((seq / 300) % 4)::smallint AS status,
-- v1/v2: Device baseline + small fluctuations
(base + (seq % 10) * 0.1)::double precision AS v1,
(base * 0.1 + (seq % 20) * 0.01)::double precision AS v2,
-- attrs: Highly similar key patterns, values vary by device/time
jsonb_build_object(
'fw', '1.2.' || (device_id % 10),
'model', 'm' || (device_id % 50),
'tag', 'S' || site_id,
'k', seq % 5
) AS attrs
FROM pts;
INSERT INTO t_sort_good SELECT * FROM t_src;
INSERT INTO t_sort_bad SELECT * FROM t_src;
Verification:
To ensure the measurements below are accurate, execute VACUUM FULL followed by VACUUM several times after insertion.
-- Check t_sort_good stats
adw=# SELECT segid, level, total_nruns, visible_nruns, invisible_nruns, level_size
FROM matrixts_internal.mars3_level_stats('t_sort_good')
WHERE level_size <> '0 bytes';
segid | level | total_nruns | visible_nruns | invisible_nruns | level_size
-------+-------+-------------+---------------+-----------------+------------
0 | 1 | 1 | 1 | 0 | 8839 kB
1 | 1 | 1 | 1 | 0 | 8933 kB
3 | 1 | 1 | 1 | 0 | 8552 kB
2 | 1 | 1 | 1 | 0 | 8728 kB
(4 rows)
-- Check t_sort_bad stats
adw=# SELECT segid, level, total_nruns, visible_nruns, invisible_nruns, level_size
FROM matrixts_internal.mars3_level_stats('t_sort_bad')
WHERE level_size <> '0 bytes';
segid | level | total_nruns | visible_nruns | invisible_nruns | level_size
-------+-------+-------------+---------------+-----------------+------------
0 | 2 | 1 | 1 | 0 | 26 MB
1 | 2 | 1 | 1 | 0 | 26 MB
3 | 2 | 1 | 1 | 0 | 25 MB
2 | 2 | 1 | 1 | 0 | 25 MB
(4 rows)
-- Check table sizes
adw=# \dt+ t_sort_bad
List of relations
Schema | Name | Type | Owner | Storage | Size | Description
--------+------------+-------+---------+---------+--------+-------------
public | t_sort_bad | table | mxadmin | mars3 | 103 MB |
(1 row)
adw=# \dt+ t_sort_good
List of relations
Schema | Name | Type | Owner | Storage | Size | Description
--------+-------------+-------+---------+---------+-------+-------------
public | t_sort_good | table | mxadmin | mars3 | 35 MB |
(1 row)
-- Sample data from t_sort_good (Sorted by device_id, then ts)
adw=# SELECT * FROM t_sort_good LIMIT 10;
device_id | ts | site_id | status | v1 | v2 | attrs
-----------+------------------------+---------+--------+--------+--------------------+-----------------------------------------------------
3 | 2026-01-01 00:00:01+08 | 3 | 0 | 1003.1 | 100.31000000000002 | {"k": 1, "fw": "1.2.3", "tag": "S3", "model": "m3"}
3 | 2026-01-01 00:00:02+08 | 3 | 0 | 1003.2 | 100.32000000000001 | {"k": 2, "fw": "1.2.3", "tag": "S3", "model": "m3"}
3 | 2026-01-01 00:00:03+08 | 3 | 0 | 1003.3 | 100.33000000000001 | {"k": 3, "fw": "1.2.3", "tag": "S3", "model": "m3"}
3 | 2026-01-01 00:00:04+08 | 3 | 0 | 1003.4 | 100.34000000000002 | {"k": 4, "fw": "1.2.3", "tag": "S3", "model": "m3"}
3 | 2026-01-01 00:00:05+08 | 3 | 0 | 1003.5 | 100.35000000000001 | {"k": 0, "fw": "1.2.3", "tag": "S3", "model": "m3"}
3 | 2026-01-01 00:00:06+08 | 3 | 0 | 1003.6 | 100.36000000000001 | {"k": 1, "fw": "1.2.3", "tag": "S3", "model": "m3"}
3 | 2026-01-01 00:00:07+08 | 3 | 0 | 1003.7 | 100.37 | {"k": 2, "fw": "1.2.3", "tag": "S3", "model": "m3"}
3 | 2026-01-01 00:00:08+08 | 3 | 0 | 1003.8 | 100.38000000000001 | {"k": 3, "fw": "1.2.3", "tag": "S3", "model": "m3"}
3 | 2026-01-01 00:00:09+08 | 3 | 0 | 1003.9 | 100.39000000000001 | {"k": 4, "fw": "1.2.3", "tag": "S3", "model": "m3"}
3 | 2026-01-01 00:00:10+08 | 3 | 0 | 1003 | 100.4 | {"k": 0, "fw": "1.2.3", "tag": "S3", "model": "m3"}
(10 rows)
-- Sample data from t_sort_bad (Sorted by ts, then device_id - mixed devices)
adw=# SELECT * FROM t_sort_bad LIMIT 10;
device_id | ts | site_id | status | v1 | v2 | attrs
-----------+------------------------+---------+--------+--------+--------------------+-------------------------------------------------------
1 | 2026-01-01 00:00:01+08 | 1 | 0 | 1001.1 | 100.11000000000001 | {"k": 1, "fw": "1.2.1", "tag": "S1", "model": "m1"}
12 | 2026-01-01 00:00:01+08 | 12 | 0 | 1012.1 | 101.21000000000001 | {"k": 1, "fw": "1.2.2", "tag": "S12", "model": "m12"}
15 | 2026-01-01 00:00:01+08 | 15 | 0 | 1015.1 | 101.51 | {"k": 1, "fw": "1.2.5", "tag": "S15", "model": "m15"}
20 | 2026-01-01 00:00:01+08 | 20 | 0 | 1020.1 | 102.01 | {"k": 1, "fw": "1.2.0", "tag": "S20", "model": "m20"}
23 | 2026-01-01 00:00:01+08 | 23 | 0 | 1023.1 | 102.31000000000002 | {"k": 1, "fw": "1.2.3", "tag": "S23", "model": "m23"}
35 | 2026-01-01 00:00:01+08 | 35 | 0 | 1035.1 | 103.51 | {"k": 1, "fw": "1.2.5", "tag": "S35", "model": "m35"}
38 | 2026-01-01 00:00:01+08 | 38 | 0 | 1038.1 | 103.81000000000002 | {"k": 1, "fw": "1.2.8", "tag": "S38", "model": "m38"}
40 | 2026-01-01 00:00:01+08 | 40 | 0 | 1040.1 | 104.01 | {"k": 1, "fw": "1.2.0", "tag": "S40", "model": "m40"}
44 | 2026-01-01 00:00:01+08 | 44 | 0 | 1044.1 | 104.41000000000001 | {"k": 1, "fw": "1.2.4", "tag": "S44", "model": "m44"}
47 | 2026-01-01 00:00:01+08 | 47 | 0 | 1047.1 | 104.71000000000001 | {"k": 1, "fw": "1.2.7", "tag": "S47", "model": "m47"}
(10 rows)
Summary:
- t_sort_good: Groups similar data together. Data written to disk in the same batch is more regular and repetitive, making it easier for the compressor to work efficiently.
- t_sort_bad: Mixes completely different data together. Data written in the same batch is chaotic and random, making compression difficult.

Comparison Results (Same Data Volume, Different Sort Keys):

- t_sort_good total size: 35 MB
- t_sort_bad total size: 103 MB (approximately 3x larger)

Run/Level Statistics: Although each segment contains only one Run in both cases:

- t_sort_good Runs reside in Level 1, approximately 8.5–8.9 MB per segment.
- t_sort_bad Runs reside in Level 2, approximately 25–26 MB per segment.

Analysis:
t_sort_good (Sort Key: device_id, ts): Similarity Clustering → Compression Friendly
Sorting by (device_id, ts) ensures that each Stripe/block contains "a continuous time series for a single device." This leads to:
- device_id, site_id, and status exhibit long runs of identical values, ideal for RLE and Dictionary encoding.
- ts, v1, and v2 change slowly over time within a device, resulting in small deltas ideal for delta/bitpacking encoding.
- attrs (JSONB) is highly consistent within a device, benefiting ZSTD/LZ4.

t_sort_bad (Sort Key: ts, device_id): Device Mixing → Compression Unfriendly

Sorting by (ts, device_id) mixes a large number of devices at every timestamp. Within a single block:

- device_id changes almost every row, increasing dictionary size and reducing RLE run lengths to nearly 1.
- site_id and status are disrupted, breaking run-length efficiency.
- attrs becomes chaotic, with fewer repeated substrings.
- Although ts itself is sequential, the randomness of the other columns negates most of the compression benefit.

In conclusion, a poor sort key not only increases storage size but also makes background compaction harder, requiring data to reach higher levels before stabilizing.