压缩与性能影响

每 compress_threshold (默认为 1200) 行的数据,我们称为一个 Range;一个 Range 内的某一列数据 (包含 compress_threshold 行),称之为 stripe;这一列数据如果特别大,那么 stripe 会切成若干个 1MB 的 chunk,读的时候也不会一下读整个 compress_threshold 的数据出来。

compress_threshold 对于压缩率也有一定的影响:

同一批次压缩的数据多了,可能压缩效果更好,原理上重复的数据被压的更多了,要不然在两个批次里面就没压到一起。

compress_threshold 对于读取的性能影响

id compress_thresthold=1200 3600 10000 50000
1 8.8032 7.7846 6.6471 6.6824
2 0.981 0.997 2.2284 2.2322
3 3.3512 3.3143 3.3184 3.3177
4 7.7059 7.7326 6.6034 5.5994
5 5.5732 5.5042 5.5939 6.633
6 0.478 0.38 0.545 0.545
7 1.1908 1.1917 2.2172 2.2222
8 3.3602 3.3129 3.3018 3.3227
9 8.8429 7.74 7.7751 7.7795
10 3.3835 4.4239 5.5496 5.5499
11 0.845 0.743 2.2307 2.2299
12 1.1998 1.1887 2.2164 2.2138
13 6.6514 6.6351 6.6784 6.6424
14 0.837 0.776 1.1364 1.1275
15 1.1436 1.1359 1.183 1.1783
16 1.1768 1.1787 2.2673 2.2757
17 9.9419 8.8705 8.8922 8.8771
18 11.11337 11.11522 10.10384 10.10647
19 4.4792 3.3942 4.4918 4.467
20 2.2668 2.2389 3.3475 3.3346
21 11.11391 10.10633 11.1119 10.10742
22 3.3009 2.2864 3.372 3.3377
sum 97.73958 92.05015 100.81134 99.78189

compress_threshold 对于写入的性能影响

测试场景 compress_threshold 1200 3600 10000 50000
写入性能 分区表 (条/s) 852,334 919,031 1,055,371 1,057,424
写入性能 非分区表 (条/s) 991,463 1,033,751 1,054,292 1,076,714
查询性能 time_bucket=1h (1天范围) 357ms 342ms 333ms 324ms
查询性能 time_bucket=1d (1月范围) 7,384ms 6,312ms 6,008ms 6,025ms
查询性能 time_bucket=30d (1年范围) 94,278ms 75,101ms 68,184ms 69,006ms
查询性能 点查场景 19.416 ms 20.043 ms 23.692 ms 37.844 ms

索引压缩

索引压缩架构参照 mars3btree 章节

CREATE INDEX idx_name ON table_name
USING mars3btree (column_list)
WITH (
    compresstype = 'lz4',           -- 压缩算法
    compresslevel = 1,              -- 压缩级别
    compressctid = true,            -- 是否压缩CTID列
    encodechain = '',               -- 编码链
    minmax = true                   -- 启用min/max优化
);
  1. compresstype (压缩算法):支持 lz4、zstd 和 mxcustom。默认为 lz4,lz4:压缩/解压速度快,压缩率中等;zstd:压缩率高,速度稍慢;mxcustom:需配合 encodechain 使用
  2. compresslevel(压缩级别):支持 1 ~ 9,默认 1。对于查询密集型建议 1 ~ 3 (优先解压速度),存储敏感型建议 6 ~ 9 (优先压缩率)
  3. compressctid (CTID 列是否压缩):默认为 true,建议保持为 true,CTID 列压缩率通常很高

在某客户场景中,使用索引压缩后:

  • 对 TOB 集群影响较小,节点 CPU 与 MEM 变化不大,同时可节省 24% Ymatrix 分区存储空间;
  • 对 TOC 集群效果显著,在牺牲一部分 CPU、MEM 的情况下,可节省 63% Ymatrix 分区存储空间。
  • 50 辆车查询 1 天 GPS 数据,开启索引压缩的服务吞吐量是未开启索引压缩的 3.3 倍;
  • 50 辆车查询 3 天 GPS 数据,开启索引压缩的服务吞吐量是未开启索引压缩的 1.1 倍。

排序键对于压缩的影响

前面提到,压缩 (不管是 zstd / lz4 还是 RLE / dict / bitpack 这些编码) 都依赖一个核心事实:同一块/同一 stripe 内的数据越规律,压缩越好。

lz4 / zstd 依赖重复子串/重复模式。排序后,同一块内字段组合更相似,尤其是宽表。当相近实体/相近时间的数据聚集在同一 stripe 内时,块内值域收敛、重复与成段重复增强,字典规模下降,delta / bitpacking 的 bit 宽度降低,从而提升编码与通用压缩的效果。

adw=# CREATE TABLE t_sort_good (                                                                                                                                                                                                                            
  device_id   int NOT NULL,                                                                                                                                                                                                                                 
  ts          timestamptz NOT NULL,                                                                                                                                                                                                                         
  site_id     int NOT NULL,                                                                                                                                                                                                                                 
  status      smallint NOT NULL,                                                                                                                                                                                                                            
  v1          double precision NOT NULL,                                                                                                                                                                                                                    
  v2          double precision NOT NULL,                                                                                                                                                                                                                    
  attrs       jsonb NOT NULL                                                                                                                                                                                                                                
)                                                                                                                                                                                                                                                           
USING MARS3                                                                                                                                                                                                                                                 
WITH (compresstype=zstd,compresslevel=3,mars3options='prefer_load_mode=bulk')                                                                                                                                                                               
ORDER BY (device_id, ts);                                                                                                                                                                                                                                   
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'device_id' as the Greenplum Database data distribution key for this table.                                                                                                       
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.                                                                                                  
CREATE TABLE 

adw=# CREATE TABLE t_sort_bad (                                                                                                                                                                                                                             
  device_id   int NOT NULL,                                                                                                                                                                                                                                 
  ts          timestamptz NOT NULL,                                                                                                                                                                                                                         
  site_id     int NOT NULL,                                                                                                                                                                                                                                 
  status      smallint NOT NULL,                                                                                                                                                                                                                            
  v1          double precision NOT NULL,                                                                                                                                                                                                                    
  v2          double precision NOT NULL,                                                                                                                                                                                                                    
  attrs       jsonb NOT NULL                                                                                                                                                                                                                                
)                                                                                                                                                                                                                                                           
USING MARS3                                                                                                                                                                                                                                                 
WITH (compresstype=zstd,compresslevel=3,mars3options='prefer_load_mode=bulk')                                                                                                                                                                               
ORDER BY (ts, device_id);                                                                                                                                                                                                                                   
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'device_id' as the Greenplum Database data distribution key for this table.                                                                                                       
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.                                                                                                  
CREATE TABLE 

然后生成原始数据源,同时往两张表里插入数据

DROP TABLE IF EXISTS t_src;
CREATE UNLOGGED TABLE t_src AS
WITH
  dev AS (
    SELECT d AS device_id,
           (d % 200) AS site_id,
           (1000 + (d % 500))::double precision AS base
    FROM generate_series(1, 10000) AS d
  ),
  pts AS (
    SELECT device_id, site_id, base,
           g AS seq,
           (timestamp '2026-01-01' + (g || ' seconds')::interval) AS ts
    FROM dev
    CROSS JOIN generate_series(1, 1000) AS g
  )
SELECT
  device_id,
  ts,
  site_id,
  -- status:每 300 秒变一次段,形成长 run
  ((seq / 300) % 4)::smallint AS status,
  -- v1/v2:设备基线 + 小波动
  (base + (seq % 10) * 0.1)::double precision AS v1,
  (base * 0.1 + (seq % 20) * 0.01)::double precision AS v2,
  -- attrs:key 模式高度相似,但值随设备/时间变化
  jsonb_build_object(
    'fw',  '1.2.' || (device_id % 10),
    'model', 'm' || (device_id % 50),
    'tag',  'S' || site_id,
    'k',    seq % 5
  ) AS attrs
FROM pts;

INSERT INTO t_sort_good SELECT * FROM t_src;
INSERT INTO t_sort_bad  SELECT * FROM t_src;

为了确保数据的准确性,插入完成后执行多次 vacuum full + vacuum

adw=# select segid,level,total_nruns,visible_nruns,invisible_nruns,level_size from matrixts_internal.mars3_level_stats('t_sort_good') where level_size <> '0 bytes';                                                                                        
 segid | level | total_nruns | visible_nruns | invisible_nruns | level_size                                                                                                                                                                                 
-------+-------+-------------+---------------+-----------------+------------                                                                                                                                                                                
     0 |     1 |           1 |             1 |               0 | 8839 kB                                                                                                                                                                                    
     1 |     1 |           1 |             1 |               0 | 8933 kB                                                                                                                                                                                    
     3 |     1 |           1 |             1 |               0 | 8552 kB                                                                                                                                                                                    
     2 |     1 |           1 |             1 |               0 | 8728 kB                                                                                                                                                                                    
(4 rows)                                                                                                                                                                                                                                                    

adw=# select segid,level,total_nruns,visible_nruns,invisible_nruns,level_size from matrixts_internal.mars3_level_stats('t_sort_bad') where level_size <> '0 bytes';                                                                                         
 segid | level | total_nruns | visible_nruns | invisible_nruns | level_size                                                                                                                                                                                 
-------+-------+-------------+---------------+-----------------+------------                                                                                                                                                                                
     0 |     2 |           1 |             1 |               0 | 26 MB                                                                                                                                                                                      
     1 |     2 |           1 |             1 |               0 | 26 MB                                                                                                                                                                                      
     3 |     2 |           1 |             1 |               0 | 25 MB                                                                                                                                                                                      
     2 |     2 |           1 |             1 |               0 | 25 MB                                                                                                                                                                                      
(4 rows)                                                                                                                                                                                                                                                  

adw=# \dt+ t_sort_bad                                                                                                                                                                                                                                       
                           List of relations                                                                                                                                                                                                                
 Schema |    Name    | Type  |  Owner  | Storage |  Size  | Description                                                                                                                                                                                     
--------+------------+-------+---------+---------+--------+-------------                                                                                                                                                                                    
 public | t_sort_bad | table | mxadmin | mars3   | 103 MB |                                                                                                                                                                                                 
(1 row)                                                                                                                                                                                                                                                     

adw=# \dt+ t_sort_good                                                                                                                                                                                                                                      
                           List of relations                                                                                                                                                                                                                
 Schema |    Name     | Type  |  Owner  | Storage | Size  | Description                                                                                                                                                                                     
--------+-------------+-------+---------+---------+-------+-------------                                                                                                                                                                                    
 public | t_sort_good | table | mxadmin | mars3   | 35 MB |                                                                                                                                                                                                 
(1 row)   

adw=# select * from t_sort_good limit 10;                                                                                                                                                                                                                   
 device_id |           ts           | site_id | status |   v1   |         v2         |                        attrs                                                                                                                                         
-----------+------------------------+---------+--------+--------+--------------------+-----------------------------------------------------                                                                                                                 
         3 | 2026-01-01 00:00:01+08 |       3 |      0 | 1003.1 | 100.31000000000002 | {"k": 1, "fw": "1.2.3", "tag": "S3", "model": "m3"}                                                                                                                  
         3 | 2026-01-01 00:00:02+08 |       3 |      0 | 1003.2 | 100.32000000000001 | {"k": 2, "fw": "1.2.3", "tag": "S3", "model": "m3"}                                                                                                                  
         3 | 2026-01-01 00:00:03+08 |       3 |      0 | 1003.3 | 100.33000000000001 | {"k": 3, "fw": "1.2.3", "tag": "S3", "model": "m3"}                                                                                                                  
         3 | 2026-01-01 00:00:04+08 |       3 |      0 | 1003.4 | 100.34000000000002 | {"k": 4, "fw": "1.2.3", "tag": "S3", "model": "m3"}                                                                                                                  
         3 | 2026-01-01 00:00:05+08 |       3 |      0 | 1003.5 | 100.35000000000001 | {"k": 0, "fw": "1.2.3", "tag": "S3", "model": "m3"}                                                                                                                  
         3 | 2026-01-01 00:00:06+08 |       3 |      0 | 1003.6 | 100.36000000000001 | {"k": 1, "fw": "1.2.3", "tag": "S3", "model": "m3"}                                                                                                                  
         3 | 2026-01-01 00:00:07+08 |       3 |      0 | 1003.7 |             100.37 | {"k": 2, "fw": "1.2.3", "tag": "S3", "model": "m3"}                                                                                                                  
         3 | 2026-01-01 00:00:08+08 |       3 |      0 | 1003.8 | 100.38000000000001 | {"k": 3, "fw": "1.2.3", "tag": "S3", "model": "m3"}                                                                                                                  
         3 | 2026-01-01 00:00:09+08 |       3 |      0 | 1003.9 | 100.39000000000001 | {"k": 4, "fw": "1.2.3", "tag": "S3", "model": "m3"}                                                                                                                  
         3 | 2026-01-01 00:00:10+08 |       3 |      0 |   1003 |              100.4 | {"k": 0, "fw": "1.2.3", "tag": "S3", "model": "m3"}                                                                                                                  
(10 rows)                                                                                                                                                                                                                                                   

adw=# select * from t_sort_bad limit 10;                                                                                                                                                                                                                    
 device_id |           ts           | site_id | status |   v1   |         v2         |                         attrs                                                                                                                                        
-----------+------------------------+---------+--------+--------+--------------------+-------------------------------------------------------                                                                                                               
         1 | 2026-01-01 00:00:01+08 |       1 |      0 | 1001.1 | 100.11000000000001 | {"k": 1, "fw": "1.2.1", "tag": "S1", "model": "m1"}                                                                                                                  
        12 | 2026-01-01 00:00:01+08 |      12 |      0 | 1012.1 | 101.21000000000001 | {"k": 1, "fw": "1.2.2", "tag": "S12", "model": "m12"}                                                                                                                
        15 | 2026-01-01 00:00:01+08 |      15 |      0 | 1015.1 |             101.51 | {"k": 1, "fw": "1.2.5", "tag": "S15", "model": "m15"}                                                                                                                
        20 | 2026-01-01 00:00:01+08 |      20 |      0 | 1020.1 |             102.01 | {"k": 1, "fw": "1.2.0", "tag": "S20", "model": "m20"}                                                                                                                
        23 | 2026-01-01 00:00:01+08 |      23 |      0 | 1023.1 | 102.31000000000002 | {"k": 1, "fw": "1.2.3", "tag": "S23", "model": "m23"}                                                                                                                
        35 | 2026-01-01 00:00:01+08 |      35 |      0 | 1035.1 |             103.51 | {"k": 1, "fw": "1.2.5", "tag": "S35", "model": "m35"}                                                                                                                
        38 | 2026-01-01 00:00:01+08 |      38 |      0 | 1038.1 | 103.81000000000002 | {"k": 1, "fw": "1.2.8", "tag": "S38", "model": "m38"}                                                                                                                
        40 | 2026-01-01 00:00:01+08 |      40 |      0 | 1040.1 |             104.01 | {"k": 1, "fw": "1.2.0", "tag": "S40", "model": "m40"}                                                                                                                
        44 | 2026-01-01 00:00:01+08 |      44 |      0 | 1044.1 | 104.41000000000001 | {"k": 1, "fw": "1.2.4", "tag": "S44", "model": "m44"}                                                                                                                
        47 | 2026-01-01 00:00:01+08 |      47 |      0 | 1047.1 | 104.71000000000001 | {"k": 1, "fw": "1.2.7", "tag": "S47", "model": "m47"}                                                                                                                
(10 rows) 

简而言之

  • t_sort_good 把相似的数据放在一起,于是同一批落盘的数据更规律、更重复,压缩器更容易工作;
  • t_sort_bad 把完全不同的数据混在一起,于是每一批落盘的数据更杂、更随机,压缩器很难压缩。

同样的数据量,只是排序键不同:

  • t_sort_good 总大小 35 MB,
  • t_sort_bad 总大小 103 MB (约 3 倍) 从 run/level 统计看:每个 seg 都只有一个 run,但:
  • t_sort_good 的 run 在 level 1,每 seg 大约 8.5–8.9 MB
  • t_sort_bad 的 run 在 level 2,每 seg 大约 25–26 MB

对于 t_sort_good 来说,按 (device_id, ts) 排序 → 相似聚集 → 压缩友好,这样在同一个 stripe/块里,往往是“同一个设备的一段连续时间”。这会带来:

  • device_id/site_id/status 这些列会出现长段重复 (很适合 RLE/字典编码)
  • ts 和 v1/v2 这种随时间缓慢变化的列,块内增量很小 (很适合 delta/bitpacking)
  • attrs(jsonb) 的 key 模式在同设备内高度相似(对 zstd/lz4 也更友好)

因此,同样的行数,物理字节会明显更小。

对于 t_sort_bad 来说,按 (ts, device_id) 排序 → 设备混杂 → 压缩不友好,同一个时间点会混进大量设备,导致同一块里:

  • device_id 几乎每行都变化 (字典更大、RLE 段长度接近 1)
  • site_id/status 也会被打散 (run-length 被破坏)
  • attrs 的组合模式更杂 (重复片段减少)
  • 即使 ts 是递增的,但其它列的“随机性”把整体压缩收益吃掉了

所以压缩比显著变差,物理大小变成 3 倍非常正常。

换句话说:坏排序不仅更大,还更难治理,到了更高层才稳定下来。

返回上一章节:存储引擎原理