Heap + Mars1存储方案实践
在MatrixDB中,使用分区技术来搭配两种表类型,按时间戳来做数据分区。热数据先写入Heap类型的热分区,在确定已接入数据不会发生修改并且不会上报更早的数据后,将Heap热分区转换为Mars冷分区。
分区时间跨度根据数据量不同而不同。比如:数据量大,分区粒度要小,可以按天或小时来分区;数据量小,分区粒度可以调大,按周或月来分区。
MatrixDB提供了分区管理的UDF,需要先建立相关扩展组件:
CREATE EXTENSION matrixts;
CREATE EXTENSION mars;
以记录磁盘读写量为例,做冷热分级存储的用法如下:
1 建立时序表
CREATE TABLE disk(
time timestamp with time zone,
tag_id int,
read float,
write float
)
Distributed by (tag_id)
Partition by range(time);
通过如上SQL语句,建立了一张按时间戳分区的磁盘读写时序表,分布键是tag_id。
2 建立分区模板
建立好时序表后,下面通过调用mars扩展提供的build_timeseries_table
函数建立分区模板:
SELECT mars.build_timeseries_table('disk','tagkey="tag_id", timekey="time", timebucket="1 day"');
在建立分区模板的过程中,需要提供tag键和时间戳的列名,还有时间单位。
分区模板表不会在DROP原表的时候自动删除,需要手动清理。如果要删除原表,正确操作如下:
SELECT mars.destroy_timeseries_table('disk'); DROP TABLE disk;
3 创建分区
建立好分区模板,下面调用mars扩展提供的add_partition
函数建立分区:
SELECT mars.add_partition('disk', '2021-04-01', '2021-05-01','1 day');
如上SQL建立了从2021-04-01到2021-05-01时间段的分区,时间单位为1天。
创建好分区,调用list_partition
函数查看分区信息:
ymatrix=# SELECT * from mars.list_partition('disk');
relname | storage
-----------------------------------------------+---------
heap_disk_20210401000000000_20210402000000000 | heap
heap_disk_20210402000000000_20210403000000000 | heap
heap_disk_20210403000000000_20210404000000000 | heap
heap_disk_20210404000000000_20210405000000000 | heap
heap_disk_20210405000000000_20210406000000000 | heap
heap_disk_20210406000000000_20210407000000000 | heap
heap_disk_20210407000000000_20210408000000000 | heap
heap_disk_20210408000000000_20210409000000000 | heap
heap_disk_20210409000000000_20210410000000000 | heap
heap_disk_20210410000000000_20210411000000000 | heap
heap_disk_20210411000000000_20210412000000000 | heap
heap_disk_20210412000000000_20210413000000000 | heap
heap_disk_20210413000000000_20210414000000000 | heap
heap_disk_20210414000000000_20210415000000000 | heap
heap_disk_20210415000000000_20210416000000000 | heap
heap_disk_20210416000000000_20210417000000000 | heap
heap_disk_20210417000000000_20210418000000000 | heap
heap_disk_20210418000000000_20210419000000000 | heap
heap_disk_20210419000000000_20210420000000000 | heap
heap_disk_20210420000000000_20210421000000000 | heap
heap_disk_20210421000000000_20210422000000000 | heap
heap_disk_20210422000000000_20210423000000000 | heap
heap_disk_20210423000000000_20210424000000000 | heap
heap_disk_20210424000000000_20210425000000000 | heap
heap_disk_20210425000000000_20210426000000000 | heap
heap_disk_20210426000000000_20210427000000000 | heap
heap_disk_20210427000000000_20210428000000000 | heap
heap_disk_20210428000000000_20210429000000000 | heap
heap_disk_20210429000000000_20210430000000000 | heap
heap_disk_20210430000000000_20210501000000000 | heap
(30 rows)
如上SQL列出来了disk表包含的所有分区子表,包括子表名和存储类型。
4 时序数据写入表
insert into disk values(now(), ...., ..., ...);
因为表是按时间分区的,所以直接向主表插入数据即可,数据会自动落到对应的分区上。
5 分区替换
当4月1日的时序数据采集完毕后,即不会再有4月1日的时序数据接入,要将4月1日的数据从热分区转换为冷分区。
Mars扩展提供了compress_partition函数,用来做分区转换:
ymatrix=# select mars.compress_partition('heap_disk_20210401000000000_20210402000000000');
compress_partition
-----------------------------------------------
mars_disk_20210401000000000_20210402000000000
(1 row)
如上SQL将4月1日对应的分区做了转换,转换后再调用list_partition
函数,可看到转换后的分区表名和存储类型:
ymatrix=# SELECT * from mars.list_partition('disk');
relname | storage
-----------------------------------------------+---------
heap_disk_20210402000000000_20210403000000000 | heap
heap_disk_20210403000000000_20210404000000000 | heap
heap_disk_20210404000000000_20210405000000000 | heap
heap_disk_20210405000000000_20210406000000000 | heap
heap_disk_20210406000000000_20210407000000000 | heap
heap_disk_20210407000000000_20210408000000000 | heap
heap_disk_20210408000000000_20210409000000000 | heap
heap_disk_20210409000000000_20210410000000000 | heap
heap_disk_20210410000000000_20210411000000000 | heap
heap_disk_20210411000000000_20210412000000000 | heap
heap_disk_20210412000000000_20210413000000000 | heap
heap_disk_20210413000000000_20210414000000000 | heap
heap_disk_20210414000000000_20210415000000000 | heap
heap_disk_20210415000000000_20210416000000000 | heap
heap_disk_20210416000000000_20210417000000000 | heap
heap_disk_20210417000000000_20210418000000000 | heap
heap_disk_20210418000000000_20210419000000000 | heap
heap_disk_20210419000000000_20210420000000000 | heap
heap_disk_20210420000000000_20210421000000000 | heap
heap_disk_20210421000000000_20210422000000000 | heap
heap_disk_20210422000000000_20210423000000000 | heap
heap_disk_20210423000000000_20210424000000000 | heap
heap_disk_20210424000000000_20210425000000000 | heap
heap_disk_20210425000000000_20210426000000000 | heap
heap_disk_20210426000000000_20210427000000000 | heap
heap_disk_20210427000000000_20210428000000000 | heap
heap_disk_20210428000000000_20210429000000000 | heap
heap_disk_20210429000000000_20210430000000000 | heap
heap_disk_20210430000000000_20210501000000000 | heap
mars_disk_20210401000000000_20210402000000000 | mars
(30 rows)
可以看到,转换后的分区名前缀从heap变成了mars,存储类型也发生了相应的变化。旧的分区会被自动删除。
compress_partition还有一个时间跨度的可选参数,可以将该分区与之前mars分区做归并。如下所示,系统会根据时间跨度和当前分区所属日期自动计算,将该分区转换后归并到之前的分区,或者创建一个新的分区。
SELECT mars.compress_partition('heap_disk_20210401000000000_20210402000000000', '10 days');