在线扩容
对于海量存储的分布式数据库,随着数据量的增大,已有集群容量与算力不够用是不可避免的问题,所以扩容操作不可或缺。MatrixDB可以做到在线扩容,包括:
- 不停机增加新节点
- 表数据重分布并行化
- 未重分布表与已重分布表可以进行连接操作
在线扩容分两个步骤:
- 增加新节点
- 数据重分布
1. 增加新节点
该环节为集群增加新的segment节点,并启动使用,但原有表数据还存储在老的节点上。
增加新节点首先要提供部署方案配置文件,该文件可以手动编写,也可以使用gpexpand
工具自动生成,推荐使用自动生成方式。
1.1 生成扩容部署配置文件
执行gpexpand
:
[mxadmin@mdw ~]$ gpexpand
......
Please refer to the Admin Guide for more information.
Would you like to initiate a new System Expansion Yy|Nn (default=N):
输入y
回车:
> y
然后要求输入新增节点主机名,逗号分割:
Enter a comma separated list of new hosts you want
to add to your array. Do not include interface hostnames.
**Enter a blank line to only add segments to existing hosts**[]:
> sdw3,sdw4
确认新增主机名网络联通,并且与原节点已建立互信。并确保已安装
rsync
每个节点额外增加多少机器?这里输入的值是在原有节点部署segment数量的基础上增加多少segment。如果每台主机部署节点数与原主机相同,则直接输入0即可。
How many new primary segments per host do you want to add? (default=0):
> 0
Generating configuration file...
20211102:14:36:24:024562 gpexpand:mdw:mxadmin-[INFO]:-Generating input file...
Input configuration file was written to 'gpexpand_inputfile_20211102_143624'.
Please review the file and make sure that it is correct then re-run
with: gpexpand -i gpexpand_inputfile_20211102_143624
20211102:14:36:24:024562 gpexpand:mdw:mxadmin-[INFO]:-Exiting...
执行完毕会在当前目录生成gpexpand_inputfile_20211102_143624
配置文件:
[mxadmin@mdw ~]$ cat gpexpand_inputfile_20211102_143624
sdw3|sdw3|7002|/home/mxadmin/gpdemo/datadirs/dbfast1/demoDataDir3|5|3|p
sdw3|sdw3|7003|/home/mxadmin/gpdemo/datadirs/dbfast2/demoDataDir4|6|4|p
sdw3|sdw3|7004|/home/mxadmin/gpdemo/datadirs/dbfast3/demoDataDir5|7|5|p
sdw4|sdw4|7002|/home/mxadmin/gpdemo/datadirs/dbfast1/demoDataDir6|8|6|p
sdw4|sdw4|7003|/home/mxadmin/gpdemo/datadirs/dbfast2/demoDataDir7|9|7|p
sdw4|sdw4|7004|/home/mxadmin/gpdemo/datadirs/dbfast3/demoDataDir8|10|8|p
配置文件中包含了新主机信息,以及新segment端口号、数据目录、dbid、角色等信息。这些信息与gp_segment_configuration
表对应。该配置文件也可以按照该格式手动编写。
1.2 执行新增节点操作
准备好配置文件后,下面执行增加新节点操作:
[mxadmin@mdw ~]$ gpexpand -i gpexpand_inputfile_20211102_143624
......
20211102:14:54:03:025371 gpexpand:mdw:mxadmin-[INFO]:-Unlocking catalog
20211102:14:54:03:025371 gpexpand:mdw:mxadmin-[INFO]:-Unlocked catalog
20211102:14:54:03:025371 gpexpand:mdw:mxadmin-[INFO]:-Creating expansion schema
20211102:14:54:03:025371 gpexpand:mdw:mxadmin-[INFO]:-Populating gpexpand.status_detail with data from database postgres
20211102:14:54:03:025371 gpexpand:mdw:mxadmin-[INFO]:-Populating gpexpand.status_detail with data from database template1
20211102:14:54:03:025371 gpexpand:mdw:mxadmin-[INFO]:-Populating gpexpand.status_detail with data from database mxadmin
20211102:14:54:03:025371 gpexpand:mdw:mxadmin-[INFO]:-************************************************
20211102:14:54:03:025371 gpexpand:mdw:mxadmin-[INFO]:-Initialization of the system expansion complete.
20211102:14:54:03:025371 gpexpand:mdw:mxadmin-[INFO]:-To begin table expansion onto the new segments
20211102:14:54:03:025371 gpexpand:mdw:mxadmin-[INFO]:-rerun gpexpand
20211102:14:54:03:025371 gpexpand:mdw:mxadmin-[INFO]:-************************************************
20211102:14:54:03:025371 gpexpand:mdw:mxadmin-[INFO]:-Exiting...
执行成功后,查询gp_segment_configuration
表,即可看到新节点已加入。
2. 数据重分布
在线扩容的第二步操作,是将表数据重分布到新节点上,因为新增加的segment节点上并没有数据。
MatrixDB支持的3种分布方式:随机、哈希、复制,都要按照规则同步到新segment节点上。其中,复制表依然要全量存储在新增加的segment节点上;哈希表则要按照规则做重分布;随机表因为数据分布无规律,不会做数据迁移,只是新插入的数据会按照概率落到新的segment节点上。
第一阶段执行完毕后,会在postgres
数据库创建schema:gpexpand
,包含了扩容状态信息:
postgres=# \d
List of relations
Schema | Name | Type | Owner | Storage
----------+--------------------+-------+---------+---------
gpexpand | expansion_progress | view | mxadmin |
gpexpand | status | table | mxadmin | heap
gpexpand | status_detail | table | mxadmin | heap
(3 rows)
其中,status_detail
表记录了所有需要做重分布的表及重分布状态:
postgres=# select * from gpexpand.status_detail ;
dbname | fq_name | table_oid | root_partition_name | rank | external_writable | status | expansion_started | expansion_finished | source_bytes
---------+-------------+-----------+---------------------+------+-------------------+-------------+-------------------+--------------------+--------------
mxadmin | public.t2 | 16388 | | 2 | f | NOT STARTED | | | 8192
mxadmin | public.t1 | 16385 | | 2 | f | NOT STARTED | | | 16384
mxadmin | public.disk | 16391 | public.disk | 2 | f | NOT STARTED | | | 0
(3 rows)
直接运行gpexpand
即可执行表重分布:
[mxadmin@mdw ~]$ gpexpand
......
gpexpand:mdw:mxadmin-[INFO]:-Querying gpexpand schema for current expansion state
20211102:15:18:27:026291 gpexpand:mdw:mxadmin-[INFO]:-Expanding mxadmin.public.t2
20211102:15:18:27:026291 gpexpand:mdw:mxadmin-[INFO]:-Finished expanding mxadmin.public.t2
20211102:15:18:27:026291 gpexpand:mdw:mxadmin-[INFO]:-Expanding mxadmin.public.t1
20211102:15:18:27:026291 gpexpand:mdw:mxadmin-[INFO]:-Finished expanding mxadmin.public.t1
20211102:15:18:27:026291 gpexpand:mdw:mxadmin-[INFO]:-Expanding mxadmin.public.disk
20211102:15:18:27:026291 gpexpand:mdw:mxadmin-[INFO]:-Finished expanding mxadmin.public.disk
20211102:15:18:32:026291 gpexpand:mdw:mxadmin-[INFO]:-EXPANSION COMPLETED SUCCESSFULLY
20211102:15:18:32:026291 gpexpand:mdw:mxadmin-[INFO]:-Exiting...
表重分布完毕后再查询status_detail
表,会看到状态已经变成了COMPLETED
:
postgres=# select * from gpexpand.status_detail ;
dbname | fq_name | table_oid | root_partition_name | rank | external_writable | status | expansion_started | expansion_finished | source_bytes
---------+-------------+-----------+---------------------+------+-------------------+-----------+----------------------------+----------------------------+--------------
mxadmin | public.t2 | 16388 | | 2 | f | COMPLETED | 2021-11-02 15:18:27.326247 | 2021-11-02 15:18:27.408379 | 8192
mxadmin | public.t1 | 16385 | | 2 | f | COMPLETED | 2021-11-02 15:18:27.431481 | 2021-11-02 15:18:27.507591 | 16384
mxadmin | public.disk | 16391 | public.disk | 2 | f | COMPLETED | 2021-11-02 15:18:27.531727 | 2021-11-02 15:18:27.570559 | 0
(3 rows)
如果想增加重分布并发度,可以在执行gpexpand
命令时加-B
参数,默认值是16,上限是128,如:
[mxadmin@mdw ~]$ gpexpand -B 32
重分布完毕后,运行gpexpand -c
来清理扩容过程的中间表:
[mxadmin@mdw ~]$ gpexpand -c
20211102:15:24:41:026524 gpexpand:mdw:mxadmin-[INFO]:-local Greenplum Version: 'postgres (MatrixDB) 5.0.0-enterprise~alpha (Greenplum Database) 7.0.0 build dev'
20211102:15:24:41:026524 gpexpand:mdw:mxadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 12 (MatrixDB 5.0.0-enterprise~alpha) (Greenplum Database 7.0.0 build dev) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5), 64-bit compiled on Oct 25 2021 15:24:16'
20211102:15:24:41:026524 gpexpand:mdw:mxadmin-[INFO]:-Querying gpexpand schema for current expansion state
Do you want to dump the gpexpand.status_detail table to file? Yy|Nn (default=Y):
> n
20211102:15:24:46:026524 gpexpand:mdw:mxadmin-[INFO]:-Removing gpexpand schema
20211102:15:24:46:026524 gpexpand:mdw:mxadmin-[INFO]:-Cleanup Finished. exiting...
到此,扩容成功。
3. FAQ
- 扩容过程中新旧节点的catalog是一致的吗
- catalog始终是一致的,新节点上即使还没有数据,但是catalog也是存在的。
- 扩容过程中对于其他查询的影响
- 因为是在线扩容,期间不会中断正在运行的所有查询,只是在增加新节点的时候因为要锁catalog,所以在此期间不支持DDL,加完新节点立即释放catalog锁。
- 在数据重分布的时候因为加了最高级别的锁,会阻塞读操作。
gpexpand的详细使用方法请参考文档