YMatrix SQL 查询常见问题

本文档介绍 YMatrix SQL 查询的常见问题。

数据查询相关介绍详情请见：

1 `ERROR: EvalPlanQual can not handle subPlan with Motion node`

问题分析

当参数 gp_enable_global_deadlock_detector 设置为 on 后，锁的模式可能下降为 RowExclusiveLock，当执行更新分布键时，UPDATE操作会被拆分为 DELETE + INSERT, 此时并发的更新分布键，DELETE 操作不能执行 EvalPlanQual，INSERT 操作也不会 blocked，可能会生成多余的数据，为了防止这种现象，抛出这类错误。

复现步骤

session 0: create table test_valplanqual (c1 int, c2 int) distributed by(c1);
CREATE
session 0: insert into test_valplanqual values(1,1);
INSERT 1
session 0: select * from test_valplanqual;
 c1 | c2
----+----
 1  | 1
(1 row)

session 1: begin;
BEGIN
session 2: begin;
BEGIN
session 1: update test_valplanqual set c1 = c1 + 1 where c1 = 1;
UPDATE 1
session 2: update test_valplanqual set c1 = c1 + 1 where c1 = 1;  <waiting ...>
session 1: end;
END
session 2<:  <... completed>
ERROR:  EvalPlanQual can not handle subPlan with Motion node  (seg1 127.0.1.1:7003 pid=34629)
session 2: end;
END
session 0: select * from test_valplanqual;
 c1 | c2
----+----
 2  | 1
(1 row)
session 0: drop table test_valplanqual;
DROP

解决方案

避免更新分布键。

2 `ERROR: Too many unresolved insertion xids, please do a vacuum`

问题分析

当前 mars2 存储引擎控制一个 BLOCK 上存在的 INSERT XID 数量，算法如下:

#define INSERTXIDSPERBLOCK      ((BLCKSZ - PAGE_RESERVED_SPACE) / sizeof(TransactionId))
#define PAGE_RESERVED_SPACE(MAXALIGN(sizeof(PageHeaderData)) + MAXALIGN(sizeof(SortHeapPageOpaqueData)))

大概约 (block_size - 24 + 24) / 4 个子事务，默认 block_size 为 8K 时，约为 2036，当子事务数超过该值时，会出现该错。

复现步骤

session 0: create table test_xids(id int) using mars2;
CREATE TABLE
session 0: create index ON test_xids using mars2_btree (id);
CREATE INDEX
session 0: 
 DO $$
DECLARE 
    i int; 
    _start timestamptz;
BEGIN
FOR i IN 1..3000 LOOP 
    _start = clock_timestamp();
    INSERT INTO test_xids VALUES (i);
    RAISE NOTICE 'value: % escape: % ', i, clock_timestamp() - _start; 
END LOOP;
END;
$$language plpgsql;

psql: NOTICE:  value: 1 escape: 00:00:00.019476
psql: NOTICE:  value: 2 escape: 00:00:00.002501
psql: NOTICE:  value: 3 escape: 00:00:00.00218
psql: NOTICE:  value: 4 escape: 00:00:00.002098
psql: NOTICE:  value: 5 escape: 00:00:00.002938
psql: NOTICE:  value: 6 escape: 00:00:00.001891
psql: NOTICE:  value: 7 escape: 00:00:00.001794
...
...
psql: NOTICE: value: 2069 escape: 00:00:10.001794
psql: NOTICE: value: 2070 escape: 00:00:10.001495
psql: NOTICE: value: 2071 escape: 00:00:10.001251
psql: NOTICE: ERROR:  Too many unresolved insertion xids, please do a vacuum (sortheap_external_sort.c:404)  (seg0 172.16.100.197:6000 pid=12992) (sortheap_external_sort.c:404)
CONTEXT:  SQL statement "insert into test_xids values(i)"
PL/pgSQL function inline_code_block line 7 at SQL statement

解决方案

避免在一个事务中多次执行 INSERT INTO tablename VALUES(...) 操作，最友好的做法是，每一个事务只执行一次 INSERT 操作，并且使用 INSERT INTO tablename VALUES(...),(...),(...) 的形式批量插入据。对于 mars2 表，官方更加推荐使用 mxgate工具更高效的写入数据。

3 `ERROR: modification of distribution columns in OnConflictUpdate is not supported`

问题分析

当参数 gp_enable_global_deadlock_detector 设置为 on 后，锁的模式可能下降为 RowExclusiveLock，当执行更新分布键时，INSERT...ON CONFLICT DO UPDATE SET... 语句在执行 UPDATE 操作会被拆分为 DELETE + INSERT, 此时更新分布键，抛出这类错误。

复现步骤

session 0: create table test_upsert(id int, name text, primary key(id)) distributed by(id);
CREATE TABLE
session 0: insert into test_upsert select 1, 'a';
INSERT 0 1
session 0: insert into test_upsert select 1, 'a' on conflict (id) do update set id =2;
psql: ERROR:  modification of distribution columns in OnConflictUpdate is not supported
session 0: insert into test_upsert select 1, 'a' on conflict (id) do update set name = 2;
INSERT 0 1
session 0: drop table test_upsert;
DROP

解决方案

执行 UPSERT 时，避免更新分布键。

4 图形化客户端收到 log：`server closed the connection unexpectedly`

图形化（UI）客户端访问远程数据库，长查询过程中或长时间 idle 后发查询，客户端有些时候会收到 log：

server closed the connection unexpectedly

问题分析

客户端存在查询超时 cancel 设置，或 idle 超时 cancel 连接设置。

解决方案

更改客户端超时设置，取消超时。

5 PARTITION 表简单 Filter 操作的 UNION ALL 查询比 IN 查询慢

问题分析

PARTITION 表的 IN 查询，分区裁剪后只有 1 个 DEFAULT 分区，但 UNION ALL 查询中每个子查询都裁剪到了 DEFAULT 分区，做了多次 DEFAULT 分区的扫描，性能影响明显。

解决方案

对于 PARTITION 表：尽量避免 DEFAULT 分区、尽量不用 UNION 而用 IN 子句。

6 插入 int 类型数据，单独查询跑很快，但放到 Plpgsql Function 里很慢

问题分析

Plpgsql Function 内的查询是通过 SPI 运行，SPI Plan 输出结果里是两表 Join，采用了 nestloop，语句 rows = 1，没有 ANALYSE。

解决方案

执行 ANALYZE。

7 PARTITION 分区裁剪更新操作，两个会话（Session）独立更新会导致互锁

问题分析

分布式死锁。

解决方案

打开分布式死锁检测。

gpconfig -c gp_enable_global_deadlock_detector -v on

8 自定义 type 使用及单表字段扩展

自定义type使用

创建type

CREATE TYPE public.fhpm AS
(
avgval double precision,
minval double precision,
maxval double precision,
minval_interval integer,
maxval_interval integer
);

创建表

CREATE TABLE datapool.test                
(                                         
 portindex integer,                    
 begintime timestamp without time zone,
 a_1 fhpm,                             
 s_2 integer                           
);

插入数据样例

insert into datapool.test values(1,'2022-01-01','(1,1,1,1,1)',1);          
insert into datapool.test values(2,'2022-01-01','(2,,2,,)',2);

type 类型属性数据查询样例

SELECT * FROM datapool.test;                             
portindex |      begintime      |     a_1     | s_2 
-----------+---------------------+-------------+-----
      1 | 2022-01-01 00:00:00 | (1,1,1,1,1) |   1 
      2 | 2022-01-01 00:00:00 | (2,,2,,)    |   2 
(2 rows)

SELECT (a_1).avgval FROM datapool.test;
avgval                              
--------                                
   1                                                           
   2              
(2 rows)

SELECT (a_1).minval FROM datapool.test;                   
minval                                 
--------                                
   1                                        
(2 rows)

单表字段扩展 psql: ERROR: tables can have at most 1600 columns

使用自定义 type 可以实现单表突破1600字段的限制。有兴趣的可以自己尝试一下,以下是实现样例。

CREATE TYPE public.fhpm1 AS
(
a0001 double precision,
a0002 double precision,
...
a1600 double precision);

CREATE TABLE datapool.test1                
(
portindex integer,
begintime timestamp without time zone,
a_1 fhpm1,
s_2 integer,
a_3 fhpm2
);

生成1600列 type 脚本

echo 'CREATE TYPE public.fhpm1 AS(' > test.sql
for i in {1..1599};do
echo 'a'$i' double precision,' >> test.sql
done
echo 'a1600 double precision);' >> test.sql

9 ERROR,XX000,Right sibling's left-link doesn't match : block 817 links to 45366 instead of expected 70930 in index ""tab_idx""

问题分析

该报错为索引损坏。在表频繁的更新删除，索引膨胀严重的时候也可能出现该报错。出现该报错意味着该索引已无法正常使用。

解决方案

重建索引。

drop index tab_idx；
create index tab_idx on table &tablename (column1,column2..);

10 更新删除时报错 `cannot delete from table "&tablename" because it does not have a replica identity and publishes deletes`

问题分析

报错的表被开启了逻辑复制。开启逻辑复制的表如果没有设置主键，只能执行insert操作，update和delete都会报以上错误。该复制逻辑出现在早期的postgresql中，但后续版本该功能一直存在，在 YMatrix 中建议不要开启该功能。

问题复现

test6=# create table test as select * from pg_tables distributed randomly;
SELECT 94
test6=# create publication rep_source for table test;
CREATE PUBLICATION
test6=# insert into test select * from test;
INSERT 0 94
test6=# update test set tablename='pg_class_old' where tablename='pg_class';
psql: ERROR:  cannot update table "test" because it does not have a replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
test6=# delete from test where tablename='pg_class';
psql: ERROR:  cannot delete from table "test" because it does not have a replica identity and publishes deletes
HINT:  To enable deleting from the table, set REPLICA IDENTITY using ALTER TABLE.

解决方案

设置更新和删除时包含所有列的前映像值。

alter table &schema.tablename REPLICA IDENTITY FULL;

或者删除对应的复制发布任务。

select a.*,b.prrelid::regclass tablename from pg_publication a,pg_publication_rel b where a.oid=b.prpubid;
drop publication &任务名;

11 `ERROR: could not read block 0 in file "base/1588803/269422"`

问题分析

文件块损坏，导致对应的文件无法读取

问题复现

test6=# create table test as select * from pg_tables distributed randomly;
SELECT 94
test6=# create index idx_tabname on test1(tablename);
CREATE INDEX
test6=# select relname,relfilenode from pg_class where relname in('test1','idx_tabname');
   relname   | relfilenode 
-------------+-------------
 idx_tabname |      269422
 test1       |      269421
(2 rows)

根据relfilenode找到对应的文件，任意破坏一个(此处破坏的是索引文件)。

test6=# select * from test1 where tablename='a';
psql: ERROR:  could not read block 0 in file "base/1588803/269422": read only 8123 of 8192 bytes

解决方案

根据报错的 relfilenode 确认损坏的是表文件还是索引文件：
情形一：索引文件损坏。直接重建即可，使用 reindex 命令。
情形二：有 Mirror，数据表文件损坏，将相同 content 的 Mirror 文件拷贝替换 Primary 文件即可。
情形三：无 Mirror，数据表文件损坏，单独将正常实例的数据拷贝出来，重建表，异常实例的数据会丢失。

12 行转列示例

在使用 YMatrix 的时候会遇到行转列的 SQL 需求，下面针对这个问题进行演示。具体步骤如下：

行转列示例

创建测试表

create table test (name varchar(20),zbfm varchar(20),value integer);

插入测试数据

insert into test values('张三','年龄',60);
insert into test values('张三','身高',95);
insert into test values('张三','体重',31);
insert into test values('张三','鞋码',42);
insert into test values('李四','年龄',50);
insert into test values('李四','身高',83);
insert into test values('李四','体重',84);
insert into test values('李四','鞋码',43);
insert into test values('王五','年龄',97);
insert into test values('王五','身高',75);
insert into test values('王五','体重',66);
insert into test values('王五','鞋码',44);

查询测试数据

select * from test;
name | zbfm | value 
------+------+-------
张三 | 年龄 |    60
张三 | 身高 |    95
张三 | 体重 |    31
张三 | 鞋码 |    42
王五 | 年龄 |    97
王五 | 身高 |    75
王五 | 体重 |    66
王五 | 鞋码 |    44
李四 | 年龄 |    50
李四 | 身高 |    83
李四 | 体重 |    84
李四 | 鞋码 |    43
(12 rows)

将数据进行行转列展示

select name,
max(case when zbfm='年龄' then value else 0 end) as 年龄,
max(case when zbfm='身高' then value else 0 end) as 身高,
max(case when zbfm='体重' then value else 0 end) as 体重,
max(case when zbfm='鞋码' then value else 0 end) as 鞋码
from test 
group by name
order by 年龄 desc;
name | 年龄 | 身高 | 体重 | 鞋码 
------+------+------+------+------
王五 |   97 |   75 |   66 |   44
张三 |   60 |   95 |   31 |   42
李四 |   50 |   83 |   84 |   43
(3 rows)

13 列转行示例

在使用 YMatrix 的时候会遇到列转行的 SQL 需求，下面针对这个问题进行演示。具体步骤如下：

列转行示例

创建测试表

drop table if exists t_unpivot;
create table t_unpivot(
    currenttimestamp bigint  ,
    deviceid text  ,
    devicetemplatecode varchar  ,
    statisticstype text ,
    co2 float ,
    currentpowersum float ,
    currenttemperature float ,
    curtemp float ,
    duration float ,
    dust_level float 
);

插入测试数据

insert into t_unpivot values(1646376466611,'1.2.156.156.11.20.227161606742980522','138700003','statistics_avg',12.4,23.4,null,null,null,55.5);
insert into t_unpivot values(1646376466612,'1.2.156.156.11.20.607141619709364801','122400004','statistics_avg',null,34.3,56.3,null,null,null);
insert into t_unpivot values(1646376466613,'1.2.156.156.11.20.929741642180181067','178800001','statistics_avg',null,null,null,43.2,null,null);
insert into t_unpivot values(1646376466614,'1.2.156.156.11.20.327231588865913990','123200004','statistics_avg',null,null,null,null,23.2,null);
insert into t_unpivot values(1646376466615,'1.2.156.156.11.20.155831629756361011','154900008','statistics_avg',null,null,21.2,null,null,null);

查看测试数据

select * from t_unpivot;
currenttimestamp |               deviceid               | devicetemplatecode | statisticstype | co2  | currentpowersum | currenttemperature | curtemp | duration | dust_level 
------------------+--------------------------------------+--------------------+----------------+------+-----------------+--------------------+---------+----------+------------
 1646376466612 | 1.2.156.156.11.20.607141619709364801 | 122400004          | statistics_avg |      |            34.3 |               56.3 |         |          |           
 1646376466611 | 1.2.156.156.11.20.227161606742980522 | 138700003          | statistics_avg | 12.4 |            23.4 |                    |         |          |       55.5
 1646376466613 | 1.2.156.156.11.20.929741642180181067 | 178800001          | statistics_avg |      |                 |                    |    43.2 |          |           
 1646376466615 | 1.2.156.156.11.20.155831629756361011 | 154900008          | statistics_avg |      |                 |               21.2 |         |          |           
 1646376466614 | 1.2.156.156.11.20.327231588865913990 | 123200004          | statistics_avg |      |                 |                    |         |     23.2 |           
(5 rows)

将数据进行列转行展示

select currenttimestamp, 
    deviceid, 
    devicetemplatecode,
    statisticstype,
    (b.rec).key as key, 
    (b.rec).value as value 
from
(select currenttimestamp, 
     deviceid, 
     devicetemplatecode,
     statisticstype,
     jsonb_each_text(row_to_json(t.*)::jsonb-'currenttimestamp'-'deviceid'-'devicetemplatecode'-'statisticstype') as rec  
from t_unpivot t
) b
where (b.rec).value is not null;
currenttimestamp |               deviceid               | devicetemplatecode | statisticstype |        key         | value 
------------------+--------------------------------------+--------------------+----------------+--------------------+-------
 1646376466612 | 1.2.156.156.11.20.607141619709364801 | 122400004          | statistics_avg | currentpowersum    | 34.3
 1646376466612 | 1.2.156.156.11.20.607141619709364801 | 122400004          | statistics_avg | currenttemperature | 56.3
 1646376466611 | 1.2.156.156.11.20.227161606742980522 | 138700003          | statistics_avg | co2                | 12.4
 1646376466611 | 1.2.156.156.11.20.227161606742980522 | 138700003          | statistics_avg | dust_level         | 55.5
 1646376466611 | 1.2.156.156.11.20.227161606742980522 | 138700003          | statistics_avg | currentpowersum    | 23.4
 1646376466613 | 1.2.156.156.11.20.929741642180181067 | 178800001          | statistics_avg | curtemp            | 43.2
 1646376466615 | 1.2.156.156.11.20.155831629756361011 | 154900008          | statistics_avg | currenttemperature | 21.2
 1646376466614 | 1.2.156.156.11.20.327231588865913990 | 123200004          | statistics_avg | duration           | 23.2
(8 rows)

14 `pg_hba.conf` 文件内容配置的顺序导致的远程访问权限异常

问题分析

pg_hba.conf 是限制远程访问数据库权限配置文件，访问权限配置的顺序读取是从上往上，配置的顺序不对，会导致权限管控失败。

问题复现

创建测试用户。

test6=# create role test with login password 'test';
psql: NOTICE:  resource queue required -- using default resource queue "pg_default"
CREATE ROLE

登录测试。

[mxadmin@mxd2 mxseg-1]$ psql test6 -U test -h 192.168.8.12 -p 5432
Password for user test: 
psql (12)
Type "help" for help.
test6=>

通过修改 pg_hba.conf 限制 test 用户远程登录。限制配置在文件末尾。

[mxadmin@mxd2 mxseg-1]$ vi pg_hba.conf
#user access rules
host            all             all             0.0.0.0/0       md5
host all test 0.0.0.0/0  reject     //添加权限限制
[mxadmin@mxd2 mxseg-1]$ mxstop -u
//测试登录，登录正常，未成功限制远程
[mxadmin@mxd2 mxseg-1]$ psql test6 -U test -h 192.168.8.12 -p 5432
Password for user test: 
psql (12)
Type "help" for help.
test6=>

解决方案

检查当前用户在 pg_hba.conf 里面的限制，将权限限制行添加在合适的位置。

//调整后的 `pg_hab.conf` 文件
[mxadmin@mxd2 mxseg-1]$ vi pg_hba.conf
#user access rules
host all test 0.0.0.0/0  reject     //添加权限限制
host            all             all             0.0.0.0/0       md5

[mxadmin@mxd2 mxseg-1]$ mxstop -u
//测试登录，登录失败，成功限制远程
[mxadmin@mxd2 mxseg-1]$ psql test6 -U test -h 192.168.8.12 -p 5432
psql: error: could not connect to server: FATAL:  pg_hba.conf rejects connection for host "192.168.8.12", user "test", database "test6", SSL off

15 `idle in transaction timeout` 报错

问题分析

idle in transaction 连接超时，该超时是由参数控制，设置时需谨慎。

问题复现

检查当前参数设置。

test=# show idle_in_transaction_session_timeout ;
 idle_in_transaction_session_timeout 
-------------------------------------
 100s
(1 row)

为方便演示，在会话级别设置该参数为 10s。

test=# set idle_in_transaction_session_timeout ='10s';
SET
test=# show idle_in_transaction_session_timeout ;
 idle_in_transaction_session_timeout 
-------------------------------------
 10s
(1 row)

开启事务，测试该参数的影响。

test=# begin ;
BEGIN
test=# select count(*) from pg_tables;
 count 
-------
   104
(1 row)

等超过 10s 再执行第二次。

test=# select count(*) from pg_tables;
psql: FATAL:  terminating connection due to idle-in-transaction timeout
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

检查数据库日志。

2023-02-22 10:47:07.577478 PST,"mxadmin","test",p43086,th-484071296,"[local]",,2023-02-22 10:44:30 PST,0,con25838,cmd8,seg-1,,dx25863,,sx1,"FATAL","25P03","terminating connection due to idle-in-transaction timeout",,,,,,,0,,"postgres.c",4018,

解决方案

谨慎设置该参数，可能会导致长事务回滚。
遇到这种问题，可以修改系统参数或者设置会话级别参数来控制。

系统级别参数修改。

$ gpconfig -c idle_in_transaction_session_timeout -v 10s
$ gpstop -u // 使配置生效

会话级别设置参数。

$ set idle_in_transaction_session_timeout='10s';

16 `insufficient memory reserved for statement` 报错

问题分析

会话使用内存超过 statement_mem 设置内存导致。客户数据量不大，但是分区特别多。过多的分区在 count(*) 的时候会导致内存占用较大。

问题复现

创建一个分区表，分区数量尽可能的多。

test=# create table test (id int,read numeric,write numeric,dttime timestamp)
test-# Distributed by (id)
test-# Partition by range(dttime)
test-# (start ('2022-11-01'::date)
test(# end ('2022-12-01'::date)
test(# every ('1 hours'::interval));
CREATE TABLE

在表中插入少量数据。

test=# insert into test 
test-# select j as id,random()*10 as read,random()*10 as write,i as dttime from generate_series('2022-11-01', '2022-11-30',interval '1 hours') as i,generate_series(1,100) j;
INSERT 0 69800

查询表的数据量报错。

test=# select count(*) from test;
psql: ERROR:  insufficient memory reserved for statement

解决方案

临时解决方案
- 在会话级别修改 statement_mem 参数 set statement_mem ='1000MB'
永久解决方案
- 调整 statement_mem 系统值 gpconfig -c statement_mem -v 10000MB，调整的时候需要注意 max_statement_mem 的大小。
- 如果不想调整系统参数，则需要重新规划分区，当前分区与整表数据量对比是不合理的。

17 `evalplanqual can not hanlde subplan with motion node` 报错

问题分析

不同的事务对相同的分布键数据进行更新操作，最开始的事务提交后，其他事务报该错。

问题复现

创建一张表。

test=# create table test1 as 
test-# select i id,md5(random()::text) tag_id1
test-# from generate_series(1,10) i;
psql: NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'id' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 10

会话 A 开启事务更新 id=1 的记录。

test=# begin;
BEGIN
test=# update test1 set id=11 where id=1;
UPDATE 1

会话 B 开启事务更新 id=1 的记录。

test=# begin;
BEGIN
test=# update test1 set id=11 where id=1;

提交会话 A 的事务。

test=# begin;
BEGIN
test=# update test1 set id=11 where id=1;
UPDATE 1
test=# 
test=# commit;
COMMIT

会话 B 报错。

test=# begin;
BEGIN
test=# update test1 set id=11 where id=1;
psql: ERROR:  EvalPlanQual can not handle subPlan with Motion node  (seg1 192.168.8.12:6001 pid=49302)

解决方案

避免多个事务同时对相同的分布键自动进行更新。
更新数据时尽量使用数据库的自动事务功能，无需手动开启事务。
尽量避免在程序中使用长事务。

18 在客户端创建表在 Linux 上查询不到，但是客户端上可以查询到

报错信息

SELECT * FROM ttemp;
psql: ERROR:  relation "ttemp" does not exist
LINE 1: SELECT * FROM ttemp;
                      ^

问题分析

查看报错。

psql: ERROR:  relation "ttemp" does not exist
LINE 1: SELECT * FROM ttemp;
                      ^

执行 \dn 查看 schema 列表，show search_path; 查看当前默认的 Schema ，确认一下这个表创建在哪个 Schema 下面，执行查询加上 schema.table。

postgres=# \dn
   List of schemas
    Name    |  Owner  
------------+---------
 gp_toolkit | mxadmin
 public     | mxadmin
(2 rows)

postgres=# SHOW search_path ;
   search_path   
-----------------
 "$user", public
(1 row)

发现默认的 Schema 就是 public。

执行 \l 查看一下数据库列表。

postgres=# \l
                               List of databases
   Name    |  Owner  | Encoding |  Collate   |   Ctype    |  Access privileges  
-----------+---------+----------+------------+------------+---------------------
 matrixmgr | mxadmin | UTF8     | en_US.utf8 | en_US.utf8 | 
 postgres  | mxadmin | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | mxadmin | UTF8     | en_US.utf8 | en_US.utf8 | =c/mxadmin         +
           |         |          |            |            | mxadmin=CTc/mxadmin
 template1 | mxadmin | UTF8     | en_US.utf8 | en_US.utf8 | =c/mxadmin         +
           |         |          |            |            | mxadmin=CTc/mxadmin
(4 rows)

检查一下客户端和程序的连接信息是否正确。

发现程序连接的是 5432 端口的数据库集群，Linux 上面连接的是 5433 端口的。

postgres=# SHOW port;
 port 
------
 5433
(1 row)

解决方案

关闭端口为 5433 的集群。

[mxadmin@mdw ~]$ gpstop -a

修改环境变量的端口和 MASTER_DATA_DIRECTORY 信息。

$ vim ~mxadmin/.matrixdb.env

export PGPORT=5432
export MASTER_DATA_DIRECTORY=/mxdata_20220909145815/master/mxseg-1

加载新的环境变量。

$ source ~mxadmin/.matrixdb.env

19 UUID 分布键插入数据出现数据丢失、数据重复问题

问题描述

执行 INSERT INTO table_uuid SELECT ... FROM 操作，结果会有插入数据条数出现波动的情况。同时 SELECT 结果不出现波动，稳定输出。

给出以下示例：
多次执行 INSERT 操作，发现执行的结果会出现波动，同时 SELECT 结果是稳定的 1 条输出。

INSERT 0 0
Time: 25.774 ms

INSERT 0 1
Time: 40.934 ms

INSERT 0 2
Time: 34.338 ms

INSERT 0 3
Time: 32.562 ms

问题分析

此问题的产生原因是下面三个条件的组合：

SET OPTIMIZER TO ON。
使用 random() 这类的随机函数，生成 UUID。
插入目标表的分布键为 UUID。

解决方案

更改上面三个条件中的其中一项，即可避免此类问题。

20 数据库对应的操作系统版本从 CentOS7 升级到 CentOS8，查询语句性能下降

问题分析

Redhat8 新增 sssd 服务，会对所有连接访问进行认证，导致连接创建认证效率变低，尤其是 slice 数量多的场景，查询语句效率下降明显。

问题复现

在 CentOS7 操作系统上搭建一套集群
在 CentOS8 操作系统上搭建一套集群
创建多张有关联关系的表
进行多表关联查询
对比两个系统的数据库的查询效率

解决方案

将小表改成复制表，减少 slice

关闭对应的操作系统服务

systemctl stop sssd
systemctl stop sssd-kcm.socket

21 执行 SQL 语句时显示 `insufficient memory reserved for statement` 错误

问题分析

在执行 SQL 时查询计划评估的内存超过了 statement_mem 参数的限制。

解决方案

调大 statement_mem 参数值，避免查询时评估的内存超过限制。
查看查询的表是否为分区表，如果是则对分区表进行合并。

22 创建 `matrixts` 扩展时报错 `ERROR: type "mxkv_text" already exist`

问题复现

创建源库 db1、目标库 db2，并在 db1 上创建 matrixts 扩展及若干表。

CREATE DATABASE db1;
CREATE DATABASE db2;
\c db1
CREATE EXTENSION matrixts;
CREATE TABLE t1(c INT);
CREATE TABLE t2(c INT);

使用 Navicat Premium 客户端软件中的工具 → 结构同步，将数据库 db1 的相关表、视图、函数等同步至数据库 db2 中。

此时在数据库 db2 执行 CREATE EXTENSION matrixts ，会出现类似报错：

ERROR:  type "<类型名称>" already exists

ERROR:  function "<函数名称>" already exists with same argument types

例如：

db2=# CREATE EXTENSION matrixts;
psql: NOTICE:  Releasing segworker groups to finish aborting the transaction.
psql: ERROR:  type "mxkv_text" already exists
ERROR:  could not open gp_segment_configutation dump file:gpsegconfig_dump:No such file or directory (cdbutil.c:151)

db2=# CREATE EXTENSION matrixts;
psql: ERROR:  function "set_policy" already exists with same argument types
ERROR:  could not open gp_segment_configutation dump file:gpsegconfig_dump:No such file or directory (cdbutil.c:151)

问题分析

该种问题是由于在通过第三方客户端同步 DDL 的时候，将 matrixts 扩展所需的函数、类型等定义同步了过来。

当用户通过 CREATE EXTENSION matrixts 创建 matrixts 插件时，由于存在同名函数、类型等定义，所以会导致 matrixts 插件无法初始化。

解决方案

如果你已因操作不当收到报错，请删除冲突的同名函数及类型。

   DROP TYPE mxkv_float4 cascade;
   DROP TYPE mxkv_float8 cascade;
   DROP TYPE mxkv_int4 cascade;
   DROP TYPE mxkv_text cascade;
   DROP FUNCTION set_policy;
   DROP FUNCTION drop_policy(rel regclass);
   DROP FUNCTION drop_policy(rel regclass, policy_name text);
   DROP FUNCTION set_policy_action(rel regclass, action_name text, args text);
   DROP FUNCTION set_policy_action(rel regclass, action_name text, in_disabled boolean);
   DROP FUNCTION disable_policy_action(rel regclass, action_name text);
   DROP FUNCTION enable_policy_action(rel regclass, action_name text);

如果你还未在客户端进行同步操作，则可创建新的数据库，在该数据库下创建 matrixts 插件后，再进行表、视图等定义的同步。

YMatrix SQL 查询常见问题

1 ERROR: EvalPlanQual can not handle subPlan with Motion node

2 ERROR: Too many unresolved insertion xids, please do a vacuum

3 ERROR: modification of distribution columns in OnConflictUpdate is not supported

4 图形化客户端收到 log：server closed the connection unexpectedly

5 PARTITION 表简单 Filter 操作的 UNION ALL 查询比 IN 查询慢

6 插入 int 类型数据，单独查询跑很快，但放到 Plpgsql Function 里很慢

7 PARTITION 分区裁剪更新操作，两个会话（Session）独立更新会导致互锁

8 自定义 type 使用及单表字段扩展

9 ERROR,XX000,Right sibling's left-link doesn't match : block 817 links to 45366 instead of expected 70930 in index ""tab_idx""

10 更新删除时报错 cannot delete from table "&tablename" because it does not have a replica identity and publishes deletes

11 ERROR: could not read block 0 in file "base/1588803/269422"

12 行转列示例

13 列转行示例

14 pg_hba.conf 文件内容配置的顺序导致的远程访问权限异常

15 idle in transaction timeout 报错

16 insufficient memory reserved for statement 报错

17 evalplanqual can not hanlde subplan with motion node 报错

18 在客户端创建表在 Linux 上查询不到，但是客户端上可以查询到

19 UUID 分布键插入数据出现数据丢失、数据重复问题

20 数据库对应的操作系统版本从 CentOS7 升级到 CentOS8，查询语句性能下降

21 执行 SQL 语句时显示 insufficient memory reserved for statement 错误

22 创建 matrixts 扩展时报错 ERROR: type "mxkv_text" already exist

1 `ERROR: EvalPlanQual can not handle subPlan with Motion node`

2 `ERROR: Too many unresolved insertion xids, please do a vacuum`

3 `ERROR: modification of distribution columns in OnConflictUpdate is not supported`

4 图形化客户端收到 log：`server closed the connection unexpectedly`

10 更新删除时报错 `cannot delete from table "&tablename" because it does not have a replica identity and publishes deletes`

11 `ERROR: could not read block 0 in file "base/1588803/269422"`

14 `pg_hba.conf` 文件内容配置的顺序导致的远程访问权限异常

15 `idle in transaction timeout` 报错

16 `insufficient memory reserved for statement` 报错

17 `evalplanqual can not hanlde subplan with motion node` 报错

21 执行 SQL 语句时显示 `insufficient memory reserved for statement` 错误

22 创建 `matrixts` 扩展时报错 `ERROR: type "mxkv_text" already exist`