Data Types

This document describes the data types supported by YMatrix.

YMatrix supports the following categories of data types:

  • Basic data types
    1. Numeric
    2. String
    3. Date and time
  • Extended data types
    1. JSON / jsonb
    2. MXKV
  • Spatial data types

1 Basic Data Types

1.1 Numeric

Type Alias Size Range Description
smallint int2 2 bytes -32768 ~ 32767 Small-range integer
int int4 4 bytes -2147483648 ~ 2147483647 Integer type
bigint int8 8 bytes -9223372036854775808 ~ 9223372036854775807 Large-range integer
serial serial4 4 bytes 1 ~ 2147483647 Auto-incrementing integer
bigserial serial8 8 bytes 1 ~ 9223372036854775807 Large-range auto-incrementing integer
real float4 4 bytes 6-digit precision Variable precision
double precision float8 8 bytes 15-digit precision Variable precision
decimal numeric variable unlimited Fixed precision

As shown in the table, YMatrix numeric types fall into four categories:

  • Integers: Different sizes provide different ranges; all are signed
    1. smallint
    2. int
    3. bigint
  • Fixed-point numbers: Numbers with fixed decimal precision
    1. decimal
  • Floating-point numbers: Numbers with variable decimal precision
    1. real
    2. double precision
  • Auto-increment types: Used for auto-incrementing ID columns
    1. serial
    2. bigserial

1.2 String

Type Alias Size Range Description
character [ (n) ] char [ (n) ] 1 byte + n Up to n characters Fixed-length string, padded with spaces if not fully filled
character varying [ (n) ] varchar [ (n) ] 1 byte + string size Up to n characters Variable-length string with length limit; maximum 10,485,760 bytes
text 1 byte + string length Unlimited length Variable-length string without length limit

As shown in the table, YMatrix string types fall into two categories:

  • Fixed-length: char
  • Variable-length
    1. varchar: Length-limited
    2. text: Unlimited length

1.3 Date and Time

Type Alias Size Range Description
date 4 bytes 4713 BC ~ 294,277 AD Calendar date (year, month, day)
time [ (p) ] [ without time zone ] 8 bytes 00:00:00[.000000] ~ 24:00:00[.000000] Time of day
time [ (p) ] with time zone timetz 12 bytes 00:00:00+1359 ~ 24:00:00-1359 Time of day with time zone
timestamp [ (p) ] [ without time zone ] 8 bytes 4713 BC ~ 294,277 AD Date and time
timestamp [ (p) ] with time zone timestamptz 8 bytes 4713 BC ~ 294,277 AD Date and time with time zone

As shown in the table, YMatrix date/time types fall into three categories:

  • Date: Represents only year-month-day, no time
    1. date
  • Time: Represents only time, no date
    1. time [without time zone] (without time zone)
    2. time with time zone (with time zone)
  • Timestamp: Represents both date and time
    1. timestamp [without time zone] (without time zone)
    2. timestamp with time zone (with time zone)

Impact of Time Zone on Time Types

A notable feature in time types is the time zone setting. By default, time zone is not included (i.e., without time zone). What impact does time zone have on time types? The following example demonstrates this behavior.

CREATE TABLE test (
    c1 int,
    c2 timestamp,
    c3 timestamp with time zone
) DISTRIBUTED BY (c1);

The test table contains two timestamp columns: one without time zone and one with time zone. Insert test data and query:

INSERT INTO test VALUES(1, now(), now());

SELECT * FROM test;
 c1 |            c2            |             c3
----+--------------------------+-----------------------------
  1 | 2021-12-01 14:54:09.4783 | 2021-12-01 14:54:09.4783+08
(1 row)

SHOW timezone;
   TimeZone
---------------
 Asia/Shanghai
(1 row)

The query result shows that both timestamps appear identical. However, the c3 column (with time zone) includes +08, indicating it belongs to UTC+8. The SHOW timezone command confirms the current session time zone is 'Asia/Shanghai' (Beijing time).

Now change the time zone and requery:

SET timezone TO 'Japan';

SELECT * FROM test;
 c1 |            c2            |             c3
----+--------------------------+-----------------------------
  1 | 2021-12-01 14:54:09.4783 | 2021-12-01 15:54:09.4783+09
(1 row)

After changing the database time zone to Japan, the c3 column now displays a different time (15:54) and the time zone suffix changes to +09 (UTC+9).

Therefore, timestamp columns with time zone store time zone information and display local time accordingly in different time zones. In contrast, timestamp columns without time zone always display the stored time unchanged, regardless of session time zone.

When deploying systems across multiple geographic regions with different time zones, using timestamp with time zone is strongly recommended. Note that both timestamp variants use the same storage size, so including time zone information is advised.

Recommendations for Basic Data Types in Time-Series Scenarios

Time-series data typically includes:

  • Timestamps: Must include both date and time. Use timestamp with time zone.
  • Device identifiers:
    1. For string-based identifiers, use varchar or text. Use char only for fixed-length identifiers such as serial numbers.
    2. If maintaining a separate device table, use serial or bigserial as auto-incrementing primary keys to uniquely identify devices.
  • Metrics:
    1. For integer values, choose smallint, int, or bigint based on expected range.
    2. For decimal values with fixed precision and moderate range (e.g., temperature), use decimal.
    3. For decimal values with variable precision and large range, choose real or double precision based on required accuracy.

2 Extended Data Types

In relational models, basic data types represent one attribute per column (e.g., name, height, weight). However, two challenges arise:

  1. The set of metrics is unpredictable, requiring frequent schema changes.
  2. The number of metrics exceeds YMatrix's column limit (1600 columns per table).

In such cases, extended data types are needed. These allow storing multiple attributes within a single column.

There are many ways to implement extended types (e.g., custom serialization in string columns). YMatrix provides two built-in extended types:

  1. JSON / jsonb
  2. MXKV

2.1 JSON / jsonb

The JSON / jsonb types are inherited from PostgreSQL. Unlike storing raw JSON strings in text columns, JSON / jsonb types support built-in functions for accessing keys, merging, and deletion.

JSON and jsonb are functionally similar and can be converted between each other. Key differences:

  1. JSON is stored as text; jsonb is stored in binary format.
  2. JSON writes are faster (no binary conversion).
  3. jsonb reads are faster due to binary representation.

Common JSON / jsonb operations:

Extract key using ->

SELECT '{"a":1, "b":2, "c":3}'::json->'a';
 ?column?
----------
 1
(1 row)

For nested JSON, chain multiple -> operators:

SELECT '{"a":1, "b":2, "c":{"d":2}}'::json->'c'->'d';
 ?column?
----------
 2
(1 row)

Delete key

Use the - operator to delete a key. Write operations require jsonb:

SELECT '{"a":1, "b":2, "c":3}'::jsonb - 'a';
     ?column?
------------------
 {"b": 2, "c": 3}
(1 row)

Merge

Use the || operator to merge two jsonb objects:

SELECT '{"a":1, "b":2}'::jsonb || '{"c":3}'::jsonb;
         ?column?
--------------------------
 {"a": 1, "b": 2, "c": 3}
(1 row)

Update

When merging with duplicate keys, || updates the value:

SELECT '{"a":1, "b":2}'::jsonb || '{"b":3}'::jsonb;
     ?column?
------------------
 {"a": 1, "b": 3}
(1 row)

For more JSON functions, see the PostgreSQL documentation.

2.2 MXKV

MXKV is a high-performance, scalable storage type developed by YMatrix. It is used similarly to JSON but offers key advantages: binary storage with compression, resulting in smaller storage footprint and higher query performance.

For detailed usage, refer to the documentation and blog.

3 Spatial Data Types

YMatrix provides an enhanced version of the PostGIS extension, supporting storage and computation of spatial data types. For usage details, see Spatial Data.