Aggregation

Introduction

Aggregation functions — aggr(expr) — operate over all rows matching a given aggregation key. Keys are compared for equivalence.

In standard aggregation (i.e., aggr(expr)), the list of values passed to the aggregate is derived from the candidate values after removing all NULLs.

Note!
In YMatrix AGE, aggregation results are not ordered. To obtain an ordered result set from aggregated data, apply ORDER BY outside the cypher() function — in the outer SQL query. An ORDER BY inside the Cypher clause has no effect on global ordering; it only guarantees ordering per segment.
Example:
SELECT * FROM cypher('graph_name', $$ /* aggr(expr) */ $$) AS (grouping_key agtype) ORDER BY grouping_key;

Data Preparation

SELECT * FROM cypher('graph_name', $$
    CREATE (a:Person {name: 'A', age: 13}),
           (b:Person {name: 'B', age: 33, eyes: "blue"}),
           (c:Person {name: 'C', age: 44, eyes: "blue"}),
           (d1:Person {name: 'D', eyes: "brown"}),
           (d2:Person {name: 'D'}),
           (a)-[:KNOWS]->(b),
           (a)-[:KNOWS]->(c),
           (a)-[:KNOWS]->(d1),
           (b)-[:KNOWS]->(d2),
           (c)-[:KNOWS]->(d2)
$$) AS (a agtype);

Automatic Grouping (Auto-GROUP BY)

Cypher provides aggregation capabilities analogous to SQL’s GROUP BY. Aggregation functions accept a set of values and compute a single summary value — for example, avg() computes the arithmetic mean of numeric values, and min() returns the smallest numeric or lexicographic value in the set.

When we say an aggregation function operates “over a set of values”, we mean the inner expression (e.g., n.age) is evaluated for every record within the same aggregation group.

Aggregations may be computed over the entire matched result set, or further subdivided using grouping keys. Grouping keys are non-aggregate expressions used to partition input rows before applying the aggregation function.

Consider the following RETURN statement:

SELECT * FROM cypher('graph_name', $$
    MATCH (v:Person)
    RETURN v.name, count(*)
$$) AS (grouping_key agtype, count agtype);
count key
"A" 1
"B" 1
"C" 1
"D" 2
(1 row)

There are two return expressions: v.name and count(*). The first is not an aggregate, so it becomes the implicit grouping key. The second is an aggregate. Matched records are grouped by the value of v.name, and count(*) is computed once per group.

DISTINCT Aggregation

In DISTINCT aggregation (i.e., aggr(DISTINCT expr)), the list of candidate values is first filtered to remove all NULLs. Then, among equivalent values, only one instance is retained — duplicates (under equivalence) are eliminated.

The DISTINCT operator is used with aggregations to ensure uniqueness before the aggregation function processes the values.

SELECT *
FROM cypher('graph_name', $$
    MATCH (v:Person)
    RETURN count(DISTINCT v.eyes), count(v.eyes)
$$) AS (distinct_eyes agtype, eyes agtype);
distinct_eyes eyes
2 3
(1 row)

Ambiguous Grouping Statements

Cypher’s automatic grouping behavior — where users need not explicitly declare grouping keys — can lead to ambiguity.

Data Preparation

SELECT * FROM cypher('graph_name', $$
    CREATE (:L {a: 1, b: 2, c: 3}),
           (:L {a: 2, b: 3, c: 1}),
           (:L {a: 3, b: 1, c: 2})
$$) AS (a agtype);

Invalid Query in AGE

AGE resolves ambiguity by requiring that any variable referenced outside an aggregate function must either:

  • appear explicitly as a grouping key in the same WITH or RETURN clause, or
  • be fully enclosed within an aggregate expression.

Query

SELECT * FROM cypher('graph_name', $$
    MATCH (x:L)
    RETURN x.a + count(*) + x.b + count(*) + x.c
$$) AS (a agtype);

Result

ERROR:  "x" must be either part of an explicitly listed key or used inside an aggregate function
LINE 3: RETURN x.a + count(*) + x.b + count(*) + x.c

Valid Queries in AGE

In AGE, any non-aggregate column in a WITH or RETURN clause is treated as a grouping key.

Option 1: Combine non-aggregate terms into a single grouping expression

SELECT * FROM cypher('graph_name', $$
    MATCH (x:L)
    RETURN (x.a + x.b + x.c) + count(*) + count(*), x.a + x.b + x.c
$$) AS (count agtype, key agtype);

Here, x.a + x.b + x.c serves as the grouping key. Parentheses are required to form a compound key.

count key
12 6
(1 row)

Option 2: List each variable separately as a grouping key

SELECT * FROM cypher('graph_name', $$
    MATCH (x:L)
    RETURN x.a + count(*) + x.b + count(*) + x.c, x.a, x.b, x.c
$$) AS (count agtype, a agtype, b agtype, c agtype);

Here, x.a, x.b, and x.c are each treated as independent grouping keys.

count a b c
8 3 1 2
8 2 3 1
8 1 2 3
(3 rows)

Vertices and Edges as Grouping Keys

A vertex or edge itself may serve as a grouping key. When doing so, any of its properties may be referenced directly in the same WITH or RETURN clause without explicit declaration.

SELECT * FROM cypher('graph_name', $$
    MATCH (x:L)
    RETURN count(*) + count(*) + x.a + x.b + x.c, x
$$) AS (count agtype, key agtype);

The result is grouped by x. Because vertices are identity-based, their properties need not be enumerated for unambiguous grouping.

Hiding Unwanted Grouping Keys

If a grouping key is needed for aggregation but should not appear in the final output, perform the aggregation in a WITH clause and pass only the desired result to RETURN.

SELECT * FROM cypher('graph_name', $$
    MATCH (x:L)
    WITH count(*) + count(*) + x.a + x.b + x.c AS column, x
    RETURN column
$$) AS (a agtype);
a
8
8
8
(3 rows)