Aggregate Functions

Functions that trigger automatic aggregation.

Data Preparation

LOAD 'age';
SET search_path TO ag_catalog;

SELECT create_graph('graph_name');

SELECT * FROM cypher('graph_name', $$
        CREATE (a:Person {name: 'A', age: 13}),
        (b:Person {name: 'B', age: 33, eyes: "blue"}),
        (c:Person {name: 'C', age: 44, eyes: "blue"}),
        (d1:Person {name: 'D', eyes: "brown"}),
        (d2:Person {name: 'D'}),
        (a)-[:KNOWS]->(b),
        (a)-[:KNOWS]->(c),
        (a)-[:KNOWS]->(d1),
        (b)-[:KNOWS]->(d2),
        (c)-[:KNOWS]->(d2)
$$) as (a agtype);

min()

min() returns the smallest value from a set of values.

Syntax: min(expression)

Returns:

An agtype value — either a scalar or a list — depending on the result of `expression`.

Parameters:

Name Description
expression An expression returning a collection of any agtype scalars and/or lists.

Notes:

  • Null values are excluded from the calculation.
  • In mixed-type collections:
    • Any string is considered less than any number.
    • Any list is considered less than any string.
  • Lists are compared lexicographically: elements are compared pairwise from left to right in ascending order.
  • min(null) returns null.

Query:

SELECT *
FROM cypher('graph_name', $$
    MATCH (v:Person)
    RETURN min(v.age)
$$) as (min_age agtype);

Returns the minimum value among all age property values.

Result:

min_age
13
1 row

Using min() with Lists

Data preparation:
To illustrate the following example, run these three commands first:

SELECT * FROM cypher('graph_name', $$
    CREATE (:min_test {val:'d'})
$$) as (result agtype);

SELECT * FROM cypher('graph_name', $$
    CREATE (:min_test {val:['a', 'b', 23]})
$$) as (result agtype);

SELECT * FROM cypher('graph_name', $$
    CREATE (:min_test {val:[1, 'b', 23]})
$$) as (result agtype);

Query:

SELECT *
FROM cypher('graph_name', $$
    MATCH (v:min_test)
    RETURN min(v.val)
$$) as (min_val agtype);

Returns the smallest value in the collection — here, the list ['a', 'b', 23] — because:
(i) both lists compare lower than the string 'd', and
(ii) the string 'a' compares lower than the number 1.

Result:

min_val
["a", "b", 23]
1 row

max()

max() returns the largest value from a set of values.

Syntax: max(expression)

Returns:

An agtype value — either a scalar or a list — depending on the result of `expression`.

Parameters:

Name Description
expression An expression returning a collection of any agtype scalars and/or lists.

Notes:

  • Null values are excluded from the calculation.
  • In mixed-type collections:
    • Any number is considered greater than any string.
    • Any string is considered greater than any list.
  • Lists are compared lexicographically: elements are compared pairwise from left to right in ascending order.
  • max(null) returns null.

Query:

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN max(n.age)
$$) as (max_age agtype);

Returns the maximum value among all age property values.

Result:

max_age
44
1 row

stDev()

stDev() returns the sample standard deviation of a set of values. It uses the standard two-pass algorithm with denominator N − 1, suitable for unbiased estimation from a sample of a population. Use stDevP() when computing the standard deviation of an entire population.

Syntax: stDev(expression)

Returns:

An agtype floating-point number.

Parameters:

Name Description
expression An agtype numeric expression

Notes:

  • Null values are excluded from the calculation.
  • stDev(null) returns 0.0.

Query:

SELECT *
FROM cypher('graph_name', $$
   MATCH (n:Person)
   RETURN stDev(n.age)
$$) as (stdev_age agtype);

Returns the sample standard deviation of the age property values.

Result:

stdev_age
15.716233645501712
1 row

stDevP()

stDevP() returns the population standard deviation of a set of values. It uses the standard two-pass algorithm with denominator N, suitable for computing the standard deviation of an entire population. Use stDev() when computing the standard deviation of a sample drawn from a population.

Syntax: stDevP(expression)

Returns:

An agtype floating-point number.

Parameters:

Name Description
expression An agtype numeric expression

Notes:

  • Null values are excluded from the calculation.
  • stDevP(null) returns 0.0.

Query:

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN stDevP(n.age)
$$) as (stdevp_age agtype);

Returns the population standard deviation of the age property values.

Result:

stdevp_age
12.832251036613439
1 row

percentileCont()

percentileCont() returns the continuous percentile of a set of values, where the percentile argument ranges from 0.0 to 1.0. It uses linear interpolation: if the requested percentile falls between two values, it computes a weighted average. For nearest-value rounding, use percentileDisc().

Syntax: percentileCont(expression, percentile)

Returns:

An agtype floating-point number.

Parameters:

Name Description
expression An agtype numeric expression
percentile An agtype numeric value between 0.0 and 1.0

Notes:

  • Null values are excluded from the calculation.
  • percentileCont(null, percentile) returns null.

Query:

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN percentileCont(n.age, 0.4)
$$) as (percentile_cont_age agtype);

Returns the 40th percentile of the age property values, computed using linear interpolation.

Result:

percentile_cont_age
29.0
1 row

percentileDisc()

percentileDisc() returns the discrete percentile of a set of values, where the percentile argument ranges from 0.0 to 1.0. It uses rounding to return the value closest to the requested percentile. For interpolated results, use percentileCont().

Syntax: percentileDisc(expression, percentile)

Returns:

An agtype floating-point number.

Parameters:

Name Description
expression An agtype numeric expression
percentile An agtype numeric value between 0.0 and 1.0

Notes:

  • Null values are excluded from the calculation.
  • percentileDisc(null, percentile) returns null.

Query:

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN percentileDisc(n.age, 0.5)
$$) as (percentile_disc_age agtype);

Returns the 50th percentile (median) of the age property values.

Result:

percentile_disc_age
33.0
1 row

count()

count() returns the number of values or records. It has two variants:

  • count(*) returns the number of matching records.
  • count(expr) returns the number of non-null values returned by expr.

Syntax: count(expression)

Returns:

An agtype integer.

Parameters:

Name Description
expression Any expression.

Notes:

  • count(*) includes records that would produce null.
  • count(expr) excludes null values.
  • count(null) returns 0.
  • count(*) can be used to count nodes; for example, to count nodes connected to a given node n.

Query:

SELECT *
FROM cypher('graph_name', $$
    MATCH (n {name: 'A'})-[]->(x)
    RETURN n.age, count(*)
$$) as (age agtype, number_of_people agtype);

Returns the age property of the starting node n (where name = 'A') and the number of nodes connected to n.

Result:

age number_of_people
13 3
1 row

count(*) can also group and count relationship types, returning the count per type.

Query:

SELECT *
FROM cypher('graph_name', $$
    MATCH (n {name: 'A'})-[r]->()
    RETURN type(r), count(*)
$$) as (label agtype, count agtype);

Returns each relationship type and its count.

Result:

label count
"KNOWS" 3
1 row

Counting Values with count(expression)

Counting actual non-null values returned by an expression may be more useful than simply counting records with count(*).

Query:

SELECT *
FROM cypher('graph_name', $$
    MATCH (n {name: 'A'})-[]->(x)
    RETURN count(x)
$$) as (count agtype);

Returns the number of nodes connected to the starting node n.

Result:

count
3
1 row

Counting Non-Null Values

count(expression) counts only non-null values returned by the expression.

Query:

SELECT *
FROM cypher('graph_name', $$
    MATCH (n:Person)
    RETURN count(n.age)
$$) as (count agtype);

Returns the number of Person nodes with a non-null age property.

Result:

count
3
1 row

Distinct vs. Non-Distinct Counting

In this example, we find all friends-of-friends and count them:

  • count(DISTINCT friend_of_friend) counts each friend_of_friend only once, because DISTINCT removes duplicates.
  • count(friend_of_friend) counts duplicate friend_of_friend values multiple times.

Query:

SELECT *
FROM cypher('graph_name', $$
        MATCH (me:Person)-[]->(friend:Person)-[]->(friend_of_friend:Person)
        WHERE me.name = 'A'
        RETURN count(DISTINCT friend_of_friend), count(friend_of_friend)
$$) as (friend_of_friends_distinct agtype, friend_of_friends agtype);

Both B and C know D, so D is counted twice without DISTINCT.

Result:

friend_of_friends_distinct friend_of_friends
1 2
1 row

avg()

avg() returns the arithmetic mean of a set of numeric values.

Syntax: avg(expression)

Returns:

An agtype integer or floating-point number.

Parameters:

Name Description
expression An expression returning a set of numbers.

Notes:

  • Null values are excluded from the calculation.
  • avg(null) returns null.

Query:

SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN avg(n.age)
$$) as (avg_age agtype);

Returns the average of all age property values.

Result:

avg_age
30.0
1 row

sum()

sum() returns the sum of a set of numeric values.

Syntax: sum(expression)

Returns:

An agtype floating-point number.

Parameters:

Name Description
expression An expression returning a set of numbers.

Notes:

  • Null values are excluded from the calculation.
  • sum(null) returns null.

Query:

SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN sum(n.age)
$$) as (total_age agtype);

Returns the sum of all age property values.

Result:

total_age
90
1 row