Using aggregation functions

It is often very useful to compute some aggregated quantities for the entities in our database, such as the number of friends in a social graph or the total price of an order for an e-commerce website. We will discover here how to do those calculations with Cypher.

Count, sum, and average

In a similar way to SQL, you can compute aggregates with Cypher. The main difference with SQL is that there is no need to use a GROUP BY statement; all fields that are not in an aggregation function will be used to create groups:

MATCH (FL:State {code: "FL"})-[:SHARE_BORDER_WITH]-(n)
RETURN FL.name as state_name, COUNT(n.code) as number_of_neighbors

The result is the following one, as expected:

╒════════════╤═════════════════════╕
│"state_name"│"number_of_neighbors"│
╞════════════╪═════════════════════╡
│"Florida" │2 │
└────────────┴─────────────────────┘

The following aggregate functions are available:

  • AVG(expr): available for numeric values and durations
  • COUNT(expr): the number of rows with non-null expr
  • MAX(expr): the maximum value of expr over the group
  • MIN(expr): the minimum value of expr over the group
  • percentileCont(expr, p): the p (percentage) of expr over the group, interpolated
  • percentileDisc(expr, p): the p (percentage) of expr over the group
  • stDev(expr): the standard deviation of expr over the group
  • stDevP(expr): the population standard deviation of expr over the group
  • SUM(expr): available for numeric values and duration
  • COLLECT(expr): see the next section

For instance, we can compute the ratio between a state population and the sum of all people living in its neighboring states like so:

MATCH (s:State)-[:SHARE_BORDER_WITH]-(n)
WITH s.name as state, toFloat(SUM(n.population)) as neighbor_population, s.population as pop
RETURN state, pop, neighbor_population, pop / neighbor_population as f
ORDER BY f desc

The WITH keyword is used to perform intermediate operations.

Creating a list of objects

It is sometimes useful to aggregate several rows into a single list of objects. In that case, we will use the following:

COLLECT

For instance, if we want to create a list containing the code of the states sharing a border with Colorado:

MATCH (:State {code: "FL"})-[:SHARE_BORDER_WITH]-(n)
RETURN COLLECT(n.code)

This returns the following result:

["GA","AL"]

Unnesting objects

Unnesting consists of converting a list of objects into rows, each row containing an item of the list. It is the exact opposite of COLLECT, which groups objects together into a list.

With Cypher, we will use the following statement:

UNWIND

For instance, the following two queries are equivalent:

MATCH (:State {code: "FL"})-[:SHARE_BORDER_WITH]-(n)
WITH COLLECT(n.code) as codes
UNWIND codes as c
RETURN c

// is equivalent to, since COLLECT and UNWIND cancel each other:
MATCH (CO:State {code: "FL"})-[:SHARE_BORDER_WITH]-(n)
RETURN n.cod

This returns our well-known two state codes.

The UNWIND operation will be useful for data imports, since some files are formatted in a way that several pieces of information can be aggregated on a single row. Depending on the data format, this function can be useful when importing data into Neo4j, as we will see in the next section.