- Hands-On Graph Analytics with Neo4j
- Estelle Scifo
- 591字
- 2025-04-04 12:48:21
Using aggregation functions
It is often very useful to compute some aggregated quantities for the entities in our database, such as the number of friends in a social graph or the total price of an order for an e-commerce website. We will discover here how to do those calculations with Cypher.
Count, sum, and average
In a similar way to SQL, you can compute aggregates with Cypher. The main difference with SQL is that there is no need to use a GROUP BY statement; all fields that are not in an aggregation function will be used to create groups:
MATCH (FL:State {code: "FL"})-[:SHARE_BORDER_WITH]-(n)
RETURN FL.name as state_name, COUNT(n.code) as number_of_neighbors
The result is the following one, as expected:
╒════════════╤═════════════════════╕
│"state_name"│"number_of_neighbors"│
╞════════════╪═════════════════════╡
│"Florida" │2 │
└────────────┴─────────────────────┘
The following aggregate functions are available:
- AVG(expr): available for numeric values and durations
- COUNT(expr): the number of rows with non-null expr
- MAX(expr): the maximum value of expr over the group
- MIN(expr): the minimum value of expr over the group
- percentileCont(expr, p): the p (percentage) of expr over the group, interpolated
- percentileDisc(expr, p): the p (percentage) of expr over the group
- stDev(expr): the standard deviation of expr over the group
- stDevP(expr): the population standard deviation of expr over the group
- SUM(expr): available for numeric values and duration
- COLLECT(expr): see the next section
For instance, we can compute the ratio between a state population and the sum of all people living in its neighboring states like so:
MATCH (s:State)-[:SHARE_BORDER_WITH]-(n)
WITH s.name as state, toFloat(SUM(n.population)) as neighbor_population, s.population as pop
RETURN state, pop, neighbor_population, pop / neighbor_population as f
ORDER BY f desc
The WITH keyword is used to perform intermediate operations.
Creating a list of objects
It is sometimes useful to aggregate several rows into a single list of objects. In that case, we will use the following:
COLLECT
For instance, if we want to create a list containing the code of the states sharing a border with Colorado:
MATCH (:State {code: "FL"})-[:SHARE_BORDER_WITH]-(n)
RETURN COLLECT(n.code)
This returns the following result:
["GA","AL"]
Unnesting objects
Unnesting consists of converting a list of objects into rows, each row containing an item of the list. It is the exact opposite of COLLECT, which groups objects together into a list.
With Cypher, we will use the following statement:
UNWIND
For instance, the following two queries are equivalent:
MATCH (:State {code: "FL"})-[:SHARE_BORDER_WITH]-(n)
WITH COLLECT(n.code) as codes
UNWIND codes as c
RETURN c
// is equivalent to, since COLLECT and UNWIND cancel each other:
MATCH (CO:State {code: "FL"})-[:SHARE_BORDER_WITH]-(n)
RETURN n.cod
This returns our well-known two state codes.
The UNWIND operation will be useful for data imports, since some files are formatted in a way that several pieces of information can be aggregated on a single row. Depending on the data format, this function can be useful when importing data into Neo4j, as we will see in the next section.