Counting and Grouping
Part 3: Exploring Your Knowledge Graph | Chapter 2 of 6
Prerequisites: Chapter 3.1: What Do I Actually Have? | Time: ~10 minutes
Quick Recap
Last chapter, you poked around with MATCH and RETURN, discovered 47 concepts, and learned that only 32 of them actually have edges. You also found 15 orphans sitting in the corner with no connections at all.
But you still don't know what kinds of relationships are in this graph. Let's fix that.
83 Edges, But What Kind?
You know you have 83 edges. But what kinds? Is it mostly REQUIRES? CONTRADICTS? SIMILAR_TO? Let's find out.
MATCH ()-[r]->()
RETURN type(r) AS relation_type, count(r) AS count
ORDER BY count DESC
Result:
| relation_type | count |
|---|---|
| REQUIRES | 35 |
| IMPLEMENTS | 18 |
| REFINES | 14 |
| CHALLENGES | 12 |
| CONTRADICTS | 4 |
Read that again. REQUIRES is 35 out of 83 edges. That's 42% of your entire graph. CONTRADICTS is only 4.
Your graph is mostly about dependencies, not conflicts. The book chapter on encapsulation isn't really about things fighting each other. It's about things needing each other. Properties require Encapsulation. Getters require Properties. The whole structure is a dependency tree with a few disagreements sprinkled in.
Falsifiable Claim
REQUIRES is the dominant relation type, making up more than 30% of all edges. Run the query above on your own graph to verify.
Wait, Where's the GROUP BY?
If you've written SQL, you just noticed something weird. There's no GROUP BY clause.
Cypher doesn't need one. Any non-aggregated column in your RETURN automatically becomes a grouping key. When you wrote RETURN type(r), count(r), Cypher grouped by type(r) for you.
This is called implicit grouping. It's one of the nicer things about Cypher compared to SQL, where forgetting GROUP BY usually gives you an error and a headache.
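Implicit grouping also works with more than one key. Here is a minimal sketch, assuming your Concept nodes carry a domain property (the same one queried in the next section). Both non-aggregated columns, domain and relation_type, together form the grouping key, so you should get one row per combination:

```cypher
// Every non-aggregated column in RETURN joins the grouping key.
// Here the key is the pair (domain, relation_type).
MATCH (c:Concept)-[r]->()
RETURN c.domain AS domain, type(r) AS relation_type, count(r) AS edge_count
ORDER BY edge_count DESC
```

The alias edge_count is just an illustrative name. Only concepts with at least one outgoing edge appear here, because the pattern requires an edge to match.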
Counting Concepts by Domain
Edges aren't the only thing worth counting. What about the concepts themselves?
MATCH (c:Concept)
RETURN c.domain AS domain, count(c) AS concept_count
ORDER BY concept_count DESC
Result:
| domain | concept_count |
|---|---|
| implementation_hiding | 18 |
| access_control | 12 |
| state_management | 9 |
| design_patterns | 5 |
| error_handling | 3 |

Most of your concepts live in implementation_hiding and access_control. Makes sense: this is the Encapsulation chapter. But 3 concepts in error_handling? That's worth investigating later.
ORDER BY: Sorting Your Results
You've already been using ORDER BY count DESC in the queries above. A few things to know:
- DESC sorts high to low. ASC (or just leaving it off) sorts low to high.
- You can sort by any column in your RETURN.
- You can chain sorts: ORDER BY count DESC, relation_type ASC.
Here's the edge count again, but ascending (least common first):
MATCH ()-[r]->()
RETURN type(r) AS relation_type, count(r) AS count
ORDER BY count ASC
CONTRADICTS floats to the top at 4. The rarest relationship. Only four pairs of concepts in this chapter genuinely conflict with each other.
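Chained sorts follow the same shape. A short sketch of the edge count with a tie-breaker: rows are ordered by count first, and relation_type only decides the order when two types share the same count.

```cypher
// Primary sort: most common relation types first.
// Secondary sort: alphabetical by type name, used only to break ties.
MATCH ()-[r]->()
RETURN type(r) AS relation_type, count(r) AS count
ORDER BY count DESC, relation_type ASC
```

With the counts shown earlier in this chapter there are no ties, so the second key changes nothing. It starts to matter once two relation types land on the same number.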
The Word You Just Earned
Everything you've done in this chapter (counting things, grouping them, sorting the results) has a name: aggregation.
Count, sum, average, collect into a list: these are all aggregation functions. Cypher's main ones:
| Function | What It Does |
|---|---|
| count(x) | How many |
| sum(x) | Add them up |
| avg(x) | Average |
| min(x) | Smallest value |
| max(x) | Largest value |
| collect(x) | Gather into a list |
You'll use count() constantly. The others show up when you start working with numeric properties like confidence scores. For now, count() and collect() are your workhorses.
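To see count() and collect() side by side, here is a small sketch that reuses the domain query from earlier. It assumes each concept has a name property, as the exercise below does. collect() gathers the names in each group into a single list, so you get the tally and the members in one row:

```cypher
// One row per domain: how many concepts it holds, and which ones.
MATCH (c:Concept)
RETURN c.domain AS domain,
       count(c) AS concept_count,
       collect(c.name) AS names
ORDER BY concept_count DESC
```

This is a quick way to look inside a small group, such as a domain with only a handful of concepts, without writing a second query.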
Exercise: Who's Connected to What?
Combine what you learned in Chapter 3.1 (MATCH patterns) with what you learned here (counting) to answer this question:
How many outgoing edges does each concept have? Show only concepts with 3 or more, sorted highest first.
Solution
MATCH (c:Concept)-[r]->()
RETURN c.name, count(r) AS outgoing
ORDER BY outgoing DESC
Then eyeball the results for anything with 3+ outgoing edges. If you want to filter strictly:
MATCH (c:Concept)-[r]->()
WITH c.name AS name, count(r) AS outgoing
WHERE outgoing >= 3
RETURN name, outgoing
ORDER BY outgoing DESC
Notice the WITH clause? It lets you filter on the aggregated value. You can't put WHERE count(r) >= 3 directly because WHERE runs before aggregation. WITH passes intermediate results forward so you can filter after counting.
Keep WITH in your back pocket. You'll need it.
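One more use of WITH, sketched under the assumption that your graph has at least one edge: carrying a total forward so each relation type can be shown as a share of all edges. The aliases total, edge_count, and percent are illustrative names, not anything the graph defines.

```cypher
// First pass: count every edge once and carry the total forward.
MATCH ()-[r]->()
WITH count(r) AS total
// Second pass: group by type; total rides along as an extra grouping key.
MATCH ()-[r]->()
WITH type(r) AS relation_type, count(r) AS edge_count, total
RETURN relation_type, edge_count, round(100.0 * edge_count / total) AS percent
ORDER BY edge_count DESC
```

If your graph matches the counts in this chapter, REQUIRES should come out at roughly 42 percent. Multiplying by 100.0 rather than 100 keeps the division in floating point, so the share is not truncated to a whole number before rounding.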
What You Learned
- count() and implicit grouping: Return a non-aggregated column alongside count() and Cypher groups automatically
- ORDER BY: Sort results ascending or descending
- Aggregation: The general name for counting, summing, and collecting, which you now get to use because you've earned it
- Your graph's shape: Mostly REQUIRES edges (42%), very few CONTRADICTS (5%). Dependencies dominate conflicts.
Next Up
Numbers are nice, but you want to see the connections. Which concepts does Dangerous Setters actually point to? What points back? Let's follow an edge.