Counting and Grouping

Part 3: Exploring Your Knowledge Graph | Chapter 2 of 6

Prerequisites: Chapter 3.1: What Do I Actually Have?
Time: ~10 minutes


Quick Recap

Last chapter, you poked around with MATCH and RETURN, discovered 47 concepts, and learned that only 32 of them actually have edges. You also found 15 orphans sitting in the corner with no connections at all.

But you still don't know what kinds of relationships are in this graph. Let's fix that.


83 Edges, But What Kind?

You know you have 83 edges. But what kinds? Is it mostly REQUIRES? CONTRADICTS? SIMILAR_TO? Let's find out.

MATCH ()-[r]->()
RETURN type(r) AS relation_type, count(r) AS count
ORDER BY count DESC

Result:

relation_type   count
REQUIRES        35
IMPLEMENTS      18
REFINES         14
CHALLENGES      12
CONTRADICTS     4

Read that again. REQUIRES is 35 out of 83 edges. That's 42% of all the edges in your graph. CONTRADICTS is only 4, just under 5%.

Your graph is mostly about dependencies, not conflicts. The book chapter on encapsulation isn't really about things fighting each other. It's about things needing each other. Properties require Encapsulation. Getters require Properties. The whole structure is a dependency graph with a few disagreements sprinkled in.

Falsifiable Claim

REQUIRES is the dominant relation type, making up more than 30% of all edges. Run the query above on your own graph to verify.
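
If you'd rather not do the division by hand, here's a minimal sketch that computes each type's share of the total. It leans on the WITH clause, which gets a proper introduction in the exercise at the end of this chapter. The aliases total, edge_count, and percent are illustrative names, not anything your graph defines.

```cypher
// Step 1: count every edge once to get the total.
MATCH ()-[r]->()
WITH count(r) AS total
// Step 2: count edges per type, carrying total along as a grouping key.
MATCH ()-[r]->()
WITH type(r) AS relation_type, count(r) AS edge_count, total
// Step 3: turn each count into a whole-number percentage.
RETURN relation_type, edge_count,
       round(100.0 * edge_count / total) AS percent
ORDER BY edge_count DESC
```

On the graph above, REQUIRES should come out at about 42. If your own graph shows 30 or less, the claim fails for your data.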


Wait, Where's the GROUP BY?

If you've written SQL, you just noticed something weird. There's no GROUP BY clause.

Cypher doesn't need one. Any non-aggregated column in your RETURN automatically becomes a grouping key. When you wrote RETURN type(r), count(r), Cypher grouped by type(r) for you.

This is called implicit grouping. It's one of the nicer things about Cypher compared to SQL, where forgetting GROUP BY gets you an error in most databases (or quietly arbitrary values in a few) and a headache.
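
To see implicit grouping with more than one key, here's a sketch that returns two non-aggregated columns. Cypher treats the pair as the grouping key, much as GROUP BY domain, relation_type would in SQL. It assumes your concepts carry a domain property, as the ones in this graph do; the aliases are illustrative.

```cypher
// Two non-aggregated columns: (domain, relation_type) together form the grouping key.
// Rough SQL equivalent: SELECT ... GROUP BY domain, relation_type
MATCH (c:Concept)-[r]->()
RETURN c.domain AS domain, type(r) AS relation_type, count(r) AS edge_count
ORDER BY domain, edge_count DESC
```

You get one row per combination of source domain and relation type that actually occurs. Drop both non-aggregated columns and count(r) collapses everything into a single row: the total number of outgoing edges from concepts.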


Counting Concepts by Domain

Edges aren't the only thing worth counting. What about the concepts themselves?

MATCH (c:Concept)
RETURN c.domain AS domain, count(c) AS concept_count
ORDER BY concept_count DESC

Result:

domain                  concept_count
implementation_hiding   18
access_control          12
state_management        9
design_patterns         5
error_handling          3


Most of your concepts live in implementation_hiding and access_control. Makes sense: this is the Encapsulation chapter. But 3 concepts in error_handling? That's worth investigating later.


ORDER BY: Sorting Your Results

You've already been using ORDER BY count DESC in the queries above. A few things to know:

  • DESC sorts high to low. ASC (or just leaving it off) sorts low to high.
  • You can sort by any column in your RETURN.
  • You can chain sorts: ORDER BY count DESC, relation_type ASC.

Here's the edge count again, but ascending (least common first):

MATCH ()-[r]->()
RETURN type(r) AS relation_type, count(r) AS count
ORDER BY count ASC

CONTRADICTS floats to the top at 4. The rarest relationship. Only four pairs of concepts in this chapter genuinely conflict with each other.
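
The chained sort from the list above can be sketched like this. On a bigger graph, several types could share the same count; the second sort key decides their order so the output comes back the same every run.

```cypher
MATCH ()-[r]->()
RETURN type(r) AS relation_type, count(r) AS count
// Highest counts first; ties broken alphabetically by type name.
ORDER BY count DESC, relation_type ASC
```

On this graph no two types tie, so you'll see the same order as the first query. The tie-breaker only earns its keep once counts start colliding.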


The Word You Just Earned

Most of what you've done in this chapter (counting things and grouping them) has a name: aggregation. Sorting with ORDER BY is a separate step that runs on the aggregated rows.

Count, sum, average, collect into a list: these are all aggregation functions. Cypher's main ones:

Function     What It Does
count(x)     How many
sum(x)       Add them up
avg(x)       Average
min(x)       Smallest value
max(x)       Largest value
collect(x)   Gather into a list

You'll use count() constantly. sum(), avg(), min(), and max() show up when you start working with numeric properties like confidence scores. For now, count() and collect() are your workhorses.
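
Here's a sketch of collect() in action: the same domain count as before, plus the names behind each number. It's a quick way to see which 3 concepts landed in error_handling. The names alias is illustrative.

```cypher
MATCH (c:Concept)
RETURN c.domain AS domain,
       count(c) AS concept_count,
       collect(c.name) AS names   // one list of concept names per domain
ORDER BY concept_count ASC
```

The numeric functions work the same way. The next sketch assumes your edges carry a numeric confidence property. That property name is a guess for illustration, so check your own schema before running it.

```cypher
// ASSUMPTION: edges have a numeric `confidence` property.
// Swap in whatever your graph actually calls it.
MATCH ()-[r]->()
RETURN type(r) AS relation_type,
       min(r.confidence) AS lowest,
       avg(r.confidence) AS average,
       max(r.confidence) AS highest
ORDER BY average DESC
```

If the property doesn't exist on your edges, all three columns come back null rather than raising an error.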


Exercise: Who's Connected to What?

Combine what you learned in Chapter 3.1 (MATCH patterns) with what you learned here (counting) to answer this question:

How many outgoing edges does each concept have? Show only concepts with 3 or more, sorted highest first.

Solution
MATCH (c:Concept)-[r]->()
RETURN c.name, count(r) AS outgoing
ORDER BY outgoing DESC

Then eyeball the results for anything with 3+ outgoing edges. If you want to filter strictly:

MATCH (c:Concept)-[r]->()
WITH c.name AS name, count(r) AS outgoing
WHERE outgoing >= 3
RETURN name, outgoing
ORDER BY outgoing DESC

Notice the WITH clause? It lets you filter on the aggregated value. You can't put WHERE count(r) >= 3 directly because WHERE runs before aggregation. WITH passes intermediate results forward so you can filter after counting.

Keep WITH in your back pocket. You'll need it.
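
To make the before-and-after distinction concrete, here's a sketch that uses WHERE twice in one query: once before counting and once after. It narrows the exercise to REQUIRES edges only; the alias requires_out is illustrative.

```cypher
MATCH (c:Concept)-[r]->()
WHERE type(r) = 'REQUIRES'        // before aggregation: drops individual edges
WITH c.name AS name, count(r) AS requires_out
WHERE requires_out >= 3           // after aggregation: drops whole groups
RETURN name, requires_out
ORDER BY requires_out DESC
```

The first WHERE decides which edges get counted. The second decides which counts you keep. If you know SQL, that's the split between WHERE and HAVING.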


What You Learned

  • count() and implicit grouping: Return a non-aggregated column alongside count() and Cypher groups automatically
  • ORDER BY: Sort results ascending or descending
  • Aggregation: The general name for counting, summing, and collecting, which you now get to use because you've earned it
  • Your graph's shape: Mostly REQUIRES edges (42%), very few CONTRADICTS (5%). Dependencies dominate conflicts.

Next Up

Numbers are nice, but you want to see the connections. Which concepts does Dangerous Setters actually point to? What points back? Let's follow an edge.

Chapter 3.3: Following the Connections