Filtering by Pattern

Part 3: Exploring Your Knowledge Graph | Chapter 4 of 6

Before you start

Prerequisites: Chapter 3.3: Following the Connections
Time: ~10 minutes


Recap

In 3.3, you followed arrows. You started at a concept, traversed outgoing and incoming edges, even chased variable-length paths two and three hops deep. You can trace any connection in this graph.

But tracing connections only finds what's connected. What about the stuff that isn't?


The Question

Which concepts are islands?

No incoming edges. No outgoing edges. Just floating there, alone in the graph, connected to nothing. These aren't curiosities: they're potential coverage gaps. If your extraction pipeline pulled out a concept but found zero relationships for it, then either the concept is genuinely standalone or (more likely) the extraction missed something.

Let's find them.


Finding Orphans

MATCH (c:Concept)
WHERE NOT EXISTS { MATCH (c)-[]-() }
RETURN c.name, c.description

That WHERE NOT EXISTS { MATCH ... } is the key move. You're telling Cypher: "give me concepts, but only the ones where this pattern fails to match." The inner MATCH tries to find any edge (any direction, any type) connected to c. If it can't find one, the concept passes the filter.

Results:

c.name                        | c.description
Composition Over Inheritance  | Favoring object composition over class inheritance
Law of Demeter                | Principle limiting knowledge between components
Cohesion                      | Degree to which elements of a module belong together
Loose Coupling                | Minimizing dependencies between components
YAGNI                         | You Aren't Gonna Need It principle
Single Responsibility         | A class should have one reason to change
Interface Segregation         | Clients should not depend on interfaces they don't use
Dependency Injection          | Supplying dependencies from outside rather than creating them

Eight orphan concepts. Out of 47 total, that's 17%.

What this means

17% of the concepts in your graph have zero connections. They were extracted from the text, but no relationships were found linking them to anything else. That's either a real signal (they're mentioned in passing, not deeply discussed) or a gap in your extraction pipeline. Either way, now you know.
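
The orphan filter can also be written more compactly. The sketch below assumes a recent Neo4j release (5.x), where the MATCH keyword inside EXISTS { } is optional and -- is shorthand for -[]-. Both variants should return the same rows as the longer query.

```cypher
// Variant 1: drop the inner MATCH keyword; -- means "any relationship, any direction"
MATCH (c:Concept)
WHERE NOT EXISTS { (c)--() }
RETURN c.name, c.description;

// Variant 2: a bare pattern predicate in WHERE, an older way to write the same test
MATCH (c:Concept)
WHERE NOT (c)--()
RETURN c.name, c.description;
```

The full EXISTS { MATCH ... } form is worth keeping in mind anyway: it is the only one of the three that lets you add a WHERE clause inside the subquery.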


Filtering by Specific Relationship Type

Orphans are the extreme case. What about concepts that are connected to something but lack a specific type of edge?

Say you want to find concepts that never require anything:

MATCH (c:Concept)
WHERE NOT EXISTS { MATCH (c)-[:REQUIRES]->() }
RETURN c.name

Same pattern, narrower filter. The inner MATCH only looks for outgoing REQUIRES edges. Concepts that have other edge types but no REQUIRES still show up here.

This is useful for auditing. If a concept represents a technique, it probably requires something. No REQUIRES edges might mean the extraction didn't capture its dependencies.
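
Note that the REQUIRES query also returns orphans, since a concept with no edges at all trivially has no REQUIRES edge. To answer the narrower question (connected to something, but never requiring anything), one option is to combine a positive EXISTS with the negative one. A minimal sketch, reusing the REQUIRES type from this chapter:

```cypher
// Concepts that have at least one relationship of any kind...
MATCH (c:Concept)
WHERE EXISTS { MATCH (c)-[]-() }
  // ...but no outgoing REQUIRES relationship
  AND NOT EXISTS { MATCH (c)-[:REQUIRES]->() }
RETURN c.name
```

The two subqueries are evaluated independently for each concept, so the order of the conditions does not change the result.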


Counting Connections: The Degree Query

Orphans are degree-zero concepts. But what about the rest of the spectrum? Let's see the full picture:

MATCH (c:Concept)
RETURN c.name, COUNT { MATCH (c)-[]-() } AS degree
ORDER BY degree DESC

COUNT { MATCH ... } is the counting cousin of EXISTS { MATCH ... }. Instead of returning true/false, it returns a number: how many times the inner pattern matched.

Here's what comes back (top 10):

c.name             | degree
Properties         | 23
Encapsulation      | 8
Dangerous Setters  | 7
Class Invariant    | 6
Getter Method      | 5
Setter Method      | 5
Access Control     | 5
Validation         | 4
Data Hiding        | 4
Invalid State      | 4

And at the bottom: those eight orphans, all sitting at degree 0.
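
The degree above counts edges in both directions. If you want to know whether a well-connected concept mostly points at others or is mostly pointed at, the same COUNT { } form works with a directed pattern. A sketch, where outDegree and inDegree are illustrative column names and LIMIT 10 mirrors the top-10 cut used above:

```cypher
MATCH (c:Concept)
RETURN c.name,
       COUNT { MATCH (c)-[]->() } AS outDegree,  // edges leaving c
       COUNT { MATCH (c)<-[]-() } AS inDegree    // edges arriving at c
ORDER BY outDegree + inDegree DESC
LIMIT 10
```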

Three tools, one family

You've now seen the full pattern-filtering toolkit:

  • WHERE NOT EXISTS { MATCH ... }: Does this pattern fail to match? (boolean negative)
  • EXISTS { MATCH ... }: Does this pattern match at all? (boolean positive)
  • COUNT { MATCH ... }: How many times does this pattern match? (numeric)

Same syntax, different questions.
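
To see the three side by side, you can return them as columns of a single query. This is only a sketch: it assumes Neo4j 5.x, where subquery expressions are allowed in RETURN, and the column names isConnected and isOrphan are made up for illustration.

```cypher
MATCH (c:Concept)
RETURN c.name,
       EXISTS { MATCH (c)-[]-() }     AS isConnected,  // true if any edge touches c
       NOT EXISTS { MATCH (c)-[]-() } AS isOrphan,     // the negation of the line above
       COUNT { MATCH (c)-[]-() }      AS degree        // how many edges touch c
ORDER BY degree DESC
```

For every row, isOrphan should be true exactly when degree is 0.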


The Claim, Tested

The falsifiable claim for this chapter: at least 10% of concepts are orphans (no connections).

Eight orphans out of 47 concepts = 17%. Claim holds. Your knowledge graph has real gaps.
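
Rather than counting rows by hand, you can let the database check the claim. The query below is one possible way to do it; isOrphan, orphanPct, and claimHolds are illustrative names, and round() with a precision argument assumes Neo4j 4.0 or later.

```cypher
// Flag each concept as orphan or not
MATCH (c:Concept)
WITH c, NOT EXISTS { MATCH (c)-[]-() } AS isOrphan
// Collapse to one row: total concepts and how many of them are orphans
WITH count(c) AS total,
     sum(CASE WHEN isOrphan THEN 1 ELSE 0 END) AS orphans
RETURN orphans,
       total,
       round(100.0 * orphans / total, 1) AS orphanPct,   // 100.0 forces float division
       100.0 * orphans / total >= 10.0 AS claimHolds     // the 10% threshold from the claim
```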


Exercise

Find the leaf concepts

Task: Find all concepts with exactly one connection (degree = 1). These are "leaf" concepts, dangling at the edge of the graph with a single link to anything. How many are there?

Hint: You already know how to count connections with COUNT { MATCH ... }. You just need to filter for a specific count.

Solution
MATCH (c:Concept)
WITH c, COUNT { MATCH (c)-[]-() } AS degree
WHERE degree = 1
RETURN c.name, degree

This uses a WITH clause to compute the degree first, then filters on it. COUNT { ... } can also appear directly in a WHERE comparison, but the WITH step gives the count a name, so you can filter on it and return it without writing the subquery twice.

You should find several leaf concepts: nodes that connect to the graph through exactly one relationship. Like orphans, these are worth investigating. A concept with a single connection is barely integrated into the graph's structure.
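
To answer the "how many" part of the task, and to see where leaves sit relative to everything else, you can group concepts by degree. A sketch, with concepts as an illustrative column name:

```cypher
// One row per distinct degree, with the number of concepts at that degree
MATCH (c:Concept)
WITH COUNT { MATCH (c)-[]-() } AS degree
RETURN degree, count(*) AS concepts
ORDER BY degree
```

The row with degree 1 gives the number of leaf concepts, and the row with degree 0 should match the orphan count from earlier in the chapter.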


What You Learned

  • WHERE NOT EXISTS { MATCH ... } keeps only the concepts where a pattern does not match. Negation by pattern.
  • EXISTS { MATCH ... } checks whether a pattern matches at all. Boolean subquery.
  • COUNT { MATCH ... } counts how many times a pattern matches. The numeric version.
  • 17% of your graph's concepts are orphans: extracted but unconnected.
  • Degree (number of connections) varies wildly across concepts, from 0 to 23.

Next Up

You've found orphans and counted degrees. That degree column is interesting: Properties has 23 connections, while most concepts have fewer than 5. That's a massive skew.

Which concept is the real hub of this graph? You'd probably guess Encapsulation (it's literally the chapter title). You'd be wrong.

Next: 3.5 Finding the Hubs