What Do I Actually Have?

Part 3: Exploring Your Knowledge Graph | Chapter 1 of 6

:octicons-clock-16: ~20 minutes :octicons-checklist-16: Prerequisites: Parts 1-2 (you've ingested at least one chapter and have a running Memgraph instance with data)


The Problem With Your First Instinct

You just ingested Chapter 5 of Software Design for Python Programmers. The CLI printed a nice little summary:

Ingested: ch05_5_Hide_class_implementations
  47 concepts, 83 edges, 7 rules

Cool. You remember that chapter spending a LOT of time on setters: dangerous setters, property setters, the whole setter antipattern. So you pop open Memgraph Lab, type this, and hit run:

MATCH (c:Concept) WHERE c.name CONTAINS 'Setter' RETURN c

And you get back:

| c.name |
|---|
| Setter Method |
| Dangerous Setters |
| Property Setters |

Three nodes.

That's it?

But you KNOW that chapter had 15 pages on dangerous setters, why they break things, how property decorators fix them, the whole setter antipattern. Where did everything go?

Nowhere. It's all in the graph. You just asked the wrong question.

Let's fix that.


Asking Questions: The Two-Word Skeleton

Every question you ask this graph has the same shape. Two words: ask, then show.

Here's the simplest possible question: "show me everything."

MATCH (c:Concept)
RETURN c
LIMIT 5
| c.name | c.description | c.domain |
|---|---|---|
| Encapsulation | Bundling data and methods that operate on that data within a single unit | software_design |
| Properties | Controlled access to object attributes via @property decorators | software_design |
| Dangerous Setters | Setter methods that can put an object into an invalid state | software_design |
| Immutability | Objects that cannot be modified after construction | software_design |
| Class Invariant | A condition that must be true for all objects before and after method calls | software_design |

Two things to notice.

First: MATCH (c:Concept) is the "ask" part. You're saying "find me all nodes labeled Concept, and let me call each one c." The parentheses are literal: a node in a graph looks like a circle, and (c) is your ASCII circle.

Second: RETURN c is the "show" part. Without it, the database finds the nodes but never shows them to you. Like asking someone a question and then walking away before they answer.

The LIMIT 5 is self-defense. You have 47 concepts. You don't need all of them dumped on your screen right now.

The query skeleton

Every question in this tutorial follows the same two-part shape:

  1. MATCH: describe what you're looking for
  2. RETURN: say what you want back

That's it. Everything else (filtering, sorting, counting) hangs off this skeleton.
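If Cypher still feels alien, the ask/show split maps onto plain Python: filtering a list is the MATCH, choosing what to hand back is the RETURN. A rough analogy, with made-up dicts standing in for nodes (the data here is illustrative, not your actual graph):

```python
# Toy stand-in for the graph: each dict plays the role of a Concept node.
# Names and domains are illustrative, not pulled from your data.
concepts = [
    {"name": "Encapsulation", "domain": "software_design"},
    {"name": "Dangerous Setters", "domain": "software_design"},
    {"name": "Immutability", "domain": "software_design"},
]

# MATCH (c:Concept) -> consider every candidate node
# RETURN c          -> decide what comes back
# LIMIT 2           -> cap the output
result = [c for c in concepts][:2]

for c in result:
    print(c["name"])
```

The analogy is loose (a real graph database plans and indexes these lookups), but the two-part shape is the same.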


Picking What To Show

When you wrote RETURN c, you got the whole node back: name, description, domain, source_id, every property the thing has. That's noisy. Most of the time, you want specific pieces.

Each node stores properties, and you grab them with a dot:

MATCH (c:Concept)
RETURN c.name, c.description
LIMIT 5
| c.name | c.description |
|---|---|
| Encapsulation | Bundling data and methods that operate on that data within a single unit |
| Properties | Controlled access to object attributes via @property decorators |
| Dangerous Setters | Setter methods that can put an object into an invalid state |
| Immutability | Objects that cannot be modified after construction |
| Class Invariant | A condition that must be true for all objects before and after method calls |

Cleaner. You're reading a table instead of decoding JSON blobs.

The properties available depend on what your extraction pipeline stored. For the concepts in this graph, you'll typically see:

| Property | What it holds | Example |
|---|---|---|
| c.name | The concept's name | "Dangerous Setters" |
| c.description | A short explanation | "Setter methods that can put an object into an invalid state" |
| c.domain | Which knowledge domain it belongs to | "software_design" |
| c.source_id | Where it was extracted from | "ch5" |
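If dot access feels abstract, think of each node as a dict of properties and RETURN as a projection over its keys. A quick Python analogy (property values made up for illustration):

```python
# One node's properties as a plain dict (values are illustrative).
node = {
    "name": "Dangerous Setters",
    "description": "Setter methods that can put an object into an invalid state",
    "domain": "software_design",
    "source_id": "ch5",
}

# RETURN c                -> the whole dict, noise included
# RETURN c.name, c.domain -> just the fields you asked for
projected = {key: node[key] for key in ("name", "domain")}
print(projected)
```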

Now you know how to pick specific fields. Let's go back to the setter problem.


Why Your First Search Was Too Narrow

Here's what you ran:

MATCH (c:Concept) WHERE c.name CONTAINS 'Setter' RETURN c

Three results. But the chapter covered WAY more than three setter-related ideas. So what went wrong?

Two things. And you're going to discover them one at a time.

Problem 1: You Only Searched The Name

Your query said: "find concepts where the name contains the word 'setter'." That's like searching your email by subject line only. A huge email thread titled "Q3 Planning" that mentions setters twenty times in the body? Invisible to your query.

Try this instead: search the description too.

MATCH (c:Concept)
WHERE c.name CONTAINS 'Setter'
   OR c.description CONTAINS 'Setter'
RETURN c.name, c.description
| c.name | c.description |
|---|---|
| Setter Method | A method that modifies an object's attribute, sometimes called a mutator |
| Dangerous Setters | Setter methods that can put an object into an invalid state |
| Property Setters | Using @property decorator to create controlled setter methods |
| Invalid State | An object state that violates one or more class invariants, often caused by uncontrolled Setter access |
| Encapsulation | Bundling data and methods that operate on that data within a single unit, preventing direct Setter access to internal state |
| Access Control | Mechanisms that restrict how attributes are modified, replacing raw Setters with controlled interfaces |
| State Validation | Checking that object state remains valid after modification, typically in Setter or property methods |

Seven results. More than double. And look at those descriptions: "Invalid State" mentions setters. "Encapsulation" mentions setters. "Access Control" mentions setters. They were there the whole time. You just weren't looking in the right place.

Search broadly, not narrowly

The concept name is a label. The description is where the real content lives. When you're exploring, search both.
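You can see the single-field trap outside Cypher too. In Python terms (two toy concepts with made-up descriptions), searching one field versus both:

```python
# Two toy concepts; descriptions are made up for illustration.
concepts = [
    {"name": "Dangerous Setters", "description": "Methods that can corrupt object state"},
    {"name": "Invalid State", "description": "Often caused by uncontrolled Setter access"},
]

term = "Setter"
# Name-only search: like your first query.
by_name = [c for c in concepts if term in c["name"]]
# Name OR description: the broader query.
by_either = [c for c in concepts if term in c["name"] or term in c["description"]]

print(len(by_name))    # only the name match
print(len(by_either))  # the description match surfaces too
```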

Recap So Far

You can ask the graph questions with MATCH and RETURN. You can pick specific properties with the dot (c.name, c.description). And searching just the name misses concepts that mention your term in the description.

But you're not done. You went from 3 results to 7. The chapter had MUCH more to say about setters. There's a second problem.


Problem 2: Case Sensitivity Is Biting You

Look at this query again:

WHERE c.name CONTAINS 'Setter'

That's matching "Setter" with a capital S, exactly as typed. A description that writes it as lowercase "setter", or shouts "SETTER" in all caps, slips right past. String matching is case-sensitive by default.

The fix: force everything to lowercase before comparing.

MATCH (c:Concept)
WHERE toLower(c.name) CONTAINS 'setter'
   OR toLower(c.description) CONTAINS 'setter'
RETURN c.name, c.description

toLower() converts "Setter Method" to "setter method" before the comparison happens, so now "Setter", "setter", and "SETTER" all match.
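toLower() is doing exactly what .lower() does in Python. A two-line sanity check (the strings here are hypothetical):

```python
description = "Restricting direct access, forcing use of setter/getter interfaces"
term = "Setter"

# Case-sensitive: the lowercase "setter" in the text is invisible.
print(term in description)                  # False

# Fold both sides to lowercase, like toLower(...) CONTAINS 'setter'.
print(term.lower() in description.lower())  # True
```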

Here's the full result set:

| c.name | c.description |
|---|---|
| Setter Method | A method that modifies an object's attribute, sometimes called a mutator |
| Dangerous Setters | Setter methods that can put an object into an invalid state |
| Property Setters | Using @property decorator to create controlled setter methods |
| Invalid State | An object state that violates one or more class invariants, often caused by uncontrolled Setter access |
| Encapsulation | Bundling data and methods that operate on that data within a single unit, preventing direct Setter access to internal state |
| Access Control | Mechanisms that restrict how attributes are modified, replacing raw Setters with controlled interfaces |
| State Validation | Checking that object state remains valid after modification, typically in Setter or property methods |
| Mutator Pattern | A design pattern involving setter methods that change object state in a controlled manner |
| Data Hiding | Restricting direct access to object internals, forcing use of setter/getter interfaces |
| Attribute Protection | Mechanisms including naming conventions and property setters to control attribute access |
| Write Access | The ability to modify an attribute, typically gated through setter methods or properties |
| Object Protocol | The set of methods (including setters) that define how external code interacts with an object |

Twelve results.

Case-insensitive search results

You went from 3 to 12. Your first query was missing 75% of the setter-related concepts. Three-quarters of the relevant data, invisible because you searched one field with case-sensitive matching.

The falsifiable claim

Searching the name field alone misses more than 50% of relevant concepts. You just proved it: 3 out of 12 is 25% coverage. The claim holds.


Stepping Back: How Many Concepts Do You Actually Have?

You've been searching for specific things. Let's zoom out. How big is this graph?

MATCH (c:Concept)
RETURN count(c)
47

47 concepts. The CLI told you that, but now you've confirmed it yourself. Good habit: never trust the summary. Verify.

And here's a question that might not have occurred to you yet: how many of those 47 concepts actually connect to anything?

A concept sitting in the graph with no connections is like a word in a dictionary that nobody uses. It exists, technically. But it's not doing any work.

To check, you need to look for concepts that have at least one line connecting them to something else. Those lines (the arrows between nodes) are called relationships or edges. Same thing, two names. Here's how you find concepts that have them:

MATCH (c:Concept)-[r]-()
RETURN count(DISTINCT c)
32

Let's unpack that syntax. (c:Concept)-[r]-() says: "find a Concept node c that has some relationship r connecting it to any other node." The square brackets hold the relationship, just like parentheses hold the node. The empty () at the end means "I don't care what's on the other side."

DISTINCT matters here because one concept might have multiple relationships, so it would show up multiple times without it.

32 out of 47. That means 15 concepts have zero connections. They're orphans.
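The DISTINCT step is the same deduplication a set gives you. If you model edges as pairs of endpoints, counting connected concepts looks like this in Python (toy edge list, hypothetical names):

```python
# Toy edge list; each tuple is one relationship between two concepts.
edges = [
    ("Encapsulation", "Dangerous Setters"),
    ("Encapsulation", "Data Hiding"),
    ("Dangerous Setters", "Invalid State"),
]

# (c:Concept)-[r]-() touches both endpoints of every edge;
# DISTINCT collapses repeat appearances, exactly what a set does.
connected = set()
for a, b in edges:
    connected.add(a)
    connected.add(b)

print(len(connected))  # 4 distinct concepts, though names appear 6 times
```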

(Figure: pie chart of concept connectivity, 32 connected vs. 15 orphans)

Why orphans exist

Orphan concepts aren't necessarily errors. Sometimes the extraction pipeline captures a concept that's mentioned in passing but never related to other concepts in that specific chapter. Sometimes the LLM missed a connection. Either way, you now know they're there, and that's information you can act on.


Not Drowning in Output: ORDER BY and LIMIT

You've got 47 concepts. When you start listing them all, you need two tools to keep the output manageable.

LIMIT you've already seen: it caps the number of results.

ORDER BY lets you sort, so the most interesting stuff floats to the top.

MATCH (c:Concept)
RETURN c.name, c.description
ORDER BY c.name
LIMIT 10

Alphabetical. Useful for scanning. But you can sort by anything, including computed values:

MATCH (c:Concept)
RETURN c.name, size(c.description) AS desc_length
ORDER BY desc_length DESC
LIMIT 5
| c.name | desc_length |
|---|---|
| Programming by Contract | 142 |
| Encapsulation | 118 |
| Dangerous Setters | 105 |
| Class Invariant | 98 |
| Properties | 87 |

DESC means descending (biggest first). Without it, you get ascending (smallest first). size() on a string gives you its length.

The concepts with the longest descriptions tend to be the most important ones. That's not a rule, just a pattern you'll notice: the extraction pipeline writes more when there's more to say.
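Sorting by a computed value is the same move as Python's key= argument to sorted(). A quick analogy (descriptions invented for the example):

```python
# Made-up descriptions, just to show the sort.
concepts = {
    "Properties": "Controlled attribute access",
    "Encapsulation": "Bundling data and methods that operate on that data within a single unit",
    "Immutability": "Objects that cannot change",
}

# ORDER BY size(c.description) DESC == sort by len(description), biggest first.
ranked = sorted(concepts.items(), key=lambda item: len(item[1]), reverse=True)

for name, description in ranked:
    print(name, len(description))
```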

Recap

Here's where you stand. You can:

  • Ask questions with MATCH and get answers with RETURN
  • Pick specific properties: c.name, c.description, c.domain
  • Search text with CONTAINS (and make it case-insensitive with toLower())
  • Search multiple fields with OR
  • Count things with count()
  • Sort with ORDER BY and cap output with LIMIT
  • Find connected concepts using relationship patterns: (c)-[r]-()

And you know something important about your data: 47 concepts, but only 32 have connections. 15 orphans sitting in the dark.


Exercises

Time to get your hands dirty. Each exercise has a hint if you're stuck and a solution you can expand.

Exercise 1: Search For "Validation"

Search both the name and description fields (case-insensitive) for any concept that mentions "validation." Return the concept name and description.

Hint

The pattern is the same one you used for "setter": toLower() on both fields, CONTAINS with OR.

Solution
MATCH (c:Concept)
WHERE toLower(c.name) CONTAINS 'validation'
   OR toLower(c.description) CONTAINS 'validation'
RETURN c.name, c.description

Expected output: You should see concepts like "State Validation", "Input Validation", "Precondition" (whose description mentions validation), and possibly others. The exact count depends on your extraction, but expect 4-6 results.


Exercise 2: Find Concepts With Long Descriptions

Some concepts got a one-liner. Others got a paragraph. Find all concepts whose description is longer than 100 characters. Return the name and the length of the description, sorted longest first.

Hint

Use size(c.description) to get the string length, and filter with WHERE.

Solution
MATCH (c:Concept)
WHERE size(c.description) > 100
RETURN c.name, size(c.description) AS desc_length
ORDER BY desc_length DESC

Expected output: Somewhere between 5-10 concepts, depending on how verbose your extraction pipeline was. The top entries will likely be foundational concepts like "Encapsulation" or "Programming by Contract" that needed more words to describe.


Exercise 3: List All Unique Domain Values

Your concepts have a domain property. But how many different domains are there? List every unique domain value and how many concepts belong to each.

Hint

RETURN c.domain gives you the domain. If you return a non-aggregated field alongside count(), Cypher groups by that field automatically.

Solution
MATCH (c:Concept)
RETURN c.domain, count(c) AS concept_count
ORDER BY concept_count DESC

Expected output: If you ingested a single chapter, you'll likely see one domain ("software_design") with all 47 concepts. If you ingested multiple chapters from different domains, you'll see the breakdown. Either way, now you know how your data is organized.
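Cypher's implicit grouping (return a plain field alongside count() and it groups by that field) is the same tally that Python's collections.Counter gives you. A toy version with hypothetical domain values:

```python
from collections import Counter

# Hypothetical domain values, as if pulled from c.domain across concepts.
domains = ["software_design", "software_design", "testing", "software_design"]

# RETURN c.domain, count(c) groups by domain automatically;
# Counter performs the same tally.
by_domain = Counter(domains)

for domain, n in by_domain.most_common():  # like ORDER BY concept_count DESC
    print(domain, n)
```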


What You Learned

  • The query skeleton: MATCH asks, RETURN shows. Everything else hangs off those two.
  • Property access: Grab specific fields with c.name, c.description, c.domain.
  • Text search: CONTAINS finds substrings. toLower() makes it case-insensitive. Always search multiple fields.
  • Counting: count() tells you how many. DISTINCT avoids double-counting.
  • Relationships (edges): The lines between nodes, written as (a)-[r]-(b). You used them to find which concepts have connections.
  • Output control: ORDER BY sorts. LIMIT caps. Use both or drown.
  • The big number: 47 concepts, 32 connected, 15 orphans.

Next Up

47 concepts. 32 with connections. 15 orphans.

But you still don't know what KINDS of connections exist. Is the graph mostly "this requires that"? Or "this contradicts that"? Is there a dominant relationship type, or is it evenly spread?

Next chapter: Counting and Grouping. We count the edges and find out what this graph is really made of.