Oct 2, 2017

Graph-Based Live Queries in AOS

In our product – AOS®– we create and manage a model that represents a single source of truth regarding infrastructure, policies, constraints etc. This model is subject to constant change and we need to query it for various reasons, and we represent it as a graph. With the graph, all information about our network is modeled as nodes and relationships between them.

Every object in a graph has a unique ID. Nodes have a type (which is a string) and set of additional properties based on a particular type. For example, all switches in our system are represented by nodes of type “system” and can have a property “role” which determines which role in network it is assigned (spine/leaf/server). Physical and logical switch ports are represented by an “interface” node, which also has a property called “if_type”.

Relationships between different nodes are represented as graph edges which we call “relationships”. Relationships are directed, meaning each relationship has a source node and a target node. Relationships also have a type which determines which additional properties particular relationship can have. E.g. “system” nodes have relationships of type “hosted_interfaces” towards “interface” nodes.

A set of possible node and relationship types is determined by a graph schema. The schema defines which properties nodes and relationships of particular type can have along with types of those properties (string/integer/boolean/etc) and constraints. We use and maintain an open source schema library, Lollipop that allows flexible customization of value types.

Going back to the graph representing a single source of truth, one of the most challenging aspects was how to reason about it in the presence of change, coming from both the operator and the managed system. In order to support this we developed what we call “Live Query” mechanism which has three essential components:

  1. Query Specification
  2. Change Notification
  3. Notification Processing

Query Specification

Having modelled our domain model as a graph, it is important to be able to find particular patterns (subgraphs) in a graph.


You do that by running searches on a graph specified by graph queries. The language to express the query is conceptually based on Gremlin, which is an open source graph traversal language. We also have parsers for queries expressed in another language – Cypher, which is a query language used by popular graph database neo4j.

You start with a node() and then keep chaining method calls, alternating between matching relationships and nodes:


The query above translated in English reads something like: “starting from a node of type system, traverse any outgoing relationship that reaches node of type interface, and from that node traverse all outgoing relationship that lead to node of type link”.

At any point you can add extra constraints:

node(‘system‘, role=’spine‘).out().node(‘interface‘, if_type=’ip‘)

Notice role=”spine” argument, it will select only “system” nodes that have “role” property set to “spine”. Same with “if_type” property for “interface” nodes.

node(‘system‘, role=is_in([‘spine‘, ‘leaf‘]))
.node(‘interface‘, if_type=ne(‘ip‘))

That query will select all “system” nodes that have role either “spine” or “leaf” and “interface” nodes that have “if_type” anything but “ip” (“ne” means “not equal”).

You can also add cross-object conditions which can be arbitrary Python functions:

.out().node(‘interface‘, name=’if1‘)
.in_().node(‘interface‘, name=’if2‘)
.where(lambda if1, if2: if1.if_type != if2.if_type)

You refer to objects by giving them names and using those names as argument names for your constraint function (of course you can override that but it makes a convenient default behavior). So, in example above it will take two “interface” nodes named “if1” and “if2”, pass them into given “where” function and filter out those paths, for which function returns False. Don’t worry about where you place your constraint: it will be applied during search as soon as all objects referenced by constraint are available.

Now, you have a single path, you can use it to do searches. However, sometimes you might want to have a query slightly more complex than a single path. To support that, query DSL allows you to define multiple paths in the same query, separated by comma(s):

node(‘a‘).out().node(‘b‘, name=’b‘).out().node(‘c‘),

This match() function creates a grouping of paths. All objects that share same name in different paths will actually be referring to the same object. Also, match() allows adding more constraints on objects with where(). You can do a distinct search on particular objects and it will ensure that each combination of values is seen only once in results:

node(‘a‘, name=’a‘).out().node(‘b‘).out().node(‘c‘, name=’c‘)
).distinct([‘a‘, ‘c‘])

This matches a chain of a -> b -> c nodes. If two nodes “a” and “c” are connected through more than one node of type “b”, the result will still contain only one (“a”, “c”) pair.
There is another convenient pattern to use when writing queries: you separate your structure from your criteria:

node(‘a‘, name=’a‘).out().node(‘b‘).out().node(‘c‘, name=’c‘),
node(‘a‘, foo=’bar‘),
node(‘c‘, bar=123),

Query engine will optimize that query into:

node(‘a‘, name=’a‘, foo=’bar‘)
.out().node(‘c‘, name=’c‘, bar=123)

No cartesian product, no unnecessary steps.

Change Notification

Ok, now you have a graph query defined. What does a notification result look like? Each result will be a dictionary mapping a name that you have defined for a query object to object found. E.g. for following query

node(‘a‘, name=’a‘).out().node(‘b‘).out().node(‘c‘, name=’c‘)

results will look like {‘a‘: <node type=’a‘>, ‘c‘: <node type=’c‘>}. Notice, only named objects are present (there is no <node type=’b‘> in results, although that node is present in query because it does not have a name).

You register a query to be monitored and a callback to execute if something will change. Later, if someone will modify the graph being monitored, it will detect that new graph updates caused new query results to appear, or old results to disappear or update. The response executes the callback that is associated with the query. The callback receives the whole path from the query as a response, and a specific action (added/updated/removed) to execute.

Notification Processing

When the result is passed to the processing (callback) function, from there you can specify reasoning logic. This could really be anything, from generating logs, errors, to rendering configurations, or running semantic validations. You could also modify the graph itself, using graph APIs and some other piece of logic may react to changes you made. This way, you can enforce the graph as a single source of truth while it also serves as a logical communication channel between pieces of your application logic.

The Graph API consists of three parts:

  1. Graph management – methods to add/update/remove stuff in a graph.
    add_node(), set_node(), del_node(), get_node()
    add_relationship(), set_relationship(), del_relationship(), get_relationship()
  2. Query
  3. Observable interface

Graph management APIs are pretty self explanatory.

add_node() creates new node
set_node() updates properties of existing node
del_node() deletes a node
commit() is used to signal that all updates to the graph are complete and they can be propagated to all listeners.

Relationships have similar API.

The observable interface allows you to add/remove observers — objects that implement notification a callback interface. Notification callback consists of three methods: on_node(), on_relationship() and on_graph(). Methods on_node() and on_relationship() are called when any node/relationship is added, removed or updated. on_graph() is called when the graph is committed.

The Query API is the heart of our graph API and is what powers all searching. Both get_nodes() and get_relationships() allow you to search for corresponding objects in a graph. Arguments to those functions are constraints on searched objects. E.g. get_nodes() returns you all nodes in a graph, get_nodes(type=’system‘) returns you all “system” nodes, get_nodes(type=’system‘, role=’spine‘) allows you to constrain returned nodes to those having particular property values. Values for each argument could be either a plain value or a special “property matcher” object. If the value is a plain value, the corresponding result object should have it’s property equal to the given plain value. Property matchers allow you to express a more complex criterias, e.g. “not equal”, “less than”, “one of given values” and so on:

Property matcher example:

role=is_in([‘spine‘, ‘leaf‘]),

In your graph schema you can define custom indexes for particular node/relationship types and the methods get_nodes() and get_relationships() pick the best index for each particular combination of constraints passed to minimize search time.

Results of get_nodes()/get_relationships() are special iterator objects. You can iterate over them and they will yield all found graph objects. You can also use APIs that those iterators provide to navigate those result sets. E.g. get_nodes() returns you a NodeIterator object which has methods out() and in_(). You can use those to get an iterator over all outgoing or incoming relationship from each node in the original result set. Then, you can use those to get nodes on the other end of those relationships and continue from them. You can also pass property constraints to those methods the same way you can do for get_nodes() and get_relationships().

graph.get_nodes(‘system‘, role=’spine‘)
.out(‘interface‘).node(‘interface‘, if_type=’loopback‘)

The code in the example above finds all nodes with type “system” and role “spine” and then finds all their loopback interfaces.

Putting It All Together

node(‘system‘, name=’spine_device‘, role=’spine‘)
 .node(‘interface‘, name=’spine_if‘)
 .node(‘link‘, name=’link’)
 .node(‘interface‘, name=’leaf_if‘)
 .node(‘system‘, name=’leaf_device‘, role=’leaf‘)
def process_spine_leaf_link(self, path, action):
 Process link between spine and leaf

 spine = path[‘spine_device‘]
 leaf = path[‘leaf_device‘]
 if action in [‘added‘, ‘updated‘]:
 # do something with added/updated link
 # do something about removed link