2026-03-21

Embedding Logical Queries on Knowledge Graphs


[pdf][arXiv]

1. Problem

A method for answering conjunctive queries on graphs.

Graph G = (V, E) contains nodes and edges of various types.

Nodes can be of various types, e.g. disease, protein, enzyme, symptom. Edges can also be of various types, e.g. ASSOCIATE, TARGET, etc.

Conjunctive queries are queries that can be written as a conjunction (AND) of relations between nodes, where each node is one of:

  1. anchor nodes: known constants, specified by the query
  2. bound variable nodes: nodes that are not of primary concern, qualified by an existential operator
  3. target variable nodes: the answers to the query

An example might make it clear. "return all drug nodes that are likely to target proteins that are associated with a given disease node \(d\)":

\begin{align*} q = C_? . \exists P : ASSOC(d,P) \wedge TARGET(P, C_?) \end{align*}

This can be read as: find all drug nodes \(C_?\) such that there exists a protein \(P\) that is associated with the disease \(d\), \(ASSOC(d,P)\), and (\(\wedge\)) the drug targets the protein, \(TARGET(P, C_?)\).

These are the kind of queries asked of relational databases. The problem here is that we don't have all the edges in our database. So, classically, we would first have to predict edges for all possible node pairs and then search for the valid solutions.

2. Approach

The approach in the paper is to first express the query \(q\) as a point in an embedding space \(\mathbb R^d\), the same space in which the nodes of the graph are embedded. The problem is then reduced to finding the nodes that are nearest, in embedding space, to the embedding of the query.
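As a minimal sketch of this retrieval step (with random placeholder embeddings, not trained ones): once a query embedding exists, answers are found by ranking all node embeddings by similarity to it, here using cosine similarity.

```python
import numpy as np

# Toy setup: 10 node embeddings in d = 4 dimensions (random placeholders).
d = 4
rng = np.random.default_rng(0)
node_emb = rng.normal(size=(10, d))

# A query embedding that happens to lie very close to node 3.
q = node_emb[3] + 0.01 * rng.normal(size=d)

# Cosine similarity between the query and every node embedding.
scores = node_emb @ q / (np.linalg.norm(node_emb, axis=1) * np.linalg.norm(q))
best = int(np.argmax(scores))  # index of the nearest node, i.e. the top answer
```

With real, jointly trained embeddings, the top-ranked nodes are the predicted answers to the query.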

The embedding for a query is created using two differentiable operators, projection \(P\) and intersection \(I\), which are jointly optimized along with the set of node embeddings.

2.1. Projection

The projection operator \(P\) takes a query embedding \(q\) (representing a set of nodes) and an edge type \(\tau\), and gives the set of all nodes (represented as a single embedding) that are connected to the nodes in the query \(q\) by an edge of type \(\tau\). \(P\) is implemented by a matrix multiplication:

\begin{align*} P(q, \tau) = R_\tau q \end{align*}

where \(R_\tau \in \mathbb R^{d \times d}\) is a trainable parameter.
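A minimal sketch of this operator, with a random placeholder matrix standing in for the trained \(R_\tau\):

```python
import numpy as np

d = 4
rng = np.random.default_rng(1)
R_assoc = rng.normal(size=(d, d))  # stand-in for trainable R_tau, edge type ASSOC

def project(q, R_tau):
    """P(q, tau) = R_tau q: embedding of the set of nodes reached from q via tau."""
    return R_tau @ q

q_disease = rng.normal(size=d)             # embedding of an anchor disease node
q_proteins = project(q_disease, R_assoc)   # embedding of its associated proteins
```

One matrix per edge type is the paper's design: the matrix-vector product keeps the operator differentiable and constant-time regardless of how many nodes the input set represents.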

2.2. Intersection

The intersection operator \(I\) takes a set of queries (as embeddings) and produces another query (an embedding) that performs intersection, i.e. it returns the embedding of the set of nodes common to all the provided queries. \(I\) is implemented by transforming each embedding with a neural network, aggregating the results using a symmetric function (e.g. an elementwise mean or min of a set of vectors), and finally transforming the aggregate with a matrix:

\begin{align*} I(\{q_1, ..., q_n\}) = W_\gamma \Psi(NN_k(q_1), ..., NN_k(q_n)) \end{align*}

where \(\gamma \in \Gamma\) is a node type and \(NN_k\) is a \(k\)-layer feed-forward network.

Equivalently, \(I\) could be expressed as any permutation-invariant neural network.
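The formula above can be sketched directly, using an elementwise min as the symmetric function \(\Psi\) and random placeholder weights for \(NN_k\) and \(W_\gamma\):

```python
import numpy as np

d = 4
rng = np.random.default_rng(2)
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))  # stand-in NN_k layers
W_gamma = rng.normal(size=(d, d))                          # stand-in output matrix

def nn_k(q):
    """A 2-layer ReLU feed-forward network (placeholder for the trained NN_k)."""
    return np.maximum(0.0, W2 @ np.maximum(0.0, W1 @ q))

def intersect(queries):
    """I({q_1,...,q_n}) = W_gamma Psi(NN_k(q_1), ..., NN_k(q_n))."""
    psi = np.min([nn_k(q) for q in queries], axis=0)  # symmetric aggregation
    return W_gamma @ psi

a, b = rng.normal(size=d), rng.normal(size=d)
q_out = intersect([a, b])
```

Because the elementwise min ignores the order of its inputs, the operator is permutation invariant, as an intersection must be: `intersect([a, b])` equals `intersect([b, a])`.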

2.3. Conversion

Now the query can be converted to an embedding in \(O(d^2 E)\) time by repeated application of the \(P\) and \(I\) operators, where \(d\) is the dimension of the embedding and \(E\) is the number of edges in the query DAG (edges in the query DAG connect the anchor, bound, and target nodes with each other, i.e. it is a graphical representation of the query).
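For the running drug-discovery example, the query DAG is a simple chain \(d \to P \to C_?\) with two edges, so conversion is two applications of the projection operator followed by nearest-neighbor retrieval. A sketch with random placeholder parameters:

```python
import numpy as np

d = 4
rng = np.random.default_rng(3)
# Stand-ins for the trained edge-type matrices R_tau.
R = {"ASSOC": rng.normal(size=(d, d)), "TARGET": rng.normal(size=(d, d))}
disease = rng.normal(size=d)        # embedding of the anchor disease node
drug_emb = rng.normal(size=(5, d))  # embeddings of 5 candidate drug nodes

# q = P(P(disease, ASSOC), TARGET): one matrix multiply per DAG edge,
# hence O(d^2) per edge and O(d^2 E) overall.
q = R["TARGET"] @ (R["ASSOC"] @ disease)

# Rank candidate drugs by cosine similarity to the query embedding.
scores = drug_emb @ q / (np.linalg.norm(drug_emb, axis=1) * np.linalg.norm(q))
ranking = np.argsort(-scores)  # best-matching drug indices first
```

Queries with multiple branches would additionally apply \(I\) wherever branches of the DAG merge.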

