Intro

I recently read a terrific blog post by Sander Dieleman titled Diffusion is Spectral Autoregression, which can be found here: https://sander.ai/2024/09/02/spectral-autoregression.html. As a quick summary, Dieleman showed that the radially averaged power spectra of natural images follow a power law. The general process is as follows (a code sketch follows the list):

  • For a given image \(X\):
    • frequencies, magnitudes <- Fourier transform(X)
    • shifted frequencies <- shift(frequencies)
    • powers <- magnitudes^2
    • radially averaged spectrum <- RAPSD(powers, shifted frequencies)
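To make the recipe concrete, here is a minimal NumPy sketch of the procedure. The `rapsd` helper and its integer radial binning are my own simplifications, not Dieleman's actual implementation:

```python
import numpy as np

def rapsd(image):
    """Radially averaged power spectral density of a 2D image."""
    # Fourier transform, with the zero frequency shifted to the center
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    power = np.abs(spectrum) ** 2

    # distance of every pixel from the center of the spectrum
    h, w = power.shape
    y, x = np.indices((h, w))
    r = np.sqrt((y - h // 2) ** 2 + (x - w // 2) ** 2).astype(int)

    # average the power over annuli of constant radius
    total = np.bincount(r.ravel(), weights=power.ravel())
    count = np.bincount(r.ravel())
    return total / count
```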

The standard diffusion process relies on iteratively adding Gaussian noise of varying levels to a sample. Gaussian white noise has a power spectrum that is uniform across all frequencies; in other words, it contains all frequencies equally. In Fourier space, adding Gaussian noise to an image looks like adding a horizontal line to a standard power law. What this means for diffusion is that at each step in the process, noise of increasing power is added to the image's spectrum. Because power spectra live on a log-log plot, the additive noise doesn't shift the curve upwards; instead, an elbow forms. As the power increases, the elbow slides up the power spectrum until eventually all that remains is noise.
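Here is a toy demonstration of that elbow, reusing the `rapsd` helper from the previous snippet. The synthetic 1/f image is a stand-in for a natural image, and the noise levels are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthesize a 256x256 image with a power-law spectrum,
# as a stand-in for a natural image
freqs = np.fft.fftfreq(256)
fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
radius = np.sqrt(fx**2 + fy**2)
radius[0, 0] = radius[0, 1]  # avoid dividing by zero at the DC component
phase = rng.uniform(0.0, 2.0 * np.pi, radius.shape)
image = np.real(np.fft.ifft2(np.exp(1j * phase) / radius))
image /= image.std()

# white Gaussian noise: its radially averaged spectrum is (nearly) flat
flat = rapsd(rng.standard_normal(image.shape))

# as sigma grows, the elbow slides toward lower frequencies,
# until the power-law part of the spectrum is completely submerged
for sigma in (0.1, 1.0, 10.0):
    noisy = image + sigma * rng.standard_normal(image.shape)
    spectrum = rapsd(noisy)  # plot on log-log axes to see the elbow
```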

The resulting curve looks an awful lot like an autoregressive process. In the case of language modeling, or autoregressive modeling in general, the elbow is sharp. It is a discontinuity. In the case of the additive process outlined above, the elbow is smooth. This connection allows one to draw the conclusion that there is an approximate equivalence between autoregressive and diffusion modeling, where the equivalence would be exact if the elbow were sharp.

To my mind, this brings up two questions:

  1. Does this apply to graphs under the analogous graph Fourier transform?
  2. Does there always exist some transformation under which data of any modality exhibits a power law?

What follows is a down-and-dirty attempt at generating a power law from graphs using the graph Fourier transform.

Review: Graph Fourier Transform

Unlike signals on a regular grid, graphs carry no built-in notion of harmonics, so we cannot apply the conventional Fourier transform directly. Instead, we evaluate the eigenvalues and eigenvectors of the graph's Laplacian. We start by defining the adjacency matrix, A, of a graph as the symmetric square matrix whose element [i, j] is 1 when there is an edge connecting nodes i and j and 0 otherwise. We must also define the degree matrix, D, a diagonal matrix where each element [i, i] is the number of edges incident to node i. With these definitions we can define the Laplacian to be

$$L = A - D$$
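As a small illustration (my own, not from the post), here are all three matrices for a four-node path graph:

```python
import numpy as np

# adjacency matrix of the path graph 0 - 1 - 2 - 3
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

D = np.diag(A.sum(axis=1))  # degree matrix
L = A - D                   # the sign convention used in this post
```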

At first glance it may seem odd to call this operator the Laplacian. One connection we can make is to the discrete Laplacian operator in 2D, obtained by Taylor expanding a function over a discrete grid. The resulting operator, below, has exactly the same form as the graph Laplacian in the case where the graph has a local 2D grid structure. (Note that many references use the opposite sign convention, \(L = D - A\); this only flips the sign of the eigenvalues.)

$$\Delta U_{i,j} = U_{i-1,j} + U_{i+1,j} + U_{i,j-1} + U_{i,j+1} - 4U_{i,j}$$
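A quick sanity check of this claim (again my own): build the Laplacian of a 2D grid graph as a Kronecker sum of two path-graph Laplacians and confirm that applying it to a signal reproduces the five-point stencil at interior nodes.

```python
import numpy as np

n = 5
U = np.random.default_rng(0).standard_normal((n, n))  # a signal on an n x n grid

# A - D Laplacian of the 1D path graph on n nodes
P = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L1 = P - np.diag(P.sum(axis=1))

# the 2D grid-graph Laplacian is the Kronecker sum of two 1D Laplacians
L2 = np.kron(L1, np.eye(n)) + np.kron(np.eye(n), L1)

result = (L2 @ U.ravel()).reshape(n, n)
i, j = 2, 2  # an interior node
stencil = U[i - 1, j] + U[i + 1, j] + U[i, j - 1] + U[i, j + 1] - 4 * U[i, j]
assert np.isclose(result[i, j], stencil)
```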

Having constructed the Laplacian, we can now simply find its eigenvalues and eigenvectors. The eigenvalues are analogous to the frequencies of a standard Fourier transform, and the eigenvectors give us a basis onto which to project the graph signal. Concretely, given the signal on the graph's nodes \(V\), the eigenvalues \(\Lambda\), and eigenvectors \(W\), we can say

$$\Lambda, W = \text{eigen}(L) $$

$$ W^T V = \hat{V} $$

where \(\hat{V}\) are the Fourier coefficients, analogous to the amplitudes in a standard Fourier transform.
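In code, again on the toy path graph (an illustration of mine, with an arbitrary node signal):

```python
import numpy as np

# Laplacian of the four-node path graph, in the A - D convention above
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = A - np.diag(A.sum(axis=1))

eigenvalues, W = np.linalg.eigh(L)   # eigh, since L is real and symmetric
V = np.array([1.0, 2.0, 0.5, -1.0])  # an arbitrary signal on the nodes

V_hat = W.T @ V    # graph Fourier coefficients
V_rec = W @ V_hat  # the eigenvectors are orthonormal, so this recovers V
assert np.allclose(V_rec, V)
```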

Experiment

In an attempt to construct the power law, we can use a convenient API that gives access to pre-canned graph datasets and benchmarks: the Open Graph Benchmark (OGB). Specifically, to keep the dataset size manageable, we will look at ogbg-molhiv, a property-prediction dataset of molecules in which each node is an atom. To smooth the spectra, we will take an approach analogous to Dieleman's. The procedure is as follows:

Transform each graph, construct a common set of eigenvalues across all graphs, interpolate the Fourier coefficients onto that common set, and average the resulting spectra. In pseudocode (a condensed code sketch follows the list):

  • for each graph G(V, E), yield \(\Lambda, \hat{V}\):
    • A <- adjacency(G)
    • D <- degree(G)
    • L <- A - D
    • \(\Lambda\), W <- \(\text{eigen}(L)\)
    • \(\hat{V}\) <- \(W^T V\)
  • common_eigenvalues, \(\bar{V}\) <- interpolate Fourier coefficients onto common_eigenvalues
  • power <- \(\text{mean}(\bar{V}^2)\)
  • k <- 1/common_eigenvalues
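Below is a condensed sketch of this pipeline; the linked Colab is the authoritative version. The choice of node signal (the first node-feature column) and the shared eigenvalue grid are assumptions made for illustration:

```python
import numpy as np
from ogb.graphproppred import GraphPropPredDataset

dataset = GraphPropPredDataset(name="ogbg-molhiv")

# shared grid to interpolate every graph's spectrum onto (assumed range)
common_eigenvalues = np.linspace(1e-3, 8.0, 200)
spectra = []

for graph, _label in list(dataset)[:1000]:
    n = graph["num_nodes"]
    src, dst = graph["edge_index"]

    # symmetrize explicitly, although OGB stores both edge directions
    A = np.zeros((n, n))
    A[src, dst] = A[dst, src] = 1.0
    L = A - np.diag(A.sum(axis=1))  # this post's A - D convention

    eigenvalues, W = np.linalg.eigh(L)
    V = graph["node_feat"][:, 0].astype(float)  # node signal (assumed)
    V_hat = W.T @ V

    # interpolate the squared coefficients onto the shared grid; np.interp
    # needs increasing x, so sort by |eigenvalue| (A - D eigenvalues are <= 0)
    lam = np.abs(eigenvalues)
    order = np.argsort(lam)
    spectra.append(np.interp(common_eigenvalues, lam[order], V_hat[order] ** 2))

power = np.mean(spectra, axis=0)
k = 1.0 / common_eigenvalues  # the grid starts above zero, so this is safe
```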

To follow along with the procedure in detail, you can find the code used here: https://colab.research.google.com/drive/1owqyvbqZzI-D–KPeUTE5_aKShoYFrFy?usp=sharing

Results

Below we see the results of running the above algorithm on 1000 molecular graphs.

Clearly there are some edge effects at low k. It also seems that there is an upper bound on k for these molecules. Let's chop off the edges, and we should get something that looks more reasonable.

This looks like quite a good result. It is important to remember that we did some manipulation at small and large k which affects the validity of the power law, but to a first approximation I would say the answer to question 1 (does this apply to graphs under the analogous graph Fourier transform?) is a resounding yes!
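As for the exponent itself, a quick (and crude) estimate is a least-squares line fit in log-log space, using k and power from the sketch above; the trimming bounds here are placeholders, not tuned values:

```python
import numpy as np

# keep only the central region of the spectrum, away from the edge effects
mask = (k > 0.2) & (k < 20.0) & (power > 0)  # placeholder cutoffs
slope, intercept = np.polyfit(np.log(k[mask]), np.log(power[mask]), 1)
print(f"estimated power-law exponent: {slope:.2f}")
```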

If you happen to have any insights into improving these results, or thoughts on what the true power law exponent is, please comment! I’d love to hear from you.

By Johnny