Graph Analysis

Correlation graphs, MST extraction, community detection, and contagion risk

Markets form networks. Correlated assets cluster together, central assets act as hubs, and shocks propagate along edges. Graph analysis turns a correlation matrix into a structure you can reason about: clusters, hubs, and diversification gaps.

API

Build a correlation graph

python
graph = hz.build_correlation_graph(returns_matrix, threshold=0.5)
# returns_matrix: list of lists (N assets x T observations)
# threshold: minimum |correlation| to create an edge
# Returns: Graph object

print(graph.n_nodes())       # number of assets
print(graph.n_edges())       # number of edges above threshold
print(graph.density())       # edge density (0 to 1)
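
To make the thresholding step concrete, here is a minimal pure-Python sketch of how an edge list of this kind could be derived from a returns matrix. The helper names (`pearson`, `threshold_edges`, `density`) are hypothetical and are not part of the `hz` API, and the `>=` comparison against the threshold is an assumption based on the "minimum |correlation|" wording above.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(vx * vy)
    return cov / denom if denom else 0.0

def threshold_edges(returns_matrix, threshold):
    """Return (i, j, corr) for every asset pair with |corr| >= threshold."""
    n = len(returns_matrix)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            c = pearson(returns_matrix[i], returns_matrix[j])
            if abs(c) >= threshold:
                edges.append((i, j, c))
    return edges

def density(n_nodes, n_edges):
    """Fraction of possible undirected edges that are present."""
    possible = n_nodes * (n_nodes - 1) / 2
    return n_edges / possible if possible else 0.0

# Toy data: asset 1 is exactly twice asset 0, asset 2 moves differently.
returns_matrix = [
    [0.01, -0.02, 0.03, 0.00, -0.01],
    [0.02, -0.04, 0.06, 0.00, -0.02],
    [-0.01, 0.03, 0.01, -0.02, 0.02],
]
edges = threshold_edges(returns_matrix, threshold=0.5)
print(edges)                                    # one edge, between assets 0 and 1
print(density(len(returns_matrix), len(edges)))
```

In this toy input only the pair (0, 1) clears the threshold, so one of the three possible edges is present.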

Minimum spanning tree

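The MST extracts the strongest-link backbone of the correlation network. For a connected graph of N nodes it keeps exactly N-1 edges, chosen so that every node stays connected and the total correlation is as high as possible (equivalently, the total correlation distance is as low as possible). If the threshold leaves the graph disconnected, no single tree can span all nodes.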

python
mst = graph.minimum_spanning_tree()
# mst: list of (node_a, node_b, correlation) tuples
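
One common way to obtain such a tree is Kruskal's algorithm with a union-find structure, visiting edges from strongest to weakest correlation. The sketch below illustrates that technique on a hand-written edge list; `kruskal_mst` is a hypothetical helper, and whether `graph.minimum_spanning_tree()` uses Kruskal, Prim, or another method is not stated here.

```python
def kruskal_mst(n_nodes, edges):
    """Kruskal's algorithm over (a, b, corr) edges, strongest correlation first.

    Sorting by descending correlation is equivalent to sorting by ascending
    correlation distance sqrt(2 * (1 - corr)).
    """
    parent = list(range(n_nodes))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for a, b, corr in sorted(edges, key=lambda e: e[2], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:          # edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, corr))
    return tree

edges = [(0, 1, 0.9), (0, 2, 0.8), (1, 2, 0.7), (2, 3, 0.6), (1, 3, 0.5)]
mst = kruskal_mst(4, edges)
print(mst)  # [(0, 1, 0.9), (0, 2, 0.8), (2, 3, 0.6)]
```

The edge (1, 2) is skipped because nodes 1 and 2 are already linked through node 0, which is how the tree ends up with N-1 = 3 edges.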

Community detection

Find clusters of tightly correlated assets.

python
communities = graph.detect_communities()
# communities: list of lists, each inner list is a group of asset indices
# asset_names (defined by you): ticker strings in the same order as the rows of returns_matrix

for i, group in enumerate(communities):
    print(f"Cluster {i}: {[asset_names[j] for j in group]}")
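
The algorithm behind `detect_communities()` is not specified on this page. As an illustration of what community detection does on a weighted correlation graph, the sketch below uses weighted label propagation, a swapped-in technique chosen for brevity: each node repeatedly adopts the label carrying the most edge weight among its neighbours. The function name and the deterministic tie-breaking rule (lowest label wins) are assumptions made for this example.

```python
from collections import defaultdict

def label_propagation(n_nodes, edges, max_rounds=100):
    """Group nodes by weighted label propagation over (a, b, corr) edges."""
    neighbors = defaultdict(list)
    for a, b, corr in edges:
        w = abs(corr)
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    labels = list(range(n_nodes))        # every node starts in its own group
    for _ in range(max_rounds):
        changed = False
        for node in range(n_nodes):
            scores = defaultdict(float)
            for other, w in neighbors[node]:
                scores[labels[other]] += w
            if not scores:               # isolated node keeps its own label
                continue
            best = max(scores.values())
            candidates = [lab for lab, s in scores.items() if s == best]
            if labels[node] not in candidates:
                labels[node] = min(candidates)   # deterministic tie-break
                changed = True
        if not changed:
            break

    groups = defaultdict(list)
    for node, lab in enumerate(labels):
        groups[lab].append(node)
    return sorted(groups.values())

# Two tight triangles joined by one weak bridge (2, 3).
edges = [(0, 1, 0.9), (0, 2, 0.9), (1, 2, 0.9),
         (3, 4, 0.9), (3, 5, 0.9), (4, 5, 0.9),
         (2, 3, 0.5)]
communities = label_propagation(6, edges)
print(communities)  # [[0, 1, 2], [3, 4, 5]]
```

The weak bridge is outweighed by the strong links inside each triangle, so the two groups stay separate.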

Node centrality

Identify hub assets that connect many others.

python
centrality = graph.centrality()
# centrality: list of (node_index, centrality_score) sorted descending

hub_asset = asset_names[centrality[0][0]]
print(f"Most central asset: {hub_asset}")
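
Which centrality measure `graph.centrality()` computes is not stated here. The simplest reading of "connects many others" is degree centrality, sketched below as an assumption: count each node's edges and divide by the maximum possible, N-1. `degree_centrality` is a hypothetical helper, not an `hz` function.

```python
def degree_centrality(n_nodes, edges):
    """Return (node_index, score) pairs sorted by descending degree centrality."""
    degree = [0] * n_nodes
    for a, b, _corr in edges:
        degree[a] += 1
        degree[b] += 1
    denom = max(n_nodes - 1, 1)          # most neighbours a node can have
    scores = [(i, d / denom) for i, d in enumerate(degree)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Node 2 is linked to every other node; node 3 has a single link.
edges = [(2, 0, 0.8), (2, 1, 0.7), (2, 3, 0.6), (0, 1, 0.9)]
ranking = degree_centrality(4, edges)
print(ranking[0])   # (2, 1.0)
```

Other measures (eigenvector or betweenness centrality, for example) can rank nodes differently, so treat this only as the most basic interpretation of a hub.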

Portfolio diversification check

Use communities to verify that your portfolio isn’t secretly concentrated:

python
returns = [returns_aapl, returns_msft, returns_goog, returns_jpm, returns_gs,
           returns_xom, returns_cvx, returns_btc, returns_eth, returns_sol]
names = ["AAPL", "MSFT", "GOOG", "JPM", "GS", "XOM", "CVX", "BTC", "ETH", "SOL"]

graph = hz.build_correlation_graph(returns, threshold=0.4)
communities = graph.detect_communities()

for i, cluster in enumerate(communities):
    cluster_names = [names[j] for j in cluster]
    print(f"Cluster {i}: {cluster_names}")

# Illustrative output (actual clusters depend on the data and the threshold):
# Cluster 0: ['AAPL', 'MSFT', 'GOOG']     -- tech
# Cluster 1: ['JPM', 'GS']                 -- financials
# Cluster 2: ['XOM', 'CVX']                -- energy
# Cluster 3: ['BTC', 'ETH', 'SOL']         -- crypto

If all your positions fall in one cluster, you’re concentrated regardless of how many tickers you hold.
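
Counting tickers per cluster ignores position size, so a natural follow-up is to sum portfolio weight per cluster. The sketch below assumes a hypothetical `weights` list aligned with `names` and reuses the illustrative clusters from the output above; `cluster_shares` and `effective_clusters` are helper names invented for this example.

```python
def cluster_shares(communities, weights):
    """Share of total portfolio weight that sits in each cluster."""
    totals = [sum(weights[j] for j in group) for group in communities]
    gross = sum(totals)
    return [t / gross for t in totals] if gross else [0.0] * len(totals)

def effective_clusters(shares):
    """Inverse Herfindahl index: 1.0 means everything is in one cluster."""
    squared = sum(s * s for s in shares)
    return 1.0 / squared if squared else 0.0

names = ["AAPL", "MSFT", "GOOG", "JPM", "GS", "XOM", "CVX", "BTC", "ETH", "SOL"]
communities = [[0, 1, 2], [3, 4], [5, 6], [7, 8, 9]]   # illustrative clusters
weights = [0.30, 0.25, 0.20, 0.05, 0.05, 0.05, 0.05, 0.05, 0.0, 0.0]  # hypothetical

shares = cluster_shares(communities, weights)
for i, share in enumerate(shares):
    members = [names[j] for j in communities[i]]
    print(f"Cluster {i}: {share:.0%} of weight in {members}")
print(f"Effective number of clusters: {effective_clusters(shares):.2f}")
```

With these hypothetical weights, eight positions spread over four clusters still put roughly three quarters of the portfolio in the tech cluster.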

When to use

  • Diversification audit: verify that your portfolio spans multiple clusters, not just one.
  • Hub identification: find which assets are central to the network and thus carry systemic risk.
  • Regime monitoring: track graph density over rolling windows as a measure of market stress; density rises when correlations increase across the board.
  • MST visualization: reduce a dense correlation matrix to a sparse tree for human interpretation.
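
The regime-monitoring item can be sketched as a rolling computation: for each window of observations, count how many asset pairs clear the correlation threshold. This is a pure-Python illustration with hypothetical helper names (`pearson`, `rolling_density`); the window length and threshold are arbitrary choices for the toy data, which is given in percent.

```python
import math

def pearson(x, y):
    """Pearson correlation (0.0 if either series is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / denom if denom else 0.0

def rolling_density(returns_matrix, window, threshold):
    """Edge density of the thresholded correlation graph for each rolling window."""
    n = len(returns_matrix)
    t = len(returns_matrix[0])
    pairs = n * (n - 1) // 2
    out = []
    for end in range(window, t + 1):
        chunk = [row[end - window:end] for row in returns_matrix]
        hits = sum(
            1
            for i in range(n)
            for j in range(i + 1, n)
            if abs(pearson(chunk[i], chunk[j])) >= threshold
        )
        out.append(hits / pairs)
    return out

# Toy returns in percent: unrelated moves first, then all assets move together.
returns_matrix = [
    [1, -1, 1, -1, 1, 2, -1, -2],
    [1, 1, -1, -1, 1, 2, -1, -2],
    [1, -1, -1, 1, 1, 2, -1, -2],
]
densities = rolling_density(returns_matrix, window=4, threshold=0.5)
print(densities[0], densities[-1])  # 0.0 1.0
```

Density moves from 0 in the first window to 1 in the last, the kind of jump that would flag correlations converging.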
