Complete Guide to Graph Data Structure

Introduction

A Graph is a non-linear data structure consisting of vertices (nodes) and edges that connect these vertices. Graphs are one of the most versatile and powerful data structures in computer science, used to model relationships, networks, and connections between entities.

Unlike trees, graphs can have cycles, multiple paths between nodes, and don't have a hierarchical structure. They're essential for solving problems involving networks, paths, connectivity, and optimization.

What is a Graph?

A graph G is defined as G = (V, E), where:

  • V: Set of vertices (nodes)
  • E: Set of edges (connections between vertices)

Visual Example

    A --- B
    |     |
    |     |
    C --- D --- E
          |
          F

This graph has:

  • Vertices: A, B, C, D, E, F
  • Edges: (A,B), (A,C), (B,D), (C,D), (D,E), (D,F)

Types of Graphs

1. Directed Graph (Digraph)

Edges have a direction (represented by arrows). Edge (A, B) means you can go from A to B, but not necessarily from B to A.

A → B → C
↑       ↓
└───────┘

Use Cases: Web pages with hyperlinks, task scheduling, social media followers

2. Undirected Graph

Edges have no direction. If there's an edge between A and B, you can traverse both ways.

A — B — C
|       |
└───────┘

Use Cases: Social networks (friendships), road networks, computer networks

3. Weighted Graph

Edges have weights (costs) associated with them.

    5       3
A ─────> B ─────> C
│               ↗
│ 2         1  │
└──────────────┘

Use Cases: Maps with distances, network latency, flight costs

4. Unweighted Graph

All edges are equal (weight = 1 or no weight).

Use Cases: Simple connections, relationships without intensity

5. Cyclic Graph

Contains at least one cycle (path that starts and ends at the same vertex).

A → B
↑   ↓
└── C

6. Acyclic Graph

Contains no cycles. DAG (Directed Acyclic Graph) is particularly important.

A → B → D
↓   ↓
C → E

Use Cases: Task dependencies, course prerequisites, build systems

7. Connected vs Disconnected Graph

  • Connected: Path exists between every pair of vertices
  • Disconnected: Some vertices are not reachable from others

8. Complete Graph

Every vertex is connected to every other vertex. A complete graph with n vertices has n(n-1)/2 edges.

Graph Representations

1. Adjacency Matrix

A 2D array where matrix[i][j] = 1 (or weight) if edge exists from vertex i to j.

Example:

     A  B  C  D
A    0  1  1  0
B    1  0  0  1
C    1  0  0  1
D    0  1  1  0

Advantages:

  • O(1) edge lookup
  • Simple implementation
  • Good for dense graphs

Disadvantages:

  • O(V²) space complexity
  • Iterating over neighbors is O(V)
  • Inefficient for sparse graphs

2. Adjacency List

An array/map where each vertex stores a list of its neighbors.

Example:

graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C']
}

Advantages:

  • O(V + E) space complexity
  • Efficient for sparse graphs
  • Easy to iterate over neighbors

Disadvantages:

  • O(degree) time to check if edge exists
  • Slightly more complex implementation

3. Edge List

A list of all edges in the graph.

Example:

edges = [
    ('A', 'B'),
    ('A', 'C'),
    ('B', 'D'),
    ('C', 'D')
]

Use Cases: Kruskal's algorithm, simple graph storage

Graph Traversal Algorithms

1. Breadth-First Search (BFS)

Explores vertices level by level, using a queue.

Algorithm:

from collections import deque

def bfs(graph, start):
    visited = set()
    queue = deque([start])
    visited.add(start)

    while queue:
        vertex = queue.popleft()
        print(vertex)  # Process vertex

        for neighbor in graph[vertex]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)

Use Cases:

  • Shortest path in unweighted graphs
  • Level-order traversal
  • Finding connected components
  • Web crawlers

Time Complexity: O(V + E)
Space Complexity: O(V)

2. Depth-First Search (DFS)

Explores as far as possible along each branch before backtracking, using a stack (or recursion).

Recursive Implementation:

def dfs(graph, vertex, visited=None):
    if visited is None:
        visited = set()

    visited.add(vertex)
    print(vertex)  # Process vertex

    for neighbor in graph[vertex]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)

    return visited

Iterative Implementation:

def dfs_iterative(graph, start):
    visited = set()
    stack = [start]

    while stack:
        vertex = stack.pop()
        if vertex not in visited:
            visited.add(vertex)
            print(vertex)  # Process vertex

            # Add neighbors in reverse order for consistency
            for neighbor in reversed(graph[vertex]):
                if neighbor not in visited:
                    stack.append(neighbor)

Use Cases:

  • Cycle detection
  • Topological sorting
  • Finding strongly connected components
  • Maze solving
  • Pathfinding

Time Complexity: O(V + E)
Space Complexity: O(V)

Common Graph Algorithms

1. Dijkstra's Algorithm

Finds shortest path from source to all vertices in weighted graph (no negative weights).

Algorithm:

import heapq

def dijkstra(graph, start):
    distances = {vertex: float('infinity') for vertex in graph}
    distances[start] = 0
    pq = [(0, start)]  # (distance, vertex)

    while pq:
        current_distance, current_vertex = heapq.heappop(pq)

        if current_distance > distances[current_vertex]:
            continue

        for neighbor, weight in graph[current_vertex]:
            distance = current_distance + weight

            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(pq, (distance, neighbor))

    return distances

Time Complexity: O((V + E) log V) with min-heap
Use Cases: GPS navigation, network routing, flight paths

2. Bellman-Ford Algorithm

Finds shortest path, handles negative weights, detects negative cycles.

Time Complexity: O(V × E)
Use Cases: Currency arbitrage, network routing with costs

3. Floyd-Warshall Algorithm

Finds shortest paths between all pairs of vertices.

Time Complexity: O(V³)
Use Cases: Finding transitive closure, shortest paths in dense graphs

4. Topological Sort

Linear ordering of vertices in a DAG such that for every edge (u, v), u comes before v.

Algorithm (DFS-based):

def topological_sort(graph):
    visited = set()
    stack = []

    def dfs(vertex):
        visited.add(vertex)
        for neighbor in graph[vertex]:
            if neighbor not in visited:
                dfs(neighbor)
        stack.append(vertex)

    for vertex in graph:
        if vertex not in visited:
            dfs(vertex)

    return stack[::-1]  # Reverse the stack

Time Complexity: O(V + E)
Use Cases: Task scheduling, course prerequisites, build systems

5. Kruskal's Algorithm

Finds Minimum Spanning Tree (MST) using Union-Find.

Time Complexity: O(E log E)
Use Cases: Network design, clustering

6. Prim's Algorithm

Finds MST by growing tree from a starting vertex.

Time Complexity: O(E log V)
Use Cases: Network design, approximation algorithms

7. Union-Find (Disjoint Set Union)

Tracks connected components and efficiently merges them.

Algorithm:

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Path compression
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False

        # Union by rank
        if self.rank[px] < self.rank[py]:
            px, py = py, px
        self.parent[py] = px
        if self.rank[px] == self.rank[py]:
            self.rank[px] += 1
        return True

Time Complexity: Nearly O(1) with path compression and union by rank
Use Cases: Detecting cycles, connected components, Kruskal's algorithm

Implementation

Complete Graph Class

from collections import defaultdict, deque
import heapq

class Graph:
def **init**(self, directed=False):
self.graph = defaultdict(list)
self.directed = directed

    def add_edge(self, u, v, weight=1):
        """Add an edge from u to v with optional weight."""
        self.graph[u].append((v, weight))
        if not self.directed:
            self.graph[v].append((u, weight))

    def bfs(self, start):
        """Breadth-First Search traversal."""
        visited = set()
        queue = deque([start])
        visited.add(start)
        result = []

        while queue:
            vertex = queue.popleft()
            result.append(vertex)

            for neighbor, _ in self.graph[vertex]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)

        return result

    def dfs(self, start, visited=None):
        """Depth-First Search traversal (recursive)."""
        if visited is None:
            visited = set()

        visited.add(start)
        result = [start]

        for neighbor, _ in self.graph[start]:
            if neighbor not in visited:
                result.extend(self.dfs(neighbor, visited))

        return result

    def has_cycle_undirected(self):
        """Detect cycle in undirected graph."""
        visited = set()

        def dfs(vertex, parent):
            visited.add(vertex)
            for neighbor, _ in self.graph[vertex]:
                if neighbor not in visited:
                    if dfs(neighbor, vertex):
                        return True
                elif neighbor != parent:
                    return True
            return False

        for vertex in self.graph:
            if vertex not in visited:
                if dfs(vertex, None):
                    return True
        return False

    def has_cycle_directed(self):
        """Detect cycle in directed graph."""
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {vertex: WHITE for vertex in self.graph}

        def dfs(vertex):
            color[vertex] = GRAY
            for neighbor, _ in self.graph[vertex]:
                if color.get(neighbor, WHITE) == GRAY:
                    return True
                if color.get(neighbor, WHITE) == WHITE:
                    if dfs(neighbor):
                        return True
            color[vertex] = BLACK
            return False

        for vertex in self.graph:
            if color[vertex] == WHITE:
                if dfs(vertex):
                    return True
        return False

    def shortest_path_bfs(self, start, end):
        """Find shortest path in unweighted graph using BFS."""
        if start == end:
            return [start]

        visited = {start}
        queue = deque([(start, [start])])

        while queue:
            vertex, path = queue.popleft()

            for neighbor, _ in self.graph[vertex]:
                if neighbor not in visited:
                    new_path = path + [neighbor]
                    if neighbor == end:
                        return new_path
                    visited.add(neighbor)
                    queue.append((neighbor, new_path))

        return None  # No path found

    def dijkstra(self, start):
        """Find shortest paths from start to all vertices."""
        distances = {vertex: float('infinity') for vertex in self.graph}
        distances[start] = 0
        pq = [(0, start)]

        while pq:
            current_dist, current = heapq.heappop(pq)

            if current_dist > distances[current]:
                continue

            for neighbor, weight in self.graph[current]:
                distance = current_dist + weight

                if distance < distances[neighbor]:
                    distances[neighbor] = distance
                    heapq.heappush(pq, (distance, neighbor))

        return distances

    def topological_sort(self):
        """Topological sort for DAG."""
        if not self.directed:
            raise ValueError("Topological sort only works on directed graphs")

        visited = set()
        stack = []

        def dfs(vertex):
            visited.add(vertex)
            for neighbor, _ in self.graph[vertex]:
                if neighbor not in visited:
                    dfs(neighbor)
            stack.append(vertex)

        for vertex in self.graph:
            if vertex not in visited:
                dfs(vertex)

        return stack[::-1]

# Usage Example

g = Graph(directed=False)
g.add_edge('A', 'B')
g.add_edge('A', 'C')
g.add_edge('B', 'D')
g.add_edge('C', 'D')
g.add_edge('D', 'E')

print("BFS:", g.bfs('A'))
print("DFS:", g.dfs('A'))
print("Has Cycle:", g.has_cycle_undirected())
print("Shortest Path A to E:", g.shortest_path_bfs('A', 'E'))

Time and Space Complexity

Common Operations

OperationAdjacency ListAdjacency Matrix
Add VertexO(1)O(V²)
Add EdgeO(1)O(1)
Remove VertexO(V + E)O(V²)
Remove EdgeO(E)O(1)
Check if Edge ExistsO(V)O(1)
Find All NeighborsO(degree)O(V)
Space ComplexityO(V + E)O(V²)

Algorithm Complexities

AlgorithmTime ComplexitySpace Complexity
BFSO(V + E)O(V)
DFSO(V + E)O(V)
Dijkstra (min-heap)O((V + E) log V)O(V)
Bellman-FordO(V × E)O(V)
Floyd-WarshallO(V³)O(V²)
Topological SortO(V + E)O(V)
Kruskal's MSTO(E log E)O(V)
Prim's MSTO(E log V)O(V)
Union-FindO(α(n)) ≈ O(1)O(V)

Applications

00

Social Networks

Representing relationships between users where vertices are people and edges represent friendships, followers, or connections. Used for friend recommendations, community detection, and influence analysis.

01

Maps and Navigation

GPS systems use graphs where intersections are vertices and roads are weighted edges. Algorithms like Dijkstra's find the shortest route between locations, considering distance, traffic, or time.

02

Web Crawling and PageRank

The internet is a massive directed graph where web pages are vertices and hyperlinks are edges. Search engines use graph algorithms to crawl pages and rank them based on link structure.

03

Recommendation Systems

E-commerce and streaming platforms use graphs to model user-item interactions. Graph-based collaborative filtering recommends products or content based on similar users' preferences.

04

Network Routing

Computer networks use graphs to route data packets efficiently. Routers use shortest path algorithms to determine the optimal path for data transmission across the network.

05

Task Scheduling

Project management and build systems use DAGs to represent task dependencies. Topological sorting determines the order in which tasks should be executed to respect all dependencies.

06

Bioinformatics

Protein interaction networks, gene regulatory networks, and phylogenetic trees are modeled as graphs. Used for drug discovery, understanding biological pathways, and evolutionary analysis.

07

Compiler Design

Control flow graphs and data dependency graphs are used in compiler optimization. They help identify dead code, optimize register allocation, and improve code performance.

Must-Do LeetCode Problems

Beginner Level - Graph Fundamentals

#ProblemDifficultyImportanceKey ConceptsLink
1Number of IslandsMedium⭐⭐⭐⭐⭐DFS/BFS, Grid as graph, Connected components#200
2Clone GraphMedium⭐⭐⭐⭐Graph traversal, Deep copy, HashMap#133
3Find if Path Exists in GraphEasy⭐⭐⭐BFS/DFS basics, Graph representation#1971
4All Paths From Source to TargetMedium⭐⭐⭐DFS, Backtracking, Path finding#797

Why Problem #1 is Critical: This is the most fundamental graph problem. It teaches you to treat a 2D grid as a graph and apply DFS/BFS. Essential foundation for all grid-based graph problems.

Intermediate Level - BFS/DFS Applications

#ProblemDifficultyImportanceKey ConceptsLink
5Word LadderHard⭐⭐⭐⭐⭐BFS, Shortest path, String transformation#127
6Course ScheduleMedium⭐⭐⭐⭐⭐Topological sort, Cycle detection, DFS#207
7Course Schedule IIMedium⭐⭐⭐⭐Topological sort implementation, DAG#210
8Pacific Atlantic Water FlowMedium⭐⭐⭐⭐Multi-source BFS/DFS, Matrix traversal#417
9Surrounded RegionsMedium⭐⭐⭐DFS/BFS from boundary, Grid modification#130
10Rotting OrangesMedium⭐⭐⭐⭐Multi-source BFS, Level-order traversal#994

Why Problems #5, #6 are Critical: Word Ladder is a classic BFS shortest path problem. Course Schedule teaches topological sort and cycle detection, which are fundamental for dependency resolution problems.

Advanced Level - Shortest Path & MST

#ProblemDifficultyImportanceKey ConceptsLink
11Network Delay TimeMedium⭐⭐⭐⭐⭐Dijkstra's algorithm, Weighted graph#743
12Cheapest Flights Within K StopsMedium⭐⭐⭐⭐Modified Dijkstra/BFS, Constrained shortest path#787
13Path With Minimum EffortMedium⭐⭐⭐⭐Dijkstra/Binary search, Priority queue#1631
14Min Cost to Connect All PointsMedium⭐⭐⭐⭐Prim's/Kruskal's MST, Union-Find#1584
15Swim in Rising WaterHard⭐⭐⭐⭐Dijkstra, Binary search, Union-Find#778

Why Problem #11 is Critical: This is the standard Dijkstra's algorithm problem. Mastering this opens up all weighted shortest path problems.

Expert Level - Union-Find & Advanced

#ProblemDifficultyImportanceKey ConceptsLink
16Number of Connected Components in Undirected GraphMedium⭐⭐⭐⭐⭐Union-Find, DFS, Connected components#323
17Graph Valid TreeMedium⭐⭐⭐⭐Union-Find, Cycle detection, Tree properties#261
18Accounts MergeMedium⭐⭐⭐⭐Union-Find, DFS, Connected components#721
19Redundant ConnectionMedium⭐⭐⭐⭐Union-Find, Cycle detection#684
20Critical Connections in a NetworkHard⭐⭐⭐⭐⭐Tarjan's algorithm, Bridges, DFS#1192

Why Problems #16, #20 are Critical: #16 is the foundation for Union-Find mastery. #20 teaches Tarjan's algorithm for finding bridges and articulation points, crucial for network analysis.

Master Level - Complex Graph Problems

#ProblemDifficultyImportanceKey ConceptsLink
21Word Ladder IIHard⭐⭐⭐⭐BFS + Backtracking, Path reconstruction#126
22Alien DictionaryHard⭐⭐⭐⭐⭐Topological sort, Graph construction#269
23Reconstruct ItineraryHard⭐⭐⭐⭐Eulerian path, DFS, Hierholzer's algorithm#332
24Minimum Height TreesMedium⭐⭐⭐⭐Tree centers, Topological sort, BFS#310
25Shortest Path Visiting All NodesHard⭐⭐⭐⭐BFS with state, Bitmask, Traveling salesman#847
26Bus RoutesHard⭐⭐⭐⭐BFS on hypergraph, Graph modeling#815
27Parallel Courses IIIHard⭐⭐⭐⭐Topological sort, Dynamic programming#2050

Why Problem #22 is Critical: Alien Dictionary is a classic interview problem that combines graph construction with topological sort. It tests your ability to model real-world problems as graphs.

Bonus - Bipartite & Coloring

#ProblemDifficultyImportanceKey ConceptsLink
28Is Graph Bipartite?Medium⭐⭐⭐⭐