BFS

algorithm, BFS | 07 July 2022

Introduction

Breadth-first search (BFS) is one of my favorite computer algorithms. It is also one of the simpliest algorithms for searching a graph. Prim’s minimum spanning tree and Dijkstra’s single-source shortest-path use ideas similar to BFS.

A BFS starts at some arbitrary node and explores its neighbors first before moving to the next level of neighbors, in a layer by layer fashion.

BFS is a way to search graph broadly. The algorithm discovers all nodes at distance $$k$& from source node before discovering any nodes at distance $&k+1$&. It is useful for finding the shortest path.

BFS

What the bfs function does is to visit nodes layer by layer, and from left to right. There are 3 states in the traversal. People often like to use color to label their states.

  1. unvisited (white)
  2. discovered and to be visited (grey)
  3. visited (black)

For 2, the grey nodes, BFS uses a queue data structure to track which node to visit next because the traversal is first in and first out (FIFO).

For example, we use BFS on the following graph, which is not drawn with “A” at the top. But we can imagine for ourselves however shape we prefer it to be. undirectedGraph

QWPIA

  • Q = queue
  • W = while
  • P = pop
  • I = if
  • A = append

In code below, I use:

  • an adjacency list (dictionary) to represent grpah
  • a deque to keep track of the queue. Deques are a generalization of stacks and queues. From Python documentation: Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same \(O(1)\) performance in either direction. Though list objects support similar operations, they are optimized for fast fixed-length operations and incur \(O(n)\) memory movement costs for pop(0) and insert(0, v) operations which change both the size and position of the underlying data representation.

The steps are: 0. start with a graph represented with adjacency lists (all nodes are white) 1. place starting node to the queue (grey) 2. while anything is in queue,

  1. pop the first item from the queue
  2. if node popped out has not been visited yet, add it to the visited (it is now visited and is black), and add its neighbors to the queue,

I could have used extend method instead of a for-loop to append each neighbor one by one because extend is much faster. But there is no method that is the opposite of extend to pop multile items simutaneously. So I stay with the for-loop.

codeBFS.py
graph ={
    'A': ['B', 'C'],
    'B': ['D', 'E', 'F'],
    'C': ['G'],
    'D': [],
    'E': [],
    'F': ['H'],
    'G': ['I'],
    'H': [],
    'G': []
}

from collections import deque
def bfs(G, startNode):
    # initialize
    Q = deque(startNode)
    visited = set()
    traversal = []

    while Q: # do until no more node left
        node = Q.popleft()
        if node not in visited:
            visited.add(node)
            traversal.append(node)
            Q.extend(G[node])   
    return traversal

print(bfs(graph, 'A'))
# ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']

BFS works on both directed and undirected graph. The negative side of undirected graph is that we have to do more membership tests. For example, when we are at the \(B\) node, we have to ask if \(A\) was in the visited even though we just came from A.

graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D', 'E', 'F'],
    'C': ['A', 'G'],
    'D': ['B'],
    'E': ['B'],
    'F': ['B', 'H'],
    'G': ['C'],
    'H': ['F'],
    'G': ['C']
}

Complexity

  • Enqueue and dequeue O(V): Each node is enqueued at most once, and hence dequeued at most once, each taking $O(1)$. For all nodes, that is $O(V)$.
  • Scan adjacency list $O(E)$: Because an adjacency list is only scanned when its key node is dequeued, each adjacency list is scanned at most once. Since the total length of adjacency lists is $E$, the total time scanning them is $O(E)$.
  • Total time complexity $O(V + E)$

Compare iterative DFS and BFS

The non-recursive implementation is similar to breadth-first search but differs from it in two ways:

it uses a stack instead of a queue, and it delays checking whether a node has been discovered until the node is popped from the stack rather than making this check before adding the node. If G is a tree, replacing the queue of the breadth-first search algorithm with a stack will yield a depth-first search algorithm. For general graphs, replacing the stack of the iterative depth-first search implementation with a queue would also produce a breadth-first search algorithm, although a somewhat nonstandard one.[7]

Appendix

NetworkX

The python library NetworkX is a graph-specific module. It can convert different representations of graph. It has a variety of plotting methods.

See below for the code used for drawing the graph. nx.Graph uses our adjacency dictionary of lists as input to construct the graph.

codenetworkX draw graph.py
import networkx as nx
import matplotlib.pyplot as plt
# for my BFS example
options = {
    'node_color': 'red',
    'node_size': 250,
    'width': 3,
    'font_weight': 'bold',
    'with_labels': True
}
H = nx.Graph(graph)  # create a Graph dict mapping nodes to nbrs
list(H.edges())
# [('A', 'B'),
#  ('A', 'C'),
#  ('B', 'D'),
#  ('B', 'E'),
#  ('B', 'F'),
#  ('C', 'G'),
#  ('F', 'H')]
nx.draw_spectral(H, **options)
plt.savefig("undirectedGraph.PNG")
plt.show()
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D', 'E', 'F'],
    'C': ['A', 'G'],
    'D': ['B'],
    'E': ['B'],
    'F': ['B', 'H'],
    'G': ['C'],
    'H': ['F'],
    'G': ['C']
}
H =  nx.Graph(graph) 
H.degree()
# DegreeView({'A': 2, 'B': 4, 'C': 2, 'D': 1, 'E': 1, 'F': 2, 'G': 1, 'H': 1})
H =  nx.DiGraph(graph) # directed graph
H.in_degree()
# InDegreeView({'A': 0, 'B': 1, 'C': 1, 'D': 1, 'E': 1, 'F': 1, 'G': 1, 'H': 1})

In [38]: for i in H.adjacency():
    ...:     print(i)
    ...:
# ('A', {'B': {}, 'C': {}})
# ('B', {'D': {}, 'E': {}, 'F': {}})
# ('C', {'G': {}})
# ('D', {})
# ('E', {})
# ('F', {'H': {}})
# ('G', {})
# ('H', {})