O

complexity, cost | 06 June 2022

When I worked in financial services, the goal was to gain insight, build models, and predict/forecast.

However, in the machine learning era, the goals and capacities of analytics have expanded far beyond statistics and modeling. Modelers may need to implement the models themselves, and implementing efficient models requires understanding computer science fundamentals. This post is about how we measure efficiency.

The Big O metric

In order to measure run time efficiency, the big \(O\) metric is used.

In the analytic number theory class taught by my late professor Patrick Gallagher, the big \(O\) metric was used to express whether two different representations are asymptotically close.

If \(S\) is a set and \(f\) and \(g\) are functions \(S \to \mathbb{R}\), we say that \(f = O(g)\) if there exists a constant \(k > 0\) such that \(|f(s)| \le k\,g(s)\) for all \(s \in S\).
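For a concrete instance of this definition (an example of my own), take \(S = \{1, 2, 3, \dots\}\) and \(f(n) = 3n + 5\). Then

\[
|f(n)| = 3n + 5 \le 3n + 5n = 8n \quad \text{for all } n \ge 1,
\]

so \(f = O(n)\) with \(k = 8\).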

Time complexity

Therefore, \(O(1)\) means constant (time). Similarly, \(O(2)\) or \(O(100)\) also mean constant time, as long as the number is fixed: it stays the same as the input size increases.

\(O(n)\) means linear: the run time grows like \(k \cdot n\) for some \(k > 0\).

In computer science, the big \(O\) is used as a time efficiency metric for data structures: how much time it takes to do each of the following essential operations:

  1. Access (to get)
  2. Search (to find)
  3. Insert (to add)
  4. Delete (to remove)

In the computer science context, constant time (\(O(1)\)) means that the amount of time it takes to do something (one of the 4 tasks above) is independent of the size of the data; the data size \(n\) is absent from the \(O\) expression. Constant time is the most efficient, but also difficult to achieve.
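For instance, a Python dict lookup takes (on average) constant time: fetching one value costs the same whether the dict holds ten entries or millions. A small sketch of my own:

```python
# Average-case O(1): a hash lookup does not scan the data.
squares = {n: n * n for n in range(10_000)}
print(squares[99])  # → 9801, same cost regardless of dict size
```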

| Time complexity | Run time \(O\) as a function of data size \(n\) | Example | Explanation |
| --- | --- | --- | --- |
| constant | \(O(1)\) | indexed data | data size is irrelevant |
| log n | \(O(\log n)\) | binary search | data size is relevant, but the unit "cost" decreases as data size increases |
| linear | \(O(n)\) | accessing each item | time is a linear function of data size |
| n log n | \(O(n \log n)\) | merge sort, heap sort | the slope \(\log(n) + 1\) is an increasing function of \(n\) |
| power | \(O(n^2)\) | inserting/deleting | the slope \(2n\) is an increasing function of \(n\) |
| exponential | \(O(2^n)\) | | the worst in efficiency |
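As a concrete instance of the \(O(\log n)\) case, here is a small binary search sketch of my own (it assumes the input list is already sorted):

```python
def binary_search(sorted_nums, target):
    """Return the index of target in sorted_nums, or -1 if absent."""
    lo, hi = 0, len(sorted_nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # halve the search range each step
        if sorted_nums[mid] == target:
            return mid
        elif sorted_nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # → 3
```

Each iteration discards half of the remaining range, so at most about \(\log_2(n)\) comparisons are needed.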

In the following code,

  1. print_1st_one prints the first item regardless of how big the input is. The run time is constant, \(O(1)\).
  2. print_all prints every input item (twice, in two separate loops), so the run time is a linear function of the input length: \(2n\), which is \(O(n)\).
  3. print_all_ordered_pairs prints all ordered pairs (like a multiplication table) in a double loop. The run time is \(O(n^2)\).
O_time.py
# O(1)
def print_1st_one(s):
    print(s[0])

# O(n)
def print_all(s):
    for i in s:
        print(i)
    for j in s:
        print(j)

# O(n^2)
def print_all_ordered_pairs(s):
    for i in s:
        for j in s:
            print(i,j)

In general, data structures with less flexibility are faster for certain basic tasks. For example, in Python a tuple is immutable (elements cannot be added or changed) and is faster than a list. (Note that although tuple elements cannot be reassigned, a mutable element such as a list inside a tuple can still be changed, for example ([1,2,3],).)
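The tuple remark above can be checked directly (a small sketch):

```python
t = ([1, 2, 3],)     # a tuple holding one mutable list
# t[0] = [9]         # would raise TypeError: tuple items cannot be reassigned
t[0].append(4)       # but the list inside can still be mutated
print(t)             # → ([1, 2, 3, 4],)
```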

Some data structures are very inefficient in terms of time, but they are very useful for what they are made to do.


Space complexity

Space complexity is about how much memory running the code will take as a function of the input size. It is expressed with the same big \(O\) notation as time complexity.

In-place operations are the least costly in memory space.
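As a small sketch of an in-place operation (names are my own), reversing a list with two pointers modifies the input directly and uses only \(O(1)\) extra space:

```python
def reverse_in_place(a):
    """Reverse list a in place with two pointers: O(1) extra space."""
    i, j = 0, len(a) - 1
    while i < j:
        a[i], a[j] = a[j], a[i]  # swap the ends, then move inward
        i += 1
        j -= 1
    return a

print(reverse_in_place([1, 2, 3, 4]))  # → [4, 3, 2, 1]
```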

In the following code,

  1. say_n_times prints \(n\) times, but it prints the same thing, so it takes constant space, \(O(1)\).
  2. get_largest, like say_n_times, requires only constant memory \(O(1)\), even though the input has length \(n\).
  3. say_n_times_and_remember_all requires \(O(n)\) memory, as the computer has to remember all the words printed in the past.
O_space.py
# O(1)
def say_n_times(n):
    for i in range(n):
        print("I love you")

# O(1)
def get_largest(nums):
    largest = float('-inf')
    for i in nums:
        if i > largest:
            largest = i
    return largest

# O(n)
def say_n_times_and_remember_all(n):
    remembered = []
    for i in range(n):
        print("I love you ")
        remembered.append(i)

Example: in-place sort integer list

  • Problem: We want to sort, in place, a list of integers so that all even numbers come before all odd numbers.

This problem is somewhat similar in strategy to quicksort's partition step: split into two parts, smaller and larger (with respect to the pivot), and completely disregard the inner ordering within each of the two groups.

The way to do it, as with other \(O(1)\)-space problems, is to use subarrays of the input array. The input array is partitioned into three parts: the left part is even, the middle is unsorted, and the right part is odd.

To partition an array into three parts, we need two pointers.

We start checking from the left side.

  1. Whenever an even number is encountered, the left index moves forward by 1.
  2. If the number is odd, we swap the number at the left index with the number at the right index. Since the number moving to the right is known to be odd, we can decrement the right index by 1.

    Note that the number coming from the right side to the left position may itself be even or odd, so we cannot yet increment the left index.

  3. Continue until the left and right indices cross.

The same tactic is used in all the in-place sort algorithms that I have seen. For example, see in-place quicksort.
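For illustration, here is my own sketch of an in-place quicksort (a Lomuto-style partition, not taken from any linked post); it uses the same swap-then-move-the-boundary pattern:

```python
def quicksort_in_place(a, lo=0, hi=None):
    """In-place quicksort: partition around a pivot, then recurse on each side."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    pivot = a[hi]
    b = lo                            # boundary: items left of b are <= pivot
    for i in range(lo, hi):
        if a[i] <= pivot:
            a[i], a[b] = a[b], a[i]   # swap first, then advance the boundary
            b += 1
    a[b], a[hi] = a[hi], a[b]         # place the pivot at the boundary
    quicksort_in_place(a, lo, b - 1)
    quicksort_in_place(a, b + 1, hi)
    return a

print(quicksort_in_place([3, 1, 2]))  # → [1, 2, 3]
```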

Note that the swap A[e], A[o] = A[o], A[e] comes before decrementing the right index o.

even_odd_O_1.py

def sortInt(A):
    e = 0            # left pointer: next position to check
    o = len(A) - 1   # right pointer: where the next odd number goes
    while e <= o:
        if A[e] % 2 == 0:  # even: keep it on the left
            e += 1         # left index moves forward by 1
        else:
            A[e], A[o] = A[o], A[e]  # odd: swap it to the right side
            o -= 1                   # right index moves backward by 1
    return A

nums = [4, 3, 11, 1, 6, 10, -10, 33]
print(sortInt(nums))

This solution has time complexity \(O(n)\), because we need to check every single number once to see whether it is even: we make one pass through \(n\) numbers.
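One way to sanity-check the partition property (a sketch of mine; sort_int is the same two-pointer routine as sortInt above, and evens_before_odds is my own helper name):

```python
def sort_int(a):
    """Same two-pointer even/odd partition as sortInt above."""
    e, o = 0, len(a) - 1
    while e <= o:
        if a[e] % 2 == 0:
            e += 1
        else:
            a[e], a[o] = a[o], a[e]
            o -= 1
    return a

def evens_before_odds(a):
    """True if no even number appears after an odd number."""
    seen_odd = False
    for x in a:
        if x % 2 == 0 and seen_odd:
            return False
        if x % 2 != 0:
            seen_odd = True
    return True

print(evens_before_odds(sort_int([4, 3, 11, 1, 6, 10, -10, 33])))  # → True
```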