Only in a Python class did I see clearly that the thinking process is different from SAS. The design of a class is the essence of object programming and the essence of what makes Python thought process different from SAS. It is when writing a class that I finally say to myself “Aha, that’s kind of new to me!”

a class in Python
Class attributes and methods
- Naming inputs and functions
Parent class and child classes
Dunder (magic) methods
- Other notable dunders
Decorators
- # reference

a class in Python

A class is a group of things and things associated with that group of things. Using jargon, a class groups objects such as attributes and functions/methods that belong together. “Classes provide a means of bundling data and functionality together.”

The closest thing from SAS to Python class is a SAS procedure specifically those that do very specific things. For example, PROC LOGISTIC, which contains almost all the reusable code that one needs for doing logistic regression in a statical-focused context.

We may turn some of the routine data analysis code into class. For example, as shown below, we define a class for data description that has one attribute and three methods.
The one and only attribute is the data itself, like a parameter to a function. This “parameter” has 3 “sub-functions”. Or from the SAS users’ perspective, 3 macro functions), one for PROC CONTENTS, one for PROC MEANS and PROC FREQ and one for PROC CORR (sort of).

codedata analysis.py

import pandas as pd
import matplotlib.pyplot as plt
class DataDescription(object):
    def __init__(self, data): # 第一个 input总是self。即， 一个instance. self = instance.  Can call "self" any name you want
        self.data = data # 类似其他语言的constructor # can call "self" or anything we want.  'self' is because of convention
     # could have written self.values = first, or self.something = first, but it is easier to keep track of if keep the same names

    def data_content(self):
        print(self.data.info())
        print('\ndtypes are: \n', self.data.dtypes)
        print('\ncolumn names are: \n',self.data.columns.tolist())
        
    def cal_corr(self):
        pearson_corr = self.data.corr(method='pearson')
        spearman_corr = self.data.corr(method='spearman')
        return pearson_corr,spearman_corr

    def descriptive(self):
	    des = self.data.describe(include='all').T
        missing = self.data.isnull().sum().to_frame(name='missing')
	    des= pd.concat([des, missing], axis=1).T
        print(des)
        return des

    # categorical variables study
    def cate_analysis(self):
        cates = self.data.select_dtypes(exclude = np.number).columns
        print("\n", "*"*10,  'Frequency counts', "*"*10)
        for i in cates:
            print("\n",i)
            count = self.data[i].value_counts(sort=False,dropna=False) 
            pct = self.data[i].value_counts(sort=False,dropna=False)/len(self.data)
            combined = pd.concat([count, pct], axis=1)
            combined.columns = ['count','percent']
            print(combined.sort_values(by = 'count', ascending = False))
            # GroupBy descriptives
            print("\n", "*"*10,  'Groupby descriptives', "*"*10)
            des = self.data.groupby(i).describe().round(2).T
            print(des)
            print("\n",i)
            print("correlations")
            pearson_corr = self.data.groupby(i).corr(method='pearson')
            spearman_corr = self.data.groupby(i).corr(method='spearman')
            print(pearson_corr,spearman_corr)
    
import seaborn as sns
df = sns.load_dataset('tips')
a = DataDescription(df)
a.data_content()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 244 entries, 0 to 243
# Data columns (total 7 columns):
#  #   Column      Non-Null Count  Dtype
# ---  ------      --------------  -----
#  0   total_bill  244 non-null    float64
#  1   tip         244 non-null    float64
# ...
# column names are:
#  ['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size']
a.descriptive()
#         total_bill   tip   sex smoker  day    time  size
# count        244.0 244.0   244    244  244     244 244.0
# ...
# missing          0     0     0      0    0       0     0

Class attributes and methods

Attributes are what’s defined in the __init__ function. Attributes can be accssed with .. For example, .shape is an attribute for numpy arrays and pandas DataFrames.
Methods are those define outside of __init__ function as methods. We need to add () after its name when using it. For example, df.describe() is calling the method and needs ().

Although we by convention use self when we define class methods, we can actually call it almost anything, just like we can name variables with any name.

Attributes can be defined within class or outside

Within class definition

In the super simple example below, myClass is defined with 2 attributes: i and telescope.

The attributes defined within class definition cannot be deleted using del.

After instantiating

Outside of class definition, after instantiate an myClass object, we can also give the object new attributes, which is counter in the example.

The new attribute can be written, used, and deleted. After deleted, we don’t find the counter attribute anymore. They are easy come and easy go.

Notice that although we can define attributes after instantiation anyway we want: .first, .last, .pay, and .X, .Y, .Z. But this will not be reusable because these attributes are specific to this instance, and not to the class.

Also notice that instead of self, I first use martian in a method called f, and then change it to venus in another method called g.

codeclass attribute.py

class myClass:
    """A simple example class"""
  i = 12345
  telescope = "James Webb"
  def f(martian):
          return 'hello Mars'
  def g(venus):
          return 'I live on Mars'

x = myClass()
print(x.f())
print(x.g())
# hello Mars
# I live on Mars
print(x.i)
print(x.telescope)
# 12345
# James Webb
x.counter = 1
while x.counter < 10:
    x.counter = x.counter * 2
print(x.counter)
# 16
del x.counter
print(x.counter)
# ---------------------------------------------------------------------------
# AttributeError                            Traceback (most recent call last)
# <ipython-input-5-6daa4215855c> in <module>()
#       1 del x.counter
# ----> 2 print(x.counter)

# AttributeError: 'myClass' object has no attribute 'counter'
del x.f
# ---------------------------------------------------------------------------
# AttributeError                            Traceback (most recent call last)
# <ipython-input-6-a09835603615> in <module>()
del x.f
x.f()
# 'hello Mars'

Below example compares defining attribute/method within class definition and added for an instance of a class.

codeclass attribute2.py

# 1. add attritutes for an instance of a class
class Data:
    pass
dt1 = Data()
dt1.data =2
dt1.name = "anyName"
print(dt1.data)
print(dt1.name)
print(dt1.data**2)
# 2. attritutes defined within class
class Data:
    def __init__(self, data,name):
        self.data = data
        self.name = name
    def square(self):
        return self.data**2
dt1 = Data(2,'any name')
print(dt1.data)
print(dt1.name)
print(dt1.square())

Naming inputs and functions

class

The example below uses __init__ to initialize object self, which takes on the two values given when we initialize the class object.

In the initializing step, we can name them anything, for example, self.a = num1 and self.b = num2. But it is more consistent if we just use the same name as the parameters num1 and num2.

After the initializing step, anytime we need the input numbers, we must strictly follow self.num1 and self.num2.

When we use a sibling method within class, we need to prefix the function name with self. as well.

class Calculator:
    def __init__(self, num1, num2):
        self.num1 = num1
        self.num2 = num2

    # addition
    def addition(self):
        result = self.num1 + self.num2
        print("Addition:        " + str(result))
    # subtraction
    def subtraction(self):
        result = self.num1 - self.num2
        print("Subtraction:     " + str(result))

    # multiplication
    def multiplication(self):
        result = self.num1 * self.num2
        print("Multiplication:  " + str(result))

    # division
    def division(self):
        result = self.num1 / self.num2
        print("Division:        " + str(result))

#Calling the class
mycalc = Calculator(20, 10)
mycalc.addition()
mycalc.subtraction()
mycalc.multiplication()
mycalc.division()

Another good example of writing clean class without getting into complex algorithm is implementaing a basic linked list.

codeclass example using linked list.py

class Node(object):
    def __init__(self, data, next = None):
        self.data = data
        self.next = next
    def setData(self, data):
        self.data = data
    def getData(self):
        return self.data
    def setNext(self, next):
        self.next = next
    def getNext(self):
        return self.next
n = Node(3)

class LinkedList(object):
    def __init__(self, head = None): # Defining the head of the linked list
        self.head = head
        self.count = 0
    def printLinkedList(self): # printing the data in the linked list
        temp = self.head
        while(temp):
            print(temp.data, end=' ')
            temp = temp.next
    def insertAtStart(self, data): # # inserting the node at the beginning
        newNode = Node(data)
        newNode.next = self.head
        self.head = newNode # update LinkList head
        self.count += 1 # update count
    def insertBetween(self, previousNode, data):
        if (previousNode.next is None):
            print('Previous node should have next node!')
        else:
            newNode = Node(data)
            newNode.next = previousNode.next
            previousNode.next = newNode
            self.count += 1 # update count
    def insertAtEnd(self, data):
        newNode = Node(data)
        temp = self.head
        while(temp.next != None): # get last node
            temp = temp.next
        temp.next = newNode
        self.count += 1 # update count
    def delete(self, data): # deleting an item based on data
        temp = self.head
        if (temp.next is not None):  # if data to be deleted is the head
            if(temp.data == data):
                self.head = temp.next
                self.count -= 1
                temp = None
                return
            else: #  else search all the nodes
                while(temp.next != None):
                    if(temp.data == data):
                        break
                    prev = temp #save current node as previous so that we can go on to next node
                    temp = temp.next
                if temp == None: # node not found
                    return
                prev.next = temp.next # if found, then drop the node by omiting it from the link
                self.count -= 1
                return
    def search(self, node, data):# iterative search
        if node == None:
            return False
        if node.data == data:
            return True
        return self.search(node.getNext(), data)
    def search_list(self, node, data): # iterative search return found item instead of True/False
        while node and node.data != data:
            node = node.next
        return node
            
LL = LinkedList()
LL.head = Node("Hello")
print("After defining head node")
LL.printLinkedList()

Parent class and child classes

We can create a new “child” class to inherite from an existing “parent” class.

Child class can:

get features from a parent class
modify features from parent
add new features

Instead of copy and pasting the code from parent and create a new class, using child class makes the code more maintainable.

The logical order of parent-child follows the following chart:

Dunder (magic) methods

Dunder (short for double underscore) methods are special methods.

__init__() is one of them. We can customize dunder methods for the user defined classes. A class does not have to have an __init__() method. But most of the time we like to create objects initialized with a specific initial state. Therefore a class may define a special method named __init__(), like this:

When a class defines an __init__() method, class instantiation automatically invokes __init__() for the newly-created class instance.

__call__() makes an instance of a class callable (see SOF). Whatever you want it do it when the genie is called should be placed within the __call__() function.

class Foo:
  def __call__(self):
    print('Master, I am called')

foo_instance = Foo()
foo_instance() #this is calling the __call__ method
# Master, I am called

Sometimes you read code someone else has written, and there are no example on how to use the classes. Remember the following:

__init__() has the attributes which we use with the ..

__call__() tells us how to use callable function, namely what arguments to provide it.

    def __call__(self, *args: Any, **kwds: Any) -> Any:

For example, in example below (example from (realPython)[https://realpython.com/fibonacci-sequence-python/]), the n in __call__(self, n) tells us what arguments to provide in order to use instance of the class: we are supposed to give it an integer.

class fibonacci:
    def __init__(self):
        self.cache = [0, 1]
    
    def __call__(self, n-> int):
        if not(isinstance(n, int) and n >= 0):
            raise ValueError("n needs to be positive integer")
            if n < len(self.cache):
            return self.cache[n]
        else:
            fib = self(n -1) + self(n - 2)
            self.cache.append(fib)
        return self.cache[n]
a = Fibonacci()
print(a(10))

Other notable dunders

Other most commonly used dunder methods are __str__() and __repr__(). Both of them allow us to print attributes for the class object instance. Without them, if we print an instance of a class, all we see is “main…”, nothing meaningful will be printed.

Other dunder methods that should be noted are arithmetics and comparison ones. For example, __add__(), __lt__(), __len__(), and many more. We can customize arithmetics and comparisons to suit the class object and its attributes.

Another interesting one is __hash__.

Decorators

The @ symbol is used with decorators. A decorator is a function returning another function, usually applied as a function transformation using the @wrapper syntax. Common examples for decorators are classmethod() and staticmethod().

Everything in functools are decorators.

A simple example usage is to use decorator on a class method so that it can be used as an attribute as well.

reference

A Guide to Python’s Magic Methods by Rafe Kettler

Turtle code

Turtle library