While we are postponing, life speeds by.
The most fundamental measures of time are point in time timestamp and intervals (fixed or variable), and the difference between them timedelta. These objects provide building blocks for comprehensive timeseries data processes.
Recall a pandas index is simply a way to label rows. One of the main uses of Timestamp and Period objects is indexing. Lists of Timestamp or Period objects are automatically coerced to DatetimeIndex and PeriodIndex respectively when used as index. Python pandas is powerful and convenient in handling timeseries. But there are also traps and tips you should be aware in order to avoid errors.
In this post, you will learn:
- An Example Using Historical Bitcoin Trade Data

Let’s get started.
SAS users need to be aware that pandas allow duplicate index. When we perform strict timeseries analysis, we would need to remove duplicates.
DatetimeIndex
Pandas Timestamp is pandas’ equivalent to the Python’s native datetime object and in many cases a pandas Timestamp is interchangeable with Python’s datetime object. Pandas Timestamp combines the flexibility of datetime and dateutil and the efficiency of vectorized representation from numpy.datetime64.
The example below illustrates how a list of objects with mixed formats is automatically coerced to Datetimeindex by .
- From pandas version 0.20.0. there is a new origin parameter for specifying an alternative starting point for creation of a DatetimeIndex. For example, using 1960-01-01 as the starting date would make pandas dates have the same reference starting date as SAS date
- If you do not specify origin, then the default is origin=’unix’, which defaults to 1970-01-01 00:00:00. This is commonly called ‘unix epoch’ or POSIX time.
- Pandas represents timestamps in nanosecond resolution. Hence the time span that can be represented using a 64-bit integer is limited to approximately 584 years. On the other hand, SAS does not have such limitation as SAS stores dates as integers, datetime and time as real numbers.
SAS date or time are stored internally in numbers and represented according to formats user specified. Example below prints current date, time, and datetime stamps.
As mentioned earlier, pandas is built on top of numpy. Vectorized operations from numpy can be applied directly on Timestamp object to create a sequence of dates or times, which is automatically coerced into DatetimeIndex object. This is illustrated in the next example.
Base SAS does not have vectorized operations. To create similar output, one may write a simple loop such as the one below:
Partial String Indexing
In SAS, subsetting date ranges are performed via WHERE clause or IF statement in a DATA step, or WHERE clause in PROC SQL. Since pandas DatetimeIndex has all the basic functions of regular index , you can use the regular index methods to slice and dice DataFrame or Series, and use DatetimeIndex to perform “partial string indexing”.
For example, omitting the day component extracts all rows belong to a particular year or month. The uses of partial string index are interspersed throughout the remainder of the Indexing section and are identified in the text.
An Example Using Historical Bitcoin Trade Data
It is time for a real life example. We use Python library pandas_datareader to get historical Bitcoin data from Yahoo Finance. In example below, we show steps to import, store, reload stored DataFrame and perform index slicing.

PeriodIndex
Recall that for pandas, a Period is a bounded time segment, i.e. time span, uniform in length with a start and end date with a given frequency and has the associated PeriodIndex, we can define a PeriodIndex directly by using the pd.PeriodIndex or pd.period_range() constructor, which creates a list of periods with frequency specified. Unlike a DatetimeIndex, labels for the PeriodIndex are Period objects.
Note:
- We can convert a series of DatetimeIndex to PeriodIndex by the .to_period() function.
- If you have to parse datetime from string format before you convert it to period index, you can use pd.to_datetime(‘string’, format =’%d-%b-%y’M).dt.to_period(‘M’) Conversely, PeriodIndex can be converted to DatetimeIndex using either pd.to_timestamp or .astype(‘datetime64[ns]’).
As with Datetimeindex, we can apply “partial string indexing” to PeriodIndex. The following example shows how convenient it is to subset data by passing a component of a datetime.
TimeDeltaIndex
TimeDeltaIndex is like a series of numbers with day and/or time units. TimeDeltaIndex can be created by
- taking the difference of two dates,
- converting using to_timedelta(), or
- defining using pd.timedelta_range()
The following example below illustrates each of these methods. While the examples given are mostly in days for simplicity, there is a very wide range of units that can be used.
For reference, example below shows differencing in SAS datetime.
Multiindex
Many timeseries operations cannot be performed if there are duplicated indices. As mentioned earlier, when the datetimeindex is irregular, nothing will be returned from df.index.inferred_freq. There are multiple ways to remove duplicates in pandas. When the duplicates are in the columns, DataFrame.sort_value(), DataFrame.drop_duplicates(),and DataFrame.set_index() are the standard procedures to sort, drop duplicate and set index to clean data. When the duplicates are in DatetimeIndex, we can use DataFrame.index.duplicated() to get the array of boolean values of whether an index is a duplicate.
The keep = ‘first’, keep = ‘last’, or keep = False option is to mask those duplicates from being identified as True. By default, for each set of duplicated values, the first occurrence is set to False and all others to True, in effect, keeping the first occurance when you drop duplicates. To summarize:
• ‘first’, marking duplicates as True except the first occurrence
• ‘last’,marking duplicates as True except the last occurrence
• False, marking all duplicates as True
Here is an example of using DataFrame.index.duplicated(). Because we use keep = False, all the duplicated indices are shown, including first and last.
Stack and Unstack
Many timeseries operations cannot be performed if there are duplicated indices. As mentioned earlier, when the datetimeindex is irregular, nothing will be returned from df.index.inferred_freq. There are multiple ways to remove duplicates in pandas. When the duplicates are in the columns, DataFrame.sort_value(), DataFrame.drop_duplicates(),and DataFrame.set_index() are the standard procedures to sort, drop duplicate and set index to clean data. When the duplicates are in DatetimeIndex, we can use DataFrame.index.duplicated() to get the array of boolean values of whether an index is a duplicate.
The keep = ‘first’/ ‘last’ /False option is to mask those duplicates from being identified as True. By default, for each set of duplicated values, the first occurrence is set to False and all others to True, in effect, keeping the first occurance when you drop duplicates.
• ‘first’, marking duplicates as True except the first occurrence
• ‘last’,marking duplicates as True except the last occurrence
• False, marking all duplicates as True
The following example provides an example of DataFrame.index.duplicated() using the Bitcoin timeseries data. Because we use keep = False, all the duplicated indices are shown, including first and last.
Duplicates in Index
Many timeseries operations cannot be performed if there are duplicated indices. As mentioned earlier, when the datetimeindex is irregular, nothing will be returned from df.index.inferred_freq. There are multiple ways to remove duplicates in pandas. When the duplicates are in the columns, DataFrame.sort_value(), DataFrame.drop_duplicates(),and DataFrame.set_index() are the standard procedures to sort, drop duplicate and set index to clean data. When the duplicates are in DatetimeIndex, we can use DataFrame.index.duplicated() to get the array of boolean values of whether an index is a duplicate.
The keep = ‘first’/ ‘last’ /False option is to mask those duplicates from being identified as True. By default, for each set of duplicated values, the first occurrence is set to False and all others to True, in effect, keeping the first occurance when you drop duplicates.
• ‘first’, marking duplicates as True except the first occurrence
• ‘last’,marking duplicates as True except the last occurrence
• False, marking all duplicates as True
The example below provides an example of DataFrame.index.duplicated() using the Bitcoin timeseries data. Because we use keep = False, all the duplicated indices are shown, including first and last.
The following exaple illustrates three simple ways to drop duplicates. The first two methods use keep = ‘first’, meaning keeping only the first of the duplicates. The third method uses resample, which is a type of groupby. As an added bonus, using resample the datetime frequency is set to ‘D’, which was not set before in the original timeseries imported. Having a frequency is important for timeseries operations involving the index. For example, tshift method will give errors if frequency is not set. More on shifting and resampling in later subsections of this chapter.
Note: Using groupby or resample allows more sophisticated ways of handling duplicates such as keeping the mean or median.
In SAS, many options are available readily for removing duplidates. In the example below, we use SAS PROC TIMESERIES to remove duplicates when we specify the INTERVAL parameter to be equal to ID time unit. Different options are available including how you want to treat missing dates and missing data. We use ACCUMULATE = MEDIAN although you can also specify TOTAL, AVERAGE, MINIMUM, MAXIMUM and etc. In example below, we use the same data from Bitcoin Prices with PROC TIMESERIES.
After duplicated dates and times are removed, one can go on performing more time series analysis.
Missing Values
So far we have not had to deal with missing datetime in using the Bitcoin timeseries because Bitcoin is traded around the world everyday including weekends and holidays.
However, many other timeseries have missing datetime either due to weekends/holidays or errors/omissions. For example, stocks are not traded on weekends or holidays and therefore stocks timeseries will not have any values for those dates. Furthermore, time-based events may not happen every weekday, for example, large bankruptcies. In general, there are three types of missing datetime in a timeseries:
- Weekend This can be easily handled via asfreq(‘B’) or other types of frequency/offset.
- Holidays These can be handled using pandas.tseries.holiday or custom holiday calendar.
- Other causes These missing can be left as missing or imputed using various methods, for example, ffill. Missing datetime handling will be demonstrated using a real life example, using historical Apple stock price.