Pandas provides pandas.Series.apply() for invoking a function on each value in a Series, which makes it a convenient way to transform a column of a DataFrame. Below is a template function demonstrating how flexibly and succinctly we can apply arbitrary transformations to our data. transform_df() modifies a DataFrame in place, given a column name and a function, using setattr and getattr (forwarding **kwargs lets us apply any function we want to a given column, along with whatever keyword arguments that function takes).

For a current project, we have a database with hundreds to thousands of columns, many of which need to be modified in different ways. I found this method quite helpful in keeping my data cleaning process neat and organized.
import pandas as pd
import numpy as np


def minus_one_to_hex(val):
    # Subtract one, then format the result as a hexadecimal string.
    return hex(val - 1)


def weird_function(val, to_mod=True, denom=2, scale=4):
    # Either take the remainder modulo denom, or divide by denom and rescale.
    if to_mod:
        return val % denom
    else:
        return val / denom * scale


def transform_df(df, col, func, **kwargs):
    # Overwrite the existing column `col` with func applied to each value;
    # extra keyword arguments are forwarded to func by Series.apply().
    setattr(df, col, getattr(df, col).apply(func, **kwargs))


d = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]}
df = pd.DataFrame(d)
print("Original:")
print(df)

print("Alter 'col1':")
transform_df(df, 'col1', np.square)
print(df)

print("Alter 'col2':")
transform_df(df, 'col2', minus_one_to_hex)
print(df)

print("Alter 'col3':")
kwargs = {'to_mod': True, 'denom': 3, 'scale': 1.22}
transform_df(df, 'col3', weird_function, **kwargs)
print(df)
>>>
Original:
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6     9
Alter 'col1':
   col1  col2  col3
0     1     4     7
1     4     5     8
2     9     6     9
Alter 'col2':
   col1  col2  col3
0     1   0x3     7
1     4   0x4     8
2     9   0x5     9
Alter 'col3':
   col1  col2  col3
0     1   0x3     1
1     4   0x4     2
2     9   0x5     0
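
One caveat with the setattr/getattr approach: attribute access only reaches columns whose labels are valid Python identifiers and do not collide with an existing DataFrame attribute or method (a column named count, for example, resolves to DataFrame.count rather than to the column). Below is a minimal sketch of a bracket-indexing variant that avoids this. The names transform_column, scale, and plan, as well as the column labels, are hypothetical and introduced only for illustration. The sketch also shows one possible way to drive many per-column transformations from a single mapping, which may help when there are hundreds of columns to clean.

```python
import numpy as np
import pandas as pd


def transform_column(df, col, func, **kwargs):
    """Hypothetical variant of transform_df() that uses bracket indexing."""
    # df[col] works for any column label, including labels with spaces or
    # labels that share a name with a DataFrame method such as "count".
    df[col] = df[col].apply(func, **kwargs)


def scale(val, factor=1):
    # Illustrative helper (an assumption, not from the original post).
    return val * factor


df = pd.DataFrame({"unit price": [1, 2, 3], "count": [4, 5, 6]})

# Hypothetical cleaning plan: column label -> (function, keyword arguments).
plan = {
    "unit price": (np.square, {}),
    "count": (scale, {"factor": 10}),
}

# Apply each transformation to its column, modifying df in place.
for col, (func, kwargs) in plan.items():
    transform_column(df, col, func, **kwargs)

print(df)
```

Keeping the plan in one dictionary means that adding or changing a column's cleaning step is a one-line edit, and the loop itself never needs to change.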