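I have a pandas DataFrame with one column that holds class instances, each carrying a number of attributes. I want to expand some of those attributes into new columns. I have code that works, but it looks a bit nasty and relies on eval. What is a more Pythonic way to do this?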
import pandas as pd

# Boilerplate for a minimal, reproducible example
class cl:
    class inner:
        na1 = "nested atribute one"
        na2 = "nested atribute two"

    def __init__(self, name):
        self.name = name

    a1 = "atribute one"
    a2 = "atribute one"
    inner_atts = inner()

class_object1 = cl("first")
class_object2 = cl("second")
data = [class_object1, class_object2]
data_frame = pd.DataFrame(data, columns=['class object'])

# Current approach: build an expression string per attribute and eval it
info_to_get = {'name', 'a1', 'a2', 'inner_atts.na1', 'inner_atts.na2'}
for x in info_to_get:
    sr = 'y.{0}'.format(x)
    data_frame['{0}'.format(x)] = data_frame['class object'].apply(lambda y: eval(sr, {'y': y}))
print(data_frame)
Answer 0 (score: 2)
Use operator.attrgetter:
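import operator

# A set has no defined order, so fix the column order first
info_to_get = list(info_to_get)
# attrgetter returns one tuple per object; data_frame is the frame from the question
data_frame[info_to_get] = pd.DataFrame(data_frame['class object'].apply(operator.attrgetter(*info_to_get)).tolist())
Output: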
                             class object       inner_atts.na1  \
0  <__main__.cl object at 0x7f08002d27b8>  nexted atribute one
1  <__main__.cl object at 0x7f08002d2a90>  nexted atribute one

        inner_atts.na2            a2   name            a1
0  nexted atribute two  atribute one  first  atribute one
1  nexted atribute two  atribute one    two  atribute one
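To see what attrgetter does on its own, here is a minimal sketch without pandas. The classes Item and Inner are hypothetical stand-ins for the question's cl and inner, not code from the thread. It shows that dotted names are followed one attribute at a time, and that the return shape depends on how many names you pass.

```python
from operator import attrgetter


class Inner:
    # Hypothetical stand-in for the question's nested `inner` class.
    na1 = "nested attribute one"
    na2 = "nested attribute two"


class Item:
    # Hypothetical stand-in for the question's `cl` class.
    inner_atts = Inner()

    def __init__(self, name):
        self.name = name


item = Item("first")

# One name: the getter returns the bare value.
get_name = attrgetter("name")
single = get_name(item)

# Several names: the getter returns a tuple, and a dotted name such as
# "inner_atts.na1" is resolved as item.inner_atts.na1.
get_many = attrgetter("name", "inner_atts.na1", "inner_atts.na2")
several = get_many(item)

print(single)
print(several)
```

One thing to watch for: with a single attribute name the getter returns the value itself rather than a one-element tuple, so code that builds a DataFrame from the results should account for that case.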
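Answer 1 (score: 2)
The first thing to know about pandas is that it is not well suited to storing and processing anything that cannot be vectorized. The overhead is large, and you are better off keeping such objects in a list and iterating over them with a loop.
That said, I would do this with a list comprehension.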
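from operator import attrgetter

# Fix the order once so the getter and the column labels agree
info_to_get = list(info_to_get)
f = attrgetter(*info_to_get)
# data_frame is the frame from the question
pd.DataFrame([f(c) for c in data_frame['class object']], columns=info_to_get)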
        inner_atts.na2    name            a2       inner_atts.na1            a1
0  nexted atribute two   first  atribute one  nexted atribute one  atribute one
1  nexted atribute two  second  atribute one  nexted atribute one  atribute one
Evidence suggests that a list comprehension is the fastest way to process data that cannot be vectorized.
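If you want to check the speed claim on your own data, a rough timing sketch along these lines may help. The Item class, the row count, and the repeat count are assumptions chosen for illustration. The numbers will vary by machine and pandas version, so treat it as a way to measure rather than as a result.

```python
import timeit
from operator import attrgetter

import pandas as pd


class Item:
    # Hypothetical stand-in for the question's class.
    a1 = "attribute one"

    def __init__(self, name):
        self.name = name


names = ["name", "a1"]
getter = attrgetter(*names)
df = pd.DataFrame({"class object": [Item(str(i)) for i in range(10_000)]})


def with_apply():
    # Series.apply calls the getter once per object and yields tuples.
    rows = df["class object"].apply(getter).tolist()
    return pd.DataFrame(rows, columns=names)


def with_comprehension():
    # Plain Python iteration over the same column.
    rows = [getter(obj) for obj in df["class object"]]
    return pd.DataFrame(rows, columns=names)


# Both approaches must agree before the timings mean anything.
same = with_apply().equals(with_comprehension())

t_apply = timeit.timeit(with_apply, number=20)
t_comp = timeit.timeit(with_comprehension, number=20)
print(f"apply: {t_apply:.3f}s, comprehension: {t_comp:.3f}s, same result: {same}")
```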