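This seems like it should be a common use case, but I haven't found any good guidance. I have a working solution, but I would rather use a vectorized lookup than the pandas apply() function.
Here is an example of what I am doing: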
import pandas as pd
example_dict = {
    "category1": {
        "field1": 0.0,
        "filed2": 5.0},
    "category2": {
        "field1": 5.0,
        "field2": 8.0}}
d = {"ids": range(10),
     "category": ["category1" if x % 2 == 0 else "category2" for x in range(10)]}
df = pd.DataFrame(d)
# The operation I am trying to vectorize
df['category_data'] = df.apply(lambda row: example_dict[row['category']], axis=1)
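In the last line you can see where I use the apply() function to perform the dictionary lookup. My intuition tells me there should be a way to vectorize this. I may be wrong, but I would like to know either way. I often run into situations where I need to look up information in a dictionary and add it as a DataFrame column.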
Answer 0 (score: 6)
Use map:
df['map']=df.category.map(example_dict)
df
Out[839]:
category ids category_data \
0 category1 0 {'field1': 0.0, 'filed2': 5.0}
1 category2 1 {'field1': 5.0, 'field2': 8.0}
2 category1 2 {'field1': 0.0, 'filed2': 5.0}
3 category2 3 {'field1': 5.0, 'field2': 8.0}
4 category1 4 {'field1': 0.0, 'filed2': 5.0}
5 category2 5 {'field1': 5.0, 'field2': 8.0}
6 category1 6 {'field1': 0.0, 'filed2': 5.0}
7 category2 7 {'field1': 5.0, 'field2': 8.0}
8 category1 8 {'field1': 0.0, 'filed2': 5.0}
9 category2 9 {'field1': 5.0, 'field2': 8.0}
map
0 {'field1': 0.0, 'filed2': 5.0}
1 {'field1': 5.0, 'field2': 8.0}
2 {'field1': 0.0, 'filed2': 5.0}
3 {'field1': 5.0, 'field2': 8.0}
4 {'field1': 0.0, 'filed2': 5.0}
5 {'field1': 5.0, 'field2': 8.0}
6 {'field1': 0.0, 'filed2': 5.0}
7 {'field1': 5.0, 'field2': 8.0}
8 {'field1': 0.0, 'filed2': 5.0}
9 {'field1': 5.0, 'field2': 8.0}
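As a rough check of the map approach, the sketch below compares it with a plain per-row lookup and shows what happens when a category is absent from the dictionary. The "category3" value is a made-up key added only for illustration; it is not part of the question's data.

```python
import pandas as pd

# Same shape as the question's dictionary ("filed2" is spelled as in the question).
example_dict = {
    "category1": {"field1": 0.0, "filed2": 5.0},
    "category2": {"field1": 5.0, "field2": 8.0},
}

df = pd.DataFrame({
    "ids": range(4),
    # "category3" is a hypothetical key that is NOT in example_dict
    "category": ["category1", "category2", "category1", "category3"],
})

# Dictionary lookup through Series.map, with no row-wise apply()
via_map = df["category"].map(example_dict)

# Rows whose category exists in the dictionary
known = df["category"].isin(list(example_dict))

# Plain Python lookup for the known rows, for comparison
expected = [example_dict[k] for k in df.loc[known, "category"]]
```

For known keys the two lookups agree. For the unknown key, map leaves a missing value (NaN) instead of raising the KeyError that example_dict[row['category']] would raise inside apply(), so it may be worth checking via_map.isna() after the lookup.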
If you need them in separate columns:
pd.DataFrame(df['map'].tolist())
Out[843]:
field1 field2 filed2
0 0.0 NaN 5.0
1 5.0 8.0 NaN
2 0.0 NaN 5.0
3 5.0 8.0 NaN
4 0.0 NaN 5.0
5 5.0 8.0 NaN
6 0.0 NaN 5.0
7 5.0 8.0 NaN
8 0.0 NaN 5.0
9 5.0 8.0 NaN
Or:
df['map'].apply(pd.Series)
Out[844]:
field1 field2 filed2
0 0.0 NaN 5.0
1 5.0 8.0 NaN
2 0.0 NaN 5.0
3 5.0 8.0 NaN
4 0.0 NaN 5.0
5 5.0 8.0 NaN
6 0.0 NaN 5.0
7 5.0 8.0 NaN
8 0.0 NaN 5.0
9 5.0 8.0 NaN
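Both variants above return a new frame rather than adding columns to df. One possible way to attach the expanded columns to the original frame is sketched below. The non-default index (10 to 13) is an assumption chosen to show why passing index=df.index matters: without it the expanded frame gets a fresh 0..n-1 index and the join would not line up.

```python
import pandas as pd

# Same shape as the question's dictionary ("filed2" is spelled as in the question).
example_dict = {
    "category1": {"field1": 0.0, "filed2": 5.0},
    "category2": {"field1": 5.0, "field2": 8.0},
}

# A small frame with a non-default index (an assumption for illustration)
df = pd.DataFrame(
    {
        "ids": range(4),
        "category": ["category1", "category2", "category1", "category2"],
    },
    index=[10, 11, 12, 13],
)

# Look up the inner dicts, then expand them into columns.
# Reusing df.index keeps the expanded rows aligned with df.
expanded = pd.DataFrame(df["category"].map(example_dict).tolist(), index=df.index)

# Index-aligned join: the original columns plus one column per inner key
result = df.join(expanded)
```

Fields that a category does not define come out as NaN, as in the outputs above.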
Answer 1 (score: 2)
You can create a second DataFrame from example_dict and then merge the two DataFrames:
# list(...) makes the dict views plain lists, which is needed on Python 3
d2 = pd.DataFrame(list(example_dict.keys()), columns=['category']).assign(
    category_data=list(example_dict.values()))
df.merge(d2,on='category',how='left')
category ids category_data
0 category1 0 {u'filed2': 5.0, u'field1': 0.0}
1 category2 1 {u'field2': 8.0, u'field1': 5.0}
2 category1 2 {u'filed2': 5.0, u'field1': 0.0}
3 category2 3 {u'field2': 8.0, u'field1': 5.0}
4 category1 4 {u'filed2': 5.0, u'field1': 0.0}
5 category2 5 {u'field2': 8.0, u'field1': 5.0}
6 category1 6 {u'filed2': 5.0, u'field1': 0.0}
7 category2 7 {u'field2': 8.0, u'field1': 5.0}
8 category1 8 {u'filed2': 5.0, u'field1': 0.0}
9 category2 9 {u'field2': 8.0, u'field1': 5.0}
To split the dictionary values into columns:
d2 = pd.DataFrame(example_dict).T
df.merge(d2,how='left',left_on='category',right_index=True)
category ids field1 field2 filed2
0 category1 0 0.0 NaN 5.0
1 category2 1 5.0 8.0 NaN
2 category1 2 0.0 NaN 5.0
3 category2 3 5.0 8.0 NaN
4 category1 4 0.0 NaN 5.0
5 category2 5 5.0 8.0 NaN
6 category1 6 0.0 NaN 5.0
7 category2 7 5.0 8.0 NaN
8 category1 8 0.0 NaN 5.0
9 category2 9 5.0 8.0 NaN
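With a left merge, rows whose category has no entry in the dictionary are kept and simply get NaN in every looked-up column, which can be hard to tell apart from fields that are legitimately missing. A minimal sketch of one way to flag such rows, using merge's indicator option, follows. The "category3" value is a hypothetical key that is not in the question's data.

```python
import pandas as pd

# Same shape as the question's dictionary ("filed2" is spelled as in the question).
example_dict = {
    "category1": {"field1": 0.0, "filed2": 5.0},
    "category2": {"field1": 5.0, "field2": 8.0},
}

df = pd.DataFrame({
    "ids": range(3),
    # "category3" is a hypothetical key that is NOT in example_dict
    "category": ["category1", "category2", "category3"],
})

# One row per category, one column per inner key
d2 = pd.DataFrame(example_dict).T

# indicator=True adds a "_merge" column recording where each row matched
merged = df.merge(d2, how="left", left_on="category", right_index=True,
                  indicator=True)

# Rows whose category was not found in the lookup table
unmatched = merged[merged["_merge"] == "left_only"]
```

Here unmatched contains only the "category3" row, so missing dictionary entries can be reported or handled before the NaN values spread further.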