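Each row of my Pandas DataFrame holds a list of tuples. I am trying to replace the first element of every tuple in the list with the corresponding value from a dictionary. I found a way to do it with regex-style string replacement, but that does not seem like an ideal way to write the code. Is there a better way to handle this than what I did in Try #1?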
from operator import itemgetter
import pandas as pd
# sample data
l1 = ['1','2','3']
l2 = ['test1','test2','test3']
l3 = [[(1,0.95)],[(2,0.10),(3,0.20)],[(3,0.30)]]
df = pd.DataFrame({'id':l1,'text':l2,'score':l3})
print(df)
#Preview:
id  text   score
1   test1  [(1, 0.95)]
2   test2  [(2, 0.1), (3, 0.2)]
3   test3  [(3, 0.3)]
di = {1:'Math',2:'Science',3:'History',4:'Physics'}
Try #1: Does the trick, but it is laborious, manual, and not an ideal approach.
df['score'].astype(str).str.replace('1,','Math,').str.replace('2,','Science,').str.replace('3,','History,').str.replace('4,','Physics,')
Try #2: Returns all NaNs, even after converting to strings.
df["score"].astype(str).map(di)
I am looking for output like this:
#Preview:
id  text   score
1   test1  [(Math, 0.95)]
2   test2  [(Science, 0.1), (History, 0.2)]
3   test3  [(History, 0.3)]
Answer 0 (score: 1)
A list comprehension can help here. Pandas also tends to be less efficient when other data structures are nested inside its cells.
df["score"] = [[(di[left], right)
                for left, right in entry]
               for entry in df.score]
df
   id  text   score
0  1   test1  [(Math, 0.95)]
1  2   test2  [(Science, 0.1), (History, 0.2)]
2  3   test3  [(History, 0.3)]
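One caveat with the comprehension above: di[left] raises a KeyError if a tuple carries a key that is not in the dictionary. The following is a minimal sketch of a more forgiving variant, assuming that unknown keys should simply be left unchanged. The helper name relabel is hypothetical and not part of pandas; the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
di = {1: 'Math', 2: 'Science', 3: 'History', 4: 'Physics'}
df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'text': ['test1', 'test2', 'test3'],
    'score': [[(1, 0.95)], [(2, 0.10), (3, 0.20)], [(3, 0.30)]],
})


def relabel(pairs, mapping):
    """Hypothetical helper: replace the first element of each tuple with
    its label from `mapping`. Keys missing from `mapping` are kept as-is
    (an assumption; raising an error may suit other use cases better)."""
    return [(mapping.get(key, key), value) for key, value in pairs]


# Same row-by-row list comprehension as in the answer, but via the helper.
df['score'] = [relabel(pairs, di) for pairs in df['score']]
print(df)
```

If a missing key should be treated as a data error instead, keeping the plain di[left] lookup is the simpler choice, since it fails loudly.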