Pandas: map a dictionary onto a column containing lists of tuples

Time: 2021-01-13 00:34:04

Tags: python-3.x pandas list tuples

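Each row of my Pandas DataFrame holds a list of tuples. I am trying to replace the first element of every tuple in the list with the corresponding value from a dictionary. I found a regex-style string replacement approach, but it does not seem like an ideal way to write the code. Is there a better way to handle this than what I did in Try #1?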

from operator import itemgetter
import pandas as pd

# sample data
l1 = ['1','2','3']
l2 = ['test1','test2','test3']
l3 = [[(1,0.95)],[(2,0.10),(3,0.20)],[(3,0.30)]]

df = pd.DataFrame({'id':l1,'text':l2,'score':l3})
print(df)

#Preview: 
id   text                 score
1  test1           [(1, 0.95)]
2  test2  [(2, 0.1), (3, 0.2)]
3  test3            [(3, 0.3)]

di = {1:'Math',2:'Science',3:'History',4:'Physics'}

Try #1: This does the trick, but it is laborious, manual, and not an ideal way to do it.

df['score'].astype(str).str.replace('1,','Math,').str.replace('2,','Science,').str.replace('3,','History,').str.replace('4,','Physics,')

Try #2: This returns all NaN, even after converting to string.
df["score"].astype(str).map(di)
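A likely explanation for Try #2, with a sketch of a working alternative: `Series.map` with a dict looks up each cell as a single key. After `astype(str)` every cell is one string such as `'[(1, 0.95)]'`, which is not a key in `di`, so every row comes back NaN. Passing a function to `map` instead lets you rewrite the tuples inside each list. This is a minimal sketch that assumes every first element appears in `di`; otherwise `di[k]` raises `KeyError`.

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({
    "id": ["1", "2", "3"],
    "text": ["test1", "test2", "test3"],
    "score": [[(1, 0.95)], [(2, 0.10), (3, 0.20)], [(3, 0.30)]],
})
di = {1: "Math", 2: "Science", 3: "History", 4: "Physics"}

# Try #2: each whole cell string is used as one dict key, so nothing matches
as_strings = df["score"].astype(str).map(di)
print(as_strings.isna().all())  # True

# map with a function: relabel the first element of every tuple in each list
df["score"] = df["score"].map(lambda entry: [(di[k], v) for k, v in entry])
print(df["score"].tolist())
# [[('Math', 0.95)], [('Science', 0.1), ('History', 0.2)], [('History', 0.3)]]
```

This keeps the column as real lists of tuples, unlike Try #1, which produces plain strings.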


I'm looking for output like this:

#Preview:
id   text                             score
1   test1                    [(Math, 0.95)]
2   test2  [(Science, 0.1), (History, 0.2)]
3   test3                  [(History, 0.3)]

1 Answer:

Answer 0 (score: 1)

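A list comprehension helps here. Pandas also loses much of its efficiency when a column holds other data structures (such as lists of tuples), because its vectorized operations cannot work on them, so a plain Python loop is a reasonable choice.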

# rebuild each row's list, swapping the first tuple element for its name in di
df["score"] = [[(di[left], right)
                for left, right in entry]
               for entry in df.score]
df


    id  text    score
0   1   test1   [(Math, 0.95)]
1   2   test2   [(Science, 0.1), (History, 0.2)]
2   3   test3   [(History, 0.3)]
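The comprehension above raises `KeyError` if a tuple's first element has no entry in `di`. If your data can contain such ids, one option (an assumption about how you want unknown ids handled) is `dict.get`, which keeps the original value unchanged. The helper `relabel_scores` and the id `5` in the data are hypothetical, added only to illustrate a missing key.

```python
import pandas as pd

di = {1: "Math", 2: "Science", 3: "History", 4: "Physics"}

# illustrative data: id 5 is deliberately absent from di
df = pd.DataFrame({
    "id": ["1", "2"],
    "text": ["test1", "test2"],
    "score": [[(1, 0.95)], [(5, 0.40), (3, 0.20)]],
})

def relabel_scores(scores, mapping):
    """Hypothetical helper: swap each tuple's first element for mapping[key].

    Keys missing from `mapping` are kept unchanged instead of raising KeyError.
    """
    return [[(mapping.get(left, left), right) for left, right in entry]
            for entry in scores]

relabeled = relabel_scores(df["score"], di)
print(relabeled)
# [[('Math', 0.95)], [(5, 0.4), ('History', 0.2)]]
```

To update the frame, assign the result back the same way the answer does: `df["score"] = relabel_scores(df["score"], di)`.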