I am working with a DataFrame that looks like this:
List Numb Name
1 1 one
1 2 two
2 3 three
4 4 four
3 5 five
I am trying to compute the following output:
List Numb Name
one 1 one
one 2 two
two 3 three
four 4 four
three 5 five
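In my current approach I am trying to iterate over the columns and replace each value with the corresponding content of the third column.
For example, if List[0][1] equals Numb[1][1], then List[0][1] should be replaced with 'one'.
How can I do this kind of iteration, or solve the problem without iterating explicitly at all?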
Answer 0 (score: 2)
Use map:
df['List'] = df['List'].map(df.set_index('Numb')['Name'])
List Numb Name
0 one 1 one
1 one 2 two
2 two 3 three
3 four 4 four
4 three 5 five
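As a self-contained sketch of the same idea, the DataFrame below is rebuilt from the question. The names lookup, mapped and unmatched, and the value 9, are hypothetical additions for illustration. map looks each value of List up in the index of the Series passed to it, and a value with no match in Numb comes back as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'List': [1, 1, 2, 4, 3],
    'Numb': [1, 2, 3, 4, 5],
    'Name': ['one', 'two', 'three', 'four', 'five'],
})

# Lookup Series: the index holds Numb, the values hold Name
lookup = df.set_index('Numb')['Name']

# Each value in List is replaced by the Name whose Numb matches it
mapped = df['List'].map(lookup)
print(mapped.tolist())

# Hypothetical value 9 does not occur in Numb, so map yields NaN for it
unmatched = pd.Series([1, 9]).map(lookup)
print(unmatched.isna().tolist())
```

If missing matches are possible in real data, it may be worth checking the result with isna() before overwriting the original column.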
Answer 1 (score: 0)
import pandas as pd
df = pd.DataFrame({
'List': [1,1,2,4,3],
'Numb': [1,2,3,4,5],
'Name':['one','two','three','four','five']
})
dfnew = pd.merge(df, df, how='inner', left_on=['List'], right_on=['Numb'])
# Keep Numb and Name from the left frame; the looked-up name (Name_y) becomes the new List
dfnew = dfnew.rename({'Numb_x': 'Numb', 'Name_x': 'Name', 'Name_y': 'List'}, axis='columns')
dfnew = dfnew[['List','Numb','Name']]
print(dfnew)
# List Numb Name
#0 one 1 one
#1 one 2 two
#2 two 3 three
#3 four 4 four
#4 three 5 five
Answer 2 (score: 0)
How about creating a dictionary to help you?
import pandas as pd
df = pd.DataFrame({'List': [1, 1, 2, 4, 3], 'Numb': [1, 2, 3, 4, 5], 'Name': ['one', 'two', 'three', 'four', 'five']})
d = dict(zip(df['Numb'], df['Name']))
df = df.replace({'List': d})
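One difference from the map-based answers may be worth knowing. The sketch below uses a hypothetical value 9 that has no entry in the dictionary (it is not part of the question's data): replace leaves such a value untouched, whereas map turns it into NaN.

```python
import pandas as pd

d = {1: 'one', 2: 'two', 3: 'three'}
s = pd.Series([1, 2, 9])  # 9 is a hypothetical value with no key in d

replaced = s.replace(d)   # the unmatched 9 is kept as-is
mapped = s.map(d)         # the unmatched 9 becomes NaN

print(replaced.tolist())
print(mapped.tolist())
```

Which behaviour is preferable depends on whether unmatched keys should stay visible or be flagged as missing.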
Answer 3 (score: 0)
You can do this in one line. It looks like you want to join the DataFrame to itself:
df.rename(columns={"List": "List_numb"}).join(df.set_index("Numb")["Name"].to_frame("List"), on="List_numb")[["List", "Numb", "Name"]]
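The one-liner can be easier to follow when split into steps. This is only a sketch of the same expression; the intermediate names left, right and result are hypothetical and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'List': [1, 1, 2, 4, 3],
    'Numb': [1, 2, 3, 4, 5],
    'Name': ['one', 'two', 'three', 'four', 'five'],
})

# Step 1: move the numeric keys out of the way under a temporary column name
left = df.rename(columns={'List': 'List_numb'})

# Step 2: build a one-column lookup frame indexed by Numb,
# with the names stored in a column called List
right = df.set_index('Numb')['Name'].to_frame('List')

# Step 3: join the lookup onto the keys (left join by default),
# then select the columns in their original order
result = left.join(right, on='List_numb')[['List', 'Numb', 'Name']]
print(result)
```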
Answer 4 (score: 0)
df['List'] = df.set_index('Numb')['Name'].reindex(df['List']).values
print(df)
List Numb Name
0 one 1 one
1 one 2 two
2 two 3 three
3 four 4 four
4 three 5 five
Answer 5 (score: 0)
Similar to Vaishali's answer, but building the Series explicitly seems to be somewhat faster.
df['List'] = df['List'].map(pd.Series(df['Name'].values, df['Numb']))
Timings (dummy data in which the Numb and Name columns have unique values; so far I have only included the three fastest solutions):
>>> df
List Numb Name
0 1 1 one_0
1 1 2 two_1
2 2 3 three_2
3 4 4 four_3
4 3 5 five_4
... ... ... ...
4995 1 4996 one_4995
4996 1 4997 two_4996
4997 2 4998 three_4997
4998 4 4999 four_4998
4999 3 5000 five_4999
[5000 rows x 3 columns]
# Timings (i5-6200U CPU @ 2.30GHz, but only relative times are interesting)
>>> %timeit df.set_index('Numb')['Name'].reindex(df['List']).values # jpp
1.14 ms ± 3.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df['List'].map(df.set_index('Numb')['Name']) # Vaishali
1.04 ms ± 7.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df['List'].map(pd.Series(df['Name'].values, df['Numb'])) # timgeb
437 µs ± 3.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
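To rerun the comparison outside IPython, something like the following sketch could be used. The way the dummy data is generated here is an assumption reconstructed from the printout above, and n_repeats, base_names and candidates are hypothetical names. Absolute numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

n_repeats = 1000  # 5 rows x 1000 = 5000 rows, matching the printout
base_names = ['one', 'two', 'three', 'four', 'five']

# Assumed reconstruction of the dummy data: List repeats the original
# pattern, while Numb and Name are unique per row
df = pd.DataFrame({
    'List': [1, 1, 2, 4, 3] * n_repeats,
    'Numb': range(1, 5 * n_repeats + 1),
    'Name': [f'{base_names[i % 5]}_{i}' for i in range(5 * n_repeats)],
})

# The three expressions timed above, wrapped as callables
candidates = {
    'reindex': lambda: df.set_index('Numb')['Name'].reindex(df['List']).values,
    'map + set_index': lambda: df['List'].map(df.set_index('Numb')['Name']),
    'map + pd.Series': lambda: df['List'].map(pd.Series(df['Name'].values, df['Numb'])),
}

for label, func in candidates.items():
    seconds = timeit.timeit(func, number=100) / 100
    print(f'{label}: {seconds * 1e6:.0f} µs per call')
```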