I have two dataframes:
import pandas as pd
a = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1],
                  'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
b = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3],
                  'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})
What I need to do is take each (port, date) pair, say port 0 and date 2014-02-25, look up the corresponding fac value in b, and fill it into a new column in a. So the output should look something like:
port cd date fac
1 1 "2014-02-26" 2
1 2 "2014-02-25" 1
... (so on) ...
I tried just merging the frames on date and port, but I got an error, which I think is caused by the dataframes being different sizes - I half expected it not to work.
Answer 0 (score: 2)
If you want to merge the two dataframes, you should use merge:
import pandas as pd

a = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1],
                  'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
b = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3],
                  'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})

# with no `on=` argument, merge joins on the columns shared by both
# frames - here 'port' and 'date'
df = a.merge(b)
print(df)
Output:
port cd date fac
0 1 1 2014-02-26 2
1 1 2 2014-02-26 2
2 1 2 2014-02-25 1
3 0 3 2014-02-26 2
4 0 1 2014-02-25 2
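Note that merge does an inner join by default, so any row of a with no matching (port, date) pair in b is silently dropped. If you need to keep every row of a, a small variation on the answer's code (not part of the original answer) is to pass how='left'; unmatched rows then get NaN in fac:

# keep every row of `a`; (port, date) pairs absent from `b` get NaN in `fac`
df_left = a.merge(b, on=['port', 'date'], how='left')
print(df_left)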
Answer 1 (score: 1)
I think you need drop_duplicates and merge:
cols = ['port', 'date']
# drop duplicate (port, date) rows from `a` first, keeping the first occurrence
df = a.drop_duplicates(cols).merge(b, on=cols)
print(df)
port cd date fac
0 1 1 2014-02-26 2
1 1 2 2014-02-25 1
2 0 3 2014-02-26 2
3 0 1 2014-02-25 2
But if you want all combinations of the duplicated pairs:
cols = ['port', 'date']
df1 = a.merge(b, on=cols)
print(df1)
port cd date fac
0 1 1 2014-02-26 2
1 1 2 2014-02-26 2
2 1 2 2014-02-25 1
3 0 3 2014-02-26 2
4 0 1 2014-02-25 2
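As a side note (not part of the original answer), merge also accepts a validate argument that makes the duplicate situation explicit instead of silently multiplying rows:

# raises MergeError if any (port, date) key occurs more than once in `b`,
# guaranteeing each row of `a` gains at most one `fac` value
df = a.merge(b, on=cols, validate='many_to_one')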
Answer 2 (score: 1)
I suggest creating a new column in dataframe A and filling it via numpy.vectorize:
import pandas as pd
import numpy as np
A = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1], 'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
B = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3], 'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})
Set an index on dataframe B so it can be accessed by 'date' and 'port':
C = B.set_index(['date', 'port'])
Then create the function that will be applied to each row of dataframe A:
def get_fac(date, port):
    try:
        return C.loc[date].loc[port]['fac']
    except KeyError:
        # no matching (date, port) pair in B
        return ''

A['fac'] = np.vectorize(get_fac)(A['date'], A['port'])
Here is the output:
cd date port fac
0 1 2014-02-26 1 2
1 2 2014-02-25 1 1
2 3 2014-02-26 0 2
3 2 2014-02-26 1 2
4 1 2014-02-25 0 2
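For what it's worth, the same per-row lookup can be done without numpy.vectorize by turning B into a plain dict keyed on (date, port) tuples; this is just an alternative sketch, not the answerer's method:

# a Series indexed by a (date, port) MultiIndex; to_dict() yields tuple keys
lookup = B.set_index(['date', 'port'])['fac'].to_dict()
# look up each row of A, falling back to '' when the pair is missing
A['fac'] = [lookup.get((d, p), '') for d, p in zip(A['date'], A['port'])]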