I have two DataFrames.
df1:
col2 col3 dept
date
2020-05-06 29 21 A
2020-05-07 56 12 B
2020-05-08 82 15 C
2020-05-09 13 9 D
2020-05-10 35 13 E
2020-05-11 53 87 F
2020-05-12 25 9 G
2020-05-13 23 63 H
df2:
col2 dept
date
2020-05-06 64 A
2020-05-07 41 B
2020-05-08 95 C
2020-05-09 58 D
2020-05-10 89 E
2020-05-11 37 F
2020-05-12 24 G
2020-05-13 67 H
I want to update the col2 column in df1 with the values from the col2 column in df2, so that my output looks like:
col2 col3 dept
date
2020-05-06 64 21 A
2020-05-07 41 12 B
2020-05-08 95 15 C
2020-05-09 58 9 D
2020-05-10 89 13 E
2020-05-11 37 87 F
2020-05-12 24 9 G
2020-05-13 67 63 H
I wrote some code that looks like this:
df1=df1.set_index('dept')
df1.update(df2.set_index('dept'))
df1=df1.reset_index()
But it resets the index of df1 to integers instead of the dates, so the output I get is:
dept col2 col3
0 A 64 21
1 B 41 12
2 C 95 15
3 D 58 9
4 E 89 13
5 F 37 87
6 G 24 9
7 H 67 63
My full code is as follows:
import datetime  # the module itself, needed for datetime.date.today()
from datetime import timedelta
import numpy as np
import pandas as pd
dept=['A','B','C','D','E','F','G','H']
date_today = datetime.date.today()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')
np.random.seed(seed=1111)
data1 = np.random.randint(1, high=100, size=len(days))
data2 = np.random.randint(1, high=100, size=len(days))
df1 = pd.DataFrame({'date': days, 'dept':dept,'col2': data1, 'col3': data2})
df1 = df1.set_index('date')
print(df1)
dept=['A','B','C','D','E','F','G','H']
date_today = datetime.date.today()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')
np.random.seed(seed=1331)
data3 = np.random.randint(1, high=100, size=len(days))
df2 = pd.DataFrame({'date': days, 'dept':dept,'col2': data3})
df2 = df2.set_index('date')
print(df2)
df1=df1.set_index('dept')
df1.update(df2.set_index('dept'))
df1=df1.reset_index()
print(df1)
How can I update df1 with df2 while keeping the date index of df1 as it is?
Answer 0 (score: 1)
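As far as I understand from your sample, you want to update df1 from df2 based on both the date index and the dept column. To do that, append dept to the index and then call update: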
df1 = df1.set_index('dept', append=True)
df1.update(df2.set_index('dept', append=True))  # update works in place and returns None
df1 = df1.reset_index('dept')
Out[35]:
dept col2 col3
date
2020-05-06 A 64 21
2020-05-07 B 41 12
2020-05-08 C 95 15
2020-05-09 D 58 9
2020-05-10 E 89 13
2020-05-11 F 37 87
2020-05-12 G 24 9
2020-05-13 H 67 63
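For reference, here is a self-contained sketch of the same idea on three rows from the question. The third row of df2 uses a made-up dept value "Z" (not in the original data) to show that a row is only updated when both date and dept match. It also shows why the original attempt lost the dates: set_index('dept') without append=True replaces the existing index.

```python
import pandas as pd

# Three rows taken from the question; the dept "Z" row in df2 is a
# hypothetical mismatch added to show that both keys must agree.
dates = pd.DatetimeIndex(["2020-05-06", "2020-05-07", "2020-05-08"], name="date")
df1 = pd.DataFrame(
    {"col2": [29, 56, 82], "col3": [21, 12, 15], "dept": ["A", "B", "C"]},
    index=dates,
)
df2 = pd.DataFrame({"col2": [64, 41, 95], "dept": ["A", "B", "Z"]}, index=dates)

# set_index("dept") alone would replace the date index and discard it;
# append=True keeps date and adds dept as a second index level.
df1 = df1.set_index("dept", append=True)

# update works in place and returns None, so its result is not assigned.
df1.update(df2.set_index("dept", append=True))

# Move only dept back to a column; date stays as the index.
df1 = df1.reset_index("dept")
print(df1)
```

In this sketch the first two rows take col2 from df2, while the 2020-05-08 row keeps its original value because its dept differs between the two frames.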
Answer 1 (score: 0)
You can use update for this, applied to the col2 column (the two frames are aligned on the date index):
In [2162]: df1['col2'].update(df2['col2'])
In [2163]: df1
Out[2163]:
col2 col3 dept
date
2020-05-06 64 21 A
2020-05-07 41 12 B
2020-05-08 95 15 C
2020-05-09 58 9 D
2020-05-10 89 13 E
2020-05-11 37 87 F
2020-05-12 24 9 G
2020-05-13 67 63 H
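One caveat: calling update on df1['col2'] only changes df1 if that column selection is a view of the frame. As far as I know, with Copy-on-Write enabled (the default starting with pandas 3.0) the selection behaves like a copy and df1 would be left unchanged. A minimal sketch that avoids the chained call by updating the DataFrame itself from a one-column frame, using three rows from the question. Like the snippet above, it aligns on the date index only and does not check dept.

```python
import pandas as pd

# Three rows taken from the question's df1 and df2.
dates = pd.DatetimeIndex(["2020-05-06", "2020-05-07", "2020-05-08"], name="date")
df1 = pd.DataFrame(
    {"col2": [29, 56, 82], "col3": [21, 12, 15], "dept": ["A", "B", "C"]},
    index=dates,
)
df2 = pd.DataFrame({"col2": [64, 41, 95], "dept": ["A", "B", "C"]}, index=dates)

# Double brackets keep col2 as a one-column DataFrame, so only col2 of df1
# is overwritten; col3 and dept are not touched.
df1.update(df2[["col2"]])
print(df1)
```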
Answer 2 (score: 0)
You can use concat first and then groupby:
df_out=pd.concat([df1,df2],sort=False).groupby(level=0).last()
Out[261]:
col2 col3 dept
date
2020-05-06 64 21.0 A
2020-05-07 41 12.0 B
2020-05-08 95 15.0 C
2020-05-09 58 9.0 D
2020-05-10 89 13.0 E
2020-05-11 37 87.0 F
2020-05-12 24 9.0 G
2020-05-13 67 63.0 H
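Note that col3 comes back as float here: df2 has no col3, so concat fills those rows with NaN, which forces a float dtype, and last() then picks the last non-null value for each date. A small sketch on three rows from the question that casts the column back, assuming col3 has no missing values left after grouping:

```python
import pandas as pd

# Three rows taken from the question's df1 and df2.
dates = pd.DatetimeIndex(["2020-05-06", "2020-05-07", "2020-05-08"], name="date")
df1 = pd.DataFrame(
    {"col2": [29, 56, 82], "col3": [21, 12, 15], "dept": ["A", "B", "C"]},
    index=dates,
)
df2 = pd.DataFrame({"col2": [64, 41, 95], "dept": ["A", "B", "C"]}, index=dates)

# Stack both frames, then keep the last non-null value per date and column,
# so col2 comes from df2 and col3 from df1.
df_out = pd.concat([df1, df2], sort=False).groupby(level=0).last()

# Restore the integer dtype that was lost to the intermediate NaN values.
df_out = df_out.astype({"col3": "int64"})
print(df_out)
```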