import pandas as pd
import numpy as np
one = pd.read_csv('data1.csv')
two = pd.read_csv('data2.csv')
I wrote this, and `one` shows:
A Date
10 2011-01-03
20 2011-01-04
10 2011-01-06
20 2011-01-07
30 2011-01-10
40 2011-01-13
25 2011-01-15
・
・
・
and `two` shows:
B Date
15 2011-01-01
15 2011-01-02
15 2011-01-03
25 2011-01-07
35 2011-01-10
10 2011-01-13
25 2011-01-15
・
・
・
I want to use 0 as the value for any date that is missing from a DataFrame. So far I have written this code:
one_and_two = pd.merge(one, two, on='Date', how='inner')
print(one_and_two)
When I run it, `one_and_two` is:
A Date B
0 10 2011-01-03 15
1 20 2011-01-07 25
2 30 2011-01-10 35
3 40 2011-01-13 10
4 25 2011-01-15 25
・
・
・
The ideal output is:
A Date B
0 0 2011-01-01 15
1 0 2011-01-02 15
2 10 2011-01-03 15
3 20 2011-01-04 0
4 0 2011-01-05 0
5 10 2011-01-06 0
6 20 2011-01-07 25
7 0 2011-01-08 0
8 0 2011-01-09 0
9 30 2011-01-10 35
・
・
・
The DataFrames cover dates from 2011-01-01 to 2011-12-31, and I want every missing date to get 0. How can I do that? What is wrong with my code?
Answer 0 (score: 3)
Use an outer join, then `reindex` over the defined date range:
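# Date must be datetime64 (read_csv leaves it as strings) to match pd.date_range
one['Date'] = pd.to_datetime(one['Date'])
two['Date'] = pd.to_datetime(two['Date'])
df = (pd.merge(one, two, on='Date', how='outer')
        .fillna(0)
        .sort_values('Date')
        .set_index('Date'))
# name='Date' belongs inside date_range; fill_value=0 is a reindex argument.
# The result gets its own name so df stays usable for the variant below.
df_year = (df.reindex(pd.date_range('2011-01-01', '2011-12-31', name='Date'), fill_value=0)
             .reset_index()
             .reindex(columns=['A','Date','B']))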
Or reindex between the minimum and maximum dates:
df = (df.reindex(pd.date_range(df.index.min(), df.index.max(), name='Date'), fill_value=0)
.reset_index()
.reindex(columns=['A','Date','B']))
print (df)
A Date B
0 0.0 2011-01-01 15.0
1 0.0 2011-01-02 15.0
2 10.0 2011-01-03 15.0
3 20.0 2011-01-04 0.0
4 0.0 2011-01-05 0.0
5 10.0 2011-01-06 0.0
6 20.0 2011-01-07 25.0
7 0.0 2011-01-08 0.0
8 0.0 2011-01-09 0.0
9 30.0 2011-01-10 35.0
10 0.0 2011-01-11 0.0
11 0.0 2011-01-12 0.0
12 40.0 2011-01-13 10.0
13 0.0 2011-01-14 0.0
14 25.0 2011-01-15 25.0
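The following self-contained sketch runs the full-year variant on the sample rows shown in the question. The real data1.csv and data2.csv are not available, so the hard-coded frames are an assumption. It also counts the rows of the question's `how='inner'` merge: an inner merge keeps only dates present in both frames, which is why the original output loses dates such as 2011-01-01 and 2011-01-04.

```python
import pandas as pd

# Sample rows copied from the question (assumed stand-ins for the CSV files)
one = pd.DataFrame({
    'A': [10, 20, 10, 20, 30, 40, 25],
    'Date': pd.to_datetime(['2011-01-03', '2011-01-04', '2011-01-06',
                            '2011-01-07', '2011-01-10', '2011-01-13',
                            '2011-01-15']),
})
two = pd.DataFrame({
    'B': [15, 15, 15, 25, 35, 10, 25],
    'Date': pd.to_datetime(['2011-01-01', '2011-01-02', '2011-01-03',
                            '2011-01-07', '2011-01-10', '2011-01-13',
                            '2011-01-15']),
})

# how='inner' keeps only the dates found in BOTH frames (the question's bug)
inner = pd.merge(one, two, on='Date', how='inner')

# how='outer' keeps the union of dates; reindexing over the whole year
# then adds the dates that appear in neither frame
full_year = pd.date_range('2011-01-01', '2011-12-31', name='Date')
result = (pd.merge(one, two, on='Date', how='outer')
            .set_index('Date')
            .reindex(full_year)
            .fillna(0)
            .astype(int)        # NaN turned A/B into floats; cast back
            .reset_index()
            [['A', 'Date', 'B']])

print(len(inner), len(result))
print(result.head(10))
```

Casting with `astype(int)` is optional; it only restores the integer display from the question's ideal output.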
Answer 1 (score: 2)
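Use an outer join to capture all dates, reindex over the range between the minimum and maximum dates so that every date in that range is present, fill the nulls with zeros, and finally reset the date index and put the columns in the desired order (`A`, `Date`, `B`).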
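# Date must be datetime64 (read_csv leaves it as strings) to match the date range
one['Date'] = pd.to_datetime(one['Date'])
two['Date'] = pd.to_datetime(two['Date'])

# Build the outer merge first so its min/max dates can be referenced
merged = pd.merge(one, two, on='Date', how='outer').set_index('Date')

one_and_two = (
    merged
    .reindex(
        pd.date_range(              # pd.DatetimeIndex(start=..., end=...) is removed in newer pandas
            start=merged.index.min(),
            end=merged.index.max(),
            freq='D',
            name='Date'))
    .fillna(0)
    .astype(int)                    # NaN made A/B float; cast back to int
    .reset_index()
    [['A', 'Date', 'B']]
)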
>>> one_and_two
A Date B
0 0 2011-01-01 15
1 0 2011-01-02 15
2 10 2011-01-03 15
3 20 2011-01-04 0
4 0 2011-01-05 0
5 10 2011-01-06 0
6 20 2011-01-07 25
7 0 2011-01-08 0
8 0 2011-01-09 0
9 30 2011-01-10 35
10 0 2011-01-11 0
11 0 2011-01-12 0
12 40 2011-01-13 10
13 0 2011-01-14 0
14 25 2011-01-15 25
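As an alternative not used in either answer, `DataFrame.asfreq` builds the daily range from the first to the last date itself, so the explicit `date_range` can be dropped. This is a sketch on the question's sample rows (assumed stand-ins for the CSV files); like the min/max variants above, it covers only the span of the data, not the fixed 2011 year.

```python
import pandas as pd

# Sample rows copied from the question (assumed stand-ins for the CSV files)
one = pd.DataFrame({
    'A': [10, 20, 10, 20, 30, 40, 25],
    'Date': pd.to_datetime(['2011-01-03', '2011-01-04', '2011-01-06',
                            '2011-01-07', '2011-01-10', '2011-01-13',
                            '2011-01-15']),
})
two = pd.DataFrame({
    'B': [15, 15, 15, 25, 35, 10, 25],
    'Date': pd.to_datetime(['2011-01-01', '2011-01-02', '2011-01-03',
                            '2011-01-07', '2011-01-10', '2011-01-13',
                            '2011-01-15']),
})

merged = (pd.merge(one, two, on='Date', how='outer')
            .set_index('Date')
            .sort_index()       # keep the index in date order
            .fillna(0))         # zeros for dates present in only one frame

# asfreq('D') inserts the missing calendar days between the first and last
# date; fill_value=0 applies only to those newly inserted rows
one_and_two = (merged.asfreq('D', fill_value=0)
                     .astype(int)
                     .reset_index()
                     [['A', 'Date', 'B']])
print(one_and_two)
```

The `fillna(0)` step is still needed because `fill_value` does not touch NaN values that already existed before `asfreq`.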