I have a pandas DataFrame that looks like this:
year week val1 val2
0 2017 45 10.1 20.2
0 2017 48 10.3 20.3
0 2017 49 10.4 20.4
0 2017 52 10.3 20.5
0 2018 1 10.1 20.2
0 2018 2 10.3 20.3
0 2018 5 10.4 20.4
0 2018 9 10.3 20.5
....
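Note that the weeks are not consecutive. What is the best way to fill in the missing rows, with NaN for val1 and val2? For example, my years run from 2017 to 2018, and my weeks are 45-52 and 1-9.
Thanks a lot.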
Answer 0: (score: 2)
You can groupby the year, then reindex each group with the union of its existing weeks and the missing ones:
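import numpy as np

(df.set_index("week")
   .groupby("year")
   # np.arange stops before max, but max is already in the index, so the union covers min..max
   .apply(lambda x: x.reindex(x.index.union(np.arange(x.index.min(), x.index.max()))))
   # drop needs a keyword here; errors="ignore" covers pandas versions that already exclude "year"
   .drop(columns="year", errors="ignore")
   .reset_index()
   .rename(columns={"level_1": "week"}))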
year week val1 val2
0 2017 45 10.1 20.2
1 2017 46 nan nan
2 2017 47 nan nan
3 2017 48 10.3 20.3
4 2017 49 10.4 20.4
5 2017 50 nan nan
6 2017 51 nan nan
7 2017 52 10.3 20.5
8 2018 1 10.1 20.2
9 2018 2 10.3 20.3
10 2018 3 nan nan
11 2018 4 nan nan
12 2018 5 10.4 20.4
13 2018 6 nan nan
14 2018 7 nan nan
15 2018 8 nan nan
16 2018 9 10.3 20.5
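How groupby.apply treats the grouping column differs between pandas versions, so here is a minimal sketch of the same reindex idea written as an explicit loop over the groups. The sample data is copied from the question; the names pieces, filled and result are only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "year": [2017] * 4 + [2018] * 4,
    "week": [45, 48, 49, 52, 1, 2, 5, 9],
    "val1": [10.1, 10.3, 10.4, 10.3, 10.1, 10.3, 10.4, 10.3],
    "val2": [20.2, 20.3, 20.4, 20.5, 20.2, 20.3, 20.4, 20.5],
})

pieces = []
for year, group in df.groupby("year"):
    # Every week between the first and last observed week of this year
    full_weeks = np.arange(group["week"].min(), group["week"].max() + 1)
    filled = (
        group.set_index("week")[["val1", "val2"]]
        .reindex(full_weeks)   # missing weeks become NaN rows
        .rename_axis("week")
        .reset_index()
    )
    filled.insert(0, "year", year)
    pieces.append(filled)

result = pd.concat(pieces, ignore_index=True)
print(result)
```

This only fills gaps inside each year's observed range; it does not add weeks before the first or after the last observed week of a year.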
Answer 1: (score: 1)
I would create a reference DataFrame and merge it with the original:
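# one (year, week) row for every week between each year's first and last observed week
ref = pd.DataFrame(
    [[y, w] for y, s in df.groupby('year').week for w in range(s.min(), s.max() + 1)],
    columns=['year', 'week']
)
# a left merge keeps every reference row; weeks absent from df get NaN
ref.merge(df, how='left', on=['year', 'week'])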
year week val1 val2
0 2017 45 10.1 20.2
1 2017 46 NaN NaN
2 2017 47 NaN NaN
3 2017 48 10.3 20.3
4 2017 49 10.4 20.4
5 2017 50 NaN NaN
6 2017 51 NaN NaN
7 2017 52 10.3 20.5
8 2018 1 10.1 20.2
9 2018 2 10.3 20.3
10 2018 3 NaN NaN
11 2018 4 NaN NaN
12 2018 5 10.4 20.4
13 2018 6 NaN NaN
14 2018 7 NaN NaN
15 2018 8 NaN NaN
16 2018 9 10.3 20.5
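If you also need to know which rows were filled in, merge accepts indicator=True, which adds a _merge column. The following is a small sketch using the sample data from the question; the names merged and added are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "year": [2017] * 4 + [2018] * 4,
    "week": [45, 48, 49, 52, 1, 2, 5, 9],
    "val1": [10.1, 10.3, 10.4, 10.3, 10.1, 10.3, 10.4, 10.3],
    "val2": [20.2, 20.3, 20.4, 20.5, 20.2, 20.3, 20.4, 20.5],
})

# Reference frame: every week between each year's first and last observed week
ref = pd.DataFrame(
    [(y, w) for y, s in df.groupby("year")["week"]
     for w in range(s.min(), s.max() + 1)],
    columns=["year", "week"],
)

# indicator=True marks rows found only in ref as "left_only"
merged = ref.merge(df, how="left", on=["year", "week"], indicator=True)
added = merged["_merge"] == "left_only"
print(merged[added])
```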
Answer 2: (score: 1)
I would use the Time Series / Date functionality. Combine the year and week columns, convert them to a datetime index, and resample the DataFrame:
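import pandas as pd

# build a date from year, week number (%W) and weekday 3 (Wednesday)
df.index = pd.to_datetime(
    df.year.map(str) + " " + df.week.map(str) + " 3",
    format="%Y %W %w"
)
# one row per calendar week; weeks without data become NaN
df = df.resample("W").mean()
df["year"] = df.index.year
# DatetimeIndex.week no longer exists in recent pandas; isocalendar() gives the ISO week
df["week"] = df.index.isocalendar().week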
Note that this overwrites your original index.
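If you want to leave the original frame and its index alone, one possible variation is to resample a copy and reset to a plain integer index at the end. This is a sketch on the sample data from the question, and the names stamps, weekly and iso are illustrative. It takes both year and week from isocalendar() so the two stay consistent. Keep in mind that %W week numbers and ISO week numbers agree for 2017 and 2018 but not for every year.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "year": [2017] * 4 + [2018] * 4,
    "week": [45, 48, 49, 52, 1, 2, 5, 9],
    "val1": [10.1, 10.3, 10.4, 10.3, 10.1, 10.3, 10.4, 10.3],
    "val2": [20.2, 20.3, 20.4, 20.5, 20.2, 20.3, 20.4, 20.5],
})

# Wednesday (weekday 3) of each (year, week) pair, parsed with %W week numbers
stamps = pd.to_datetime(
    df["year"].astype(str) + " " + df["week"].astype(str) + " 3",
    format="%Y %W %w",
)

# Resample only the value columns on a datetime-indexed copy; df itself is unchanged
weekly = df.set_index(stamps)[["val1", "val2"]].resample("W").mean()

# Rebuild year/week from the weekly bin labels, then restore an integer index
iso = weekly.index.isocalendar()
weekly.insert(0, "week", iso["week"].astype(int).to_numpy())
weekly.insert(0, "year", iso["year"].astype(int).to_numpy())
weekly = weekly.reset_index(drop=True)
print(weekly)
```

Unlike the per-year approaches, resampling fills every calendar week between the first and last date, so it would also fill a gap that spans the turn of the year.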