How to fill in a pandas date series

Time: 2018-08-06 13:15:28

Tags: python pandas dataframe series

I have a pandas DataFrame like this:

    year    week  val1   val2
0   2017   45     10.1   20.2
0   2017   48     10.3   20.3
0   2017   49     10.4   20.4
0   2017   52     10.3   20.5
0   2018    1     10.1   20.2
0   2018    2     10.3   20.3
0   2018    5     10.4   20.4
0   2018    9     10.3   20.5
....

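Note that the weeks are not consecutive. What is the best way to fill in the missing rows, with NaN for val1 and val2? For example, my years run from 2017 to 2018, and my weeks run from 45 to 52 and from 1 to 9.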

Many thanks.
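A minimal setup sketch so the answers can be run as-is. It assumes the eight rows shown above are the whole sample (the "...." rows are not reproduced) and uses a default 0..7 index instead of the repeated 0.

```python
import pandas as pd

# The eight sample rows from the question; weeks are not consecutive
df = pd.DataFrame({
    "year": [2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018],
    "week": [45, 48, 49, 52, 1, 2, 5, 9],
    "val1": [10.1, 10.3, 10.4, 10.3, 10.1, 10.3, 10.4, 10.3],
    "val2": [20.2, 20.3, 20.4, 20.5, 20.2, 20.3, 20.4, 20.5],
})
print(df)
```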

3 answers:

Answer 0 (score: 2)

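You can groupby year, then reindex each group over its full range of weeks, which keeps the existing values and inserts NaN rows for the missing weeks: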

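import numpy as np
import pandas as pd

# Index by week; selecting only the value columns keeps "year" out of apply
# (this replaces .drop("year", 1), whose positional axis fails in pandas 2.0)
(df.set_index("week")
   .groupby("year")[["val1", "val2"]]
   .apply(lambda x: x.reindex(np.arange(x.index.min(), x.index.max() + 1)))
   .rename_axis(["year", "week"])
   .reset_index())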

    year  week  val1  val2
0   2017    45  10.1  20.2
1   2017    46   NaN   NaN
2   2017    47   NaN   NaN
3   2017    48  10.3  20.3
4   2017    49  10.4  20.4
5   2017    50   NaN   NaN
6   2017    51   NaN   NaN
7   2017    52  10.3  20.5
8   2018     1  10.1  20.2
9   2018     2  10.3  20.3
10  2018     3   NaN   NaN
11  2018     4   NaN   NaN
12  2018     5  10.4  20.4
13  2018     6   NaN   NaN
14  2018     7   NaN   NaN
15  2018     8   NaN   NaN
16  2018     9  10.3  20.5
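The same reindex idea can also be written without groupby.apply by building the full (year, week) MultiIndex up front and reindexing once. This is a sketch of an alternative, not part of the original answer; like the answer, it only fills weeks between each year's first and last observed week, and it assumes (year, week) pairs are unique.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018],
    "week": [45, 48, 49, 52, 1, 2, 5, 9],
    "val1": [10.1, 10.3, 10.4, 10.3, 10.1, 10.3, 10.4, 10.3],
    "val2": [20.2, 20.3, 20.4, 20.5, 20.2, 20.3, 20.4, 20.5],
})

# First and last observed week of each year
bounds = df.groupby("year")["week"].agg(["min", "max"])

# Every (year, week) pair between those bounds, as a MultiIndex
full = pd.MultiIndex.from_tuples(
    [(y, w) for y, lo, hi in zip(bounds.index, bounds["min"], bounds["max"])
     for w in range(lo, hi + 1)],
    names=["year", "week"],
)

# Reindex on (year, week) in one step; absent pairs get NaN values
out = df.set_index(["year", "week"]).reindex(full).reset_index()
print(out)
```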

Answer 1 (score: 1)

I would build a reference DataFrame of every (year, week) pair and left-merge the original data onto it:

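import pandas as pd

# Every (year, week) pair from each year's first to last observed week
ref = pd.DataFrame(
    [[y, w] for y, s in df.groupby('year').week for w in range(s.min(), s.max() + 1)],
    columns=['year', 'week']
)

# Left merge: weeks missing from df get NaN for val1 and val2
ref.merge(df, on=['year', 'week'], how='left')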

    year  week  val1  val2
0   2017    45  10.1  20.2
1   2017    46   NaN   NaN
2   2017    47   NaN   NaN
3   2017    48  10.3  20.3
4   2017    49  10.4  20.4
5   2017    50   NaN   NaN
6   2017    51   NaN   NaN
7   2017    52  10.3  20.5
8   2018     1  10.1  20.2
9   2018     2  10.3  20.3
10  2018     3   NaN   NaN
11  2018     4   NaN   NaN
12  2018     5  10.4  20.4
13  2018     6   NaN   NaN
14  2018     7   NaN   NaN
15  2018     8   NaN   NaN
16  2018     9  10.3  20.5
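If you also want to know which rows were filled in, merge's indicator=True option adds a categorical "_merge" column. The sketch below (an addition, not part of the original answer; the "filled" column name is hypothetical) turns it into a boolean flag.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018],
    "week": [45, 48, 49, 52, 1, 2, 5, 9],
    "val1": [10.1, 10.3, 10.4, 10.3, 10.1, 10.3, 10.4, 10.3],
    "val2": [20.2, 20.3, 20.4, 20.5, 20.2, 20.3, 20.4, 20.5],
})

# Reference frame of every (year, week) pair, as in the answer above
ref = pd.DataFrame(
    [[y, w] for y, s in df.groupby("year")["week"] for w in range(s.min(), s.max() + 1)],
    columns=["year", "week"],
)

# "_merge" is "both" for original rows and "left_only" for the filled gaps
out = ref.merge(df, on=["year", "week"], how="left", indicator=True)
out["filled"] = out["_merge"].eq("left_only")  # hypothetical flag column
out = out.drop(columns="_merge")
print(out)
```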

Answer 2 (score: 1)

I would use pandas' Time Series / Date functionality. Combine the year and week columns into a date, convert the result to a DatetimeIndex, and resample the DataFrame:

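import pandas as pd

# Wednesday ("3" for %w) of each week; %W numbers Monday-based weeks of the year
df.index = pd.to_datetime(
    df["year"].astype(str) + " " + df["week"].astype(str) + " 3",
    format="%Y %W %w",
)
# Weekly bins ending on Sunday; weeks with no data become NaN rows
df = df.resample("W").mean()
# DatetimeIndex.week is deprecated (removed in pandas 2.0); use isocalendar()
iso = df.index.isocalendar()
df["year"] = iso["year"].astype(int)
df["week"] = iso["week"].astype(int)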

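Note that this overwrites your original index. Also note that the week numbers read back from the index are ISO weeks, which are one higher than the %W numbers used to build the dates in years where January 1 falls on a Tuesday, Wednesday or Thursday.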
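If you would rather keep the original Wednesday dates and skip the averaging step, a possible alternative (not part of the original answer) is DataFrame.asfreq with a "W-WED" frequency, which reindexes onto every Wednesday between the first and last date. This sketch assumes the rows are already in date order and that each (year, week) appears once.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018],
    "week": [45, 48, 49, 52, 1, 2, 5, 9],
    "val1": [10.1, 10.3, 10.4, 10.3, 10.1, 10.3, 10.4, 10.3],
    "val2": [20.2, 20.3, 20.4, 20.5, 20.2, 20.3, 20.4, 20.5],
})

# Same date construction as the answer: Wednesday of each %W week
df.index = pd.to_datetime(
    df["year"].astype(str) + " " + df["week"].astype(str) + " 3",
    format="%Y %W %w",
)

# asfreq inserts the missing Wednesdays as NaN rows without aggregating
out = df.asfreq("W-WED")

# Recompute year/week (ISO) for the inserted rows
iso = out.index.isocalendar()
out["year"] = iso["year"].astype(int)
out["week"] = iso["week"].astype(int)
print(out)
```

For 2017 and 2018 the ISO week numbers match the %W numbers, so the result lines up with the question's weeks; check this before relying on it for other years.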