I have a DataFrame like this:
time A time B 2017-11 2017-12 2018-01 2018-02
2017-01-24 2020-01-01 NaN NaN NaN NaN
2016-11-28 2020-01-01 NaN 4.0 2.0 2.0
2017-03-18 2017-12-21 NaN NaN NaN NaN
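In each row, I want to replace NaN with 0 in the columns whose names fall between time A and time B. For example, the range for the third row is 2017-03-18 to 2017-12-21, so any NaN in that row under a column name within this range should become 0, and everything else should stay the same. I hope that is clear. Thanks.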
Answer 0 (score: 1)
Try this code:
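# some_date and somedate are placeholders for the lower and upper bounds
newdf = df[(df.date > some_date) & (df.date < somedate)]
# fillna returns a new DataFrame, so assign the result
newdf = newdf.fillna(0)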
newdf is the DataFrame you are looking for.
Answer 1 (score: 0)
Maybe not the best solution, but it works.
Here is my test sample:
import numpy as np
import pandas as pd
d = pd.DataFrame([
    {"time A": "2017-01-24", "time B": np.nan, "2016-11": np.nan, "2016-12": np.nan, "2017-01": np.nan, "2017-02": np.nan},
    {"time A": "2016-11-28", "time B": np.nan, "2016-11": np.nan, "2016-12": 4, "2017-01": 2, "2017-02": 2},
    {"time A": "2016-12-18", "time B": "2017-01-01", "2016-11": np.nan, "2016-12": np.nan, "2017-01": np.nan, "2017-02": np.nan},
])
# Fill the missing closing dates with a date far in the future
d["time B"] = d["time B"].fillna("2020-01-01")
d.set_index(["time A", "time B"], inplace=True)
Initial table:
time A time B 2016-11 2016-12 2017-01 2017-02
2017-01-24 2020-01-01 NaN NaN NaN NaN
2016-11-28 2020-01-01 NaN 4.0 2.0 2.0
2016-12-18 2017-01-01 NaN NaN NaN NaN
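It looks like time A is an opening date and time B is a closing date, or something like that. So, for convenience, I filled the missing time B values with an arbitrary future date, for example '2020-01-01'.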
I don't like working with pivoted tables, so I used DataFrame.stack to stack it and then formatted the date columns:
# Go from wide to long: one row per (time A, time B, month)
d_stack = d.stack(dropna=False).reset_index()
d_stack.columns = ["time A", "time B", "month", "value"]
# Parse the string columns as datetimes so they can be compared
for col in ["time A", "time B"]:
    d_stack[col] = pd.to_datetime(d_stack[col], format="%Y-%m-%d")
d_stack["month"] = pd.to_datetime(d_stack["month"], format="%Y-%m")
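As a side note, the same wide-to-long reshaping can probably also be done with DataFrame.melt, which keeps the NaN rows by default. The sketch below is an alternative and not what this answer uses; the small frame `wide` and the name `long_df` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row sample with "time A" / "time B" as ordinary columns
wide = pd.DataFrame({
    "time A": ["2017-01-24", "2016-11-28"],
    "time B": ["2020-01-01", "2020-01-01"],
    "2016-12": [np.nan, 4.0],
    "2017-01": [np.nan, 2.0],
})

# One row per (time A, time B, month); NaN values are kept
long_df = wide.melt(id_vars=["time A", "time B"],
                    var_name="month", value_name="value")
long_df["month"] = pd.to_datetime(long_df["month"], format="%Y-%m")
print(long_df)
```

Note that melt orders the rows column by column, whereas stack orders them row by row, so the two results hold the same data in a different order.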
Now it is more convenient to fill in the missing values:
def fill_existing(x):
    # Return 0 for a missing value whose month lies within [time A, time B]
    if (x["time A"] <= x["month"] <= x["time B"] and
            np.isnan(x["value"])):
        return 0
    else:
        return x["value"]
d_stack["value"] = d_stack.apply(fill_existing, axis=1)
Output:
time A time B month value
0 2017-01-24 2020-01-01 2016-11-01 NaN
1 2017-01-24 2020-01-01 2016-12-01 NaN
2 2017-01-24 2020-01-01 2017-01-01 NaN
3 2017-01-24 2020-01-01 2017-02-01 0.0
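The row-wise apply above can likely be replaced by a boolean mask, which tends to be faster on large frames. This is only a sketch: the frame `s` is a hypothetical stand-in for d_stack, built here so that the example runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for d_stack: the four stacked rows of the first sample row
s = pd.DataFrame({
    "time A": pd.to_datetime(["2017-01-24"] * 4),
    "time B": pd.to_datetime(["2020-01-01"] * 4),
    "month": pd.to_datetime(["2016-11", "2016-12", "2017-01", "2017-02"],
                            format="%Y-%m"),
    "value": [np.nan, np.nan, np.nan, np.nan],
})

# True where the month lies within [time A, time B], bounds included
in_range = (s["time A"] <= s["month"]) & (s["month"] <= s["time B"])

# Zero out only the values that are both missing and in range
s.loc[in_range & s["value"].isna(), "value"] = 0
print(s)
```

As with fill_existing, each month is compared as the first day of that month, so 2017-01 does not count as being in range for a time A of 2017-01-24.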
Finally, format month back to a string and use pivot_table to return to the initial table format:
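# Turn the month timestamps back into "YYYY-MM" column labels
d_stack["month"] = d_stack["month"].dt.strftime("%Y-%m")
# aggfunc="first" keeps all-NaN cells as NaN; recent pandas versions
# return 0 when summing an all-NaN group
pd.pivot_table(d_stack, columns="month", index=["time A", "time B"],
               values="value", aggfunc="first")
Result: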
time A time B 2016-12 2017-01 2017-02
2016-11-28 2020-01-01 4.0 2.0 2.0
2016-12-18 2017-01-01 NaN 0.0 NaN
2017-01-24 2020-01-01 NaN NaN 0.0
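For comparison, here is a sketch that skips the stack and pivot round trip and builds the mask directly on the wide table with NumPy broadcasting. It is not the method used in this answer. It assumes that time A and time B are ordinary columns, that every other column is a "YYYY-MM" label, and that missing time B values have already been filled. The names `month_cols`, `months`, `start`, `end` and `in_range` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time A": ["2017-01-24", "2016-11-28", "2016-12-18"],
    "time B": ["2020-01-01", "2020-01-01", "2017-01-01"],
    "2016-11": [np.nan, np.nan, np.nan],
    "2016-12": [np.nan, 4.0, np.nan],
    "2017-01": [np.nan, 2.0, np.nan],
    "2017-02": [np.nan, 2.0, np.nan],
})

# Assumption: every column other than the two date columns is a month label
month_cols = [c for c in df.columns if c not in ("time A", "time B")]

# Month labels as first-of-month timestamps, shape (n_months,)
months = pd.to_datetime(month_cols, format="%Y-%m").to_numpy()
# Row bounds as column vectors, shape (n_rows, 1), so they broadcast against months
start = pd.to_datetime(df["time A"]).to_numpy()[:, None]
end = pd.to_datetime(df["time B"]).to_numpy()[:, None]

# in_range[i, j] is True when month j lies within [time A, time B] of row i
in_range = (start <= months) & (months <= end)

# Replace only the cells that are both missing and in range
values = df[month_cols]
df[month_cols] = values.mask(in_range & values.isna().to_numpy(), 0)
print(df)
```

This keeps the same comparison rule as the stacked version (each month is treated as its first day), and it leaves the all-NaN 2016-11 column in place instead of dropping it.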