I have a table:
import pandas as pd
df_initial = pd.DataFrame([
("2018-05-25", 18, 14),
("2018-06-04", 19, 16),
("2018-06-15", 19, 18),
("2018-06-24", 21, 20),
("2018-07-10", 23, 23),
("2018-07-20", 25, 25),
("2018-08-01", 27, 29),
("2018-08-10", 28, 32),
("2018-08-22", 29, 35),
("2018-09-03", 29, 37),
("2018-09-25", 31, 48),
("2018-10-17", 34, 55),
("2018-11-10", 38, 63),
("2018-11-11", 39, 64),
("2018-12-10", 48, 77),
("2018-12-11", 49, 78),
("2019-01-11", 57, 88),
("2019-02-10", 63, 103),
("2019-02-24", 67, 111),
("2019-03-10", 69, 113),
("2019-03-11", 70, 115),
("2019-04-10", 80, 149),
("2019-05-11", 88, 209)],
columns=["date", "col1", "col2"])
I need to add NaN rows to the table for the 10th of every month in which the 10th has no data, so that the table looks like this:
Answer 0: (score: 2)
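IIUC, use strftime to get the year-month (%Y-%m) of each row, append "-10" to build the 10th of every month, keep only the dates that are not already in the original df (~isin), drop_duplicates, and concat the result back.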
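The steps described above might be sketched roughly as follows. This is a reconstruction under assumptions, not the answerer's exact code: the small frame `df` is a stand-in for a few rows of `df_initial`, and the names `tenth`, `missing` and `result` are made up for illustration.

```python
import pandas as pd

# Stand-in for df_initial: a few rows taken from the question's data.
df = pd.DataFrame(
    [("2018-05-25", 18, 14),
     ("2018-06-04", 19, 16),
     ("2018-07-10", 23, 23),
     ("2018-07-20", 25, 25)],
    columns=["date", "col1", "col2"],
)

# Build the "YYYY-MM-10" string for the month of every existing row.
tenth = pd.to_datetime(df["date"]).dt.strftime("%Y-%m") + "-10"

# Keep only the 10ths that are absent from the original dates, once per month.
missing = tenth[~tenth.isin(df["date"])].drop_duplicates()

# Add them as rows (col1/col2 become NaN), then sort by the ISO date strings.
result = (
    pd.concat([df, pd.DataFrame({"date": missing})], ignore_index=True)
    .sort_values("date")
    .reset_index(drop=True)
)
print(result)
```

Because the dates are ISO-formatted strings, sorting them as text also sorts them chronologically, so the date column does not need to be converted permanently.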
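Answer 1: (score: 0)
Here is my solution, which works strictly on strings (no conversion to dates). However, I get 29 rows rather than the 27 in the expected sample table: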
# create YYYY_MM column for filtering
df_initial["YYYY_MM"] = df_initial["date"].str.slice(0, -3)
# filter dates that DO contain the 10th
df_filtered = df_initial[df_initial['date'].str.endswith("-10")]
# slice off "-10"
df_monthsToFilter = df_filtered["date"].str.slice(0, -3)
# keep only rows from months that have no entry on the 10th
df_filtered2 = df_initial[~df_initial.YYYY_MM.isin(df_monthsToFilter)]
# create df to add data back in
df_toAdd = pd.DataFrame(df_filtered2["YYYY_MM"].unique(), columns=['YYYY_MM'])
df_toAdd['YYYY_MM'] = df_toAdd['YYYY_MM'].astype(str) + "-10"
df_toAdd = df_toAdd.rename(index=str, columns={"YYYY_MM": "date"})
# use pd.concat here (DataFrame.append was removed in pandas 2.0)
df_initial = pd.concat([df_initial, df_toAdd])
# remove YYYY_MM column
df_initial = df_initial.drop(["YYYY_MM"], axis=1)
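As a rough sanity check on the 29 rows, the count can be reproduced with plain Python on the date strings from the question: 23 original rows plus one added row for each month that has no entry on the 10th. The variable names here are illustrative only, and this check says nothing about which rows a 27-row table would leave out.

```python
# The 23 date strings from the question's df_initial.
dates = [
    "2018-05-25", "2018-06-04", "2018-06-15", "2018-06-24", "2018-07-10",
    "2018-07-20", "2018-08-01", "2018-08-10", "2018-08-22", "2018-09-03",
    "2018-09-25", "2018-10-17", "2018-11-10", "2018-11-11", "2018-12-10",
    "2018-12-11", "2019-01-11", "2019-02-10", "2019-02-24", "2019-03-10",
    "2019-03-11", "2019-04-10", "2019-05-11",
]

# Every month (YYYY-MM) that appears in the data.
months = sorted({d[:7] for d in dates})

# Months that already contain a row dated the 10th.
months_with_10th = {d[:7] for d in dates if d.endswith("-10")}

# Months for which a NaN row on the 10th would be added.
months_missing_10th = [m for m in months if m not in months_with_10th]

print(months_missing_10th)
print(len(dates) + len(months_missing_10th))
```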