Adding missing dates to a DataFrame

Time: 2020-06-09 03:54:38

Tags: python pandas dataframe

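I have a DataFrame containing a list of One Piece manga chapters, which currently looks like this:

    Date        #  Title                                      Pages
0   1997-07-19  1  Romance Dawn - The Dawn of the Adventure   53
1   1997-07-28  2  That Guy, "Straw Hat Luffy"                23
2   1997-08-04  3  Introducing "Pirate Hunter Zoro"           21
3   1997-08-11  4  Marine Captain "Axe-Hand Morgan"           19
4   1997-08-25  5  Pirate King and Master Swordsman           19
5   1997-09-01  6  The First Crew Member                      23
6   1997-09-08  7  Friends                                    20
7   1997-09-13  8  Introducing Nami                           19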

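Although chapters are released weekly, a release is sometimes delayed or skipped, which leaves irregular gaps between the dates. What I want to do is add the missing dates. For example, between 1997-08-11 and 1997-08-25 there should be a row for 1997-08-18 (7 days after 1997-08-11), a week in which no chapter was released. Could you help me work out how to code this?

Thanks.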

2 answers:

Answer 0: (score: 0)

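I used relativedelta and a list comprehension to compute, for each row, the date 14 days later, then used .shift(1) together with np.where() to compare that value against the following row's date. The result is 1 for any row that falls 14 or more days after the previous one, meaning a row should be inserted before it. I then loop over the DataFrame and collect the relevant rows in a second DataFrame. Finally, I combine the two DataFrames with pd.concat, sort by date, drop the helper columns, and reset the index.

As others have mentioned, some gaps may be longer (22 days or more, for example), and a single pass inserts only one row per gap, but this should point you in the right direction. You could turn it into a function and run it several times, which is why I added .reset_index(drop=True) at the end. You can obviously make this more sophisticated, but I hope it helps.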

import numpy as np
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({'Date': {0: '1997-07-19',
  1: '1997-07-28',
  2: '1997-08-04',
  3: '1997-08-11',
  4: '1997-08-25',
  5: '1997-09-01',
  6: '1997-09-08',
  7: '1997-09-13'},
 '#': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8},
 'Title': {0: 'Romance Dawn - The Dawn of the Adventure',
  1: 'That Guy, "Straw Hat Luffy"',
  2: 'Introducing "Pirate Hunter Zoro"',
  3: 'Marine Captain "Axe-Hand Morgan"',
  4: 'Pirate King and Master Swordsman',
  5: 'The First Crew Member',
  6: 'Friends',
  7: 'Introducing Nami'},
 'Pages': {0: 53, 1: 23, 2: 21, 3: 19, 4: 19, 5: 23, 6: 20, 7: 19}})

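df['Date'] = pd.to_datetime(df['Date'])
# Helper column: each date plus 14 days
df['Date2'] = [d + relativedelta(days=14) for d in df['Date']]
# Helper flag: 1 where this row falls 14 or more days after the previous row
df['Date3'] = np.where(df['Date'] >= df['Date2'].shift(1), 1, 0)
# Second frame: one placeholder date 7 days before each flagged row
# (DataFrame.append was removed in pandas 2.0, so the dates are collected first)
missing = [d - relativedelta(days=7) for d, flag in zip(df['Date'], df['Date3']) if flag == 1]
df1 = pd.DataFrame({'Date': pd.to_datetime(missing)})
# Combine, sort by date, drop the helper columns, and reset the index
df = pd.concat([df, df1]).sort_values('Date').drop(['Date2', 'Date3'], axis=1).reset_index(drop=True)
df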

Output:

    Date        #    Title                                      Pages
0   1997-07-19  1.0  Romance Dawn - The Dawn of the Adventure   53.0
1   1997-07-28  2.0  That Guy, "Straw Hat Luffy"                23.0
2   1997-08-04  3.0  Introducing "Pirate Hunter Zoro"           21.0
3   1997-08-11  4.0  Marine Captain "Axe-Hand Morgan"           19.0
4   1997-08-18  NaN  NaN                                        NaN
5   1997-08-25  5.0  Pirate King and Master Swordsman           19.0
6   1997-09-01  6.0  The First Crew Member                      23.0
7   1997-09-08  7.0  Friends                                    20.0
8   1997-09-13  8.0  Introducing Nami                           19.0
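
The answer suggests wrapping the logic in a function so that gaps longer than two weeks can also be handled. One possible sketch is below. The name fill_weekly_gaps and its parameters are hypothetical, and it assumes that placeholder dates should be counted forward from the earlier chapter in 7-day steps for as long as a full week still fits before the next chapter. The last date in the sample is made up to create a 22-day gap; it is not from the original data.

```python
import pandas as pd


def fill_weekly_gaps(df, date_col="Date", step_days=7):
    """Insert placeholder rows wherever a gap spans two or more full steps.

    Hypothetical helper: for consecutive dates a < b it adds rows dated
    a + step, a + 2*step, ... while another full step still fits before b.
    Every other column in the new rows is left as NaN.
    """
    out = df.sort_values(date_col).reset_index(drop=True)
    step = pd.Timedelta(days=step_days)
    missing = []
    for prev, nxt in zip(out[date_col], out[date_col].shift(-1)):
        if pd.isna(nxt):  # the last row has no successor
            continue
        candidate = prev + step
        while candidate + step <= nxt:
            missing.append(candidate)
            candidate += step
    filler = pd.DataFrame({date_col: pd.to_datetime(missing)})
    return (pd.concat([out, filler])
              .sort_values(date_col)
              .reset_index(drop=True))


# Three dates from the question plus one invented date (22-day gap).
chapters = pd.DataFrame({
    "Date": pd.to_datetime(["1997-08-11", "1997-08-25", "1997-09-01", "1997-09-23"]),
    "#": [4, 5, 6, 7],
})
filled = fill_weekly_gaps(chapters)
print(filled)
```

Because every gap is filled in a single pass, the function does not need to be run repeatedly. Gaps shorter than 14 days, such as the 9-day gap in the original data, are left untouched.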

Answer 1: (score: 0)

You can use the built-in shift method.

df['day_between'] = df['Date'].shift(-1) - df['Date']

Then print(df[['Date', 'day_between']])

The output is:

        Date day_between
0 1997-07-19      9 days
1 1997-07-28      7 days
2 1997-08-04      7 days
3 1997-08-11     14 days
4 1997-08-25      7 days
5 1997-09-01      7 days
6 1997-09-08      5 days
7 1997-09-13         NaT
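
The day_between column only reports the gaps. A minimal sketch of one way to turn it into the missing rows follows. It assumes a gap of 14 days or more means exactly one skipped week, and that the placeholder belongs 7 days after the earlier chapter. The names gaps, missing, and result are illustrative, and only the Date column is reproduced here.

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime([
    "1997-07-19", "1997-07-28", "1997-08-04", "1997-08-11",
    "1997-08-25", "1997-09-01", "1997-09-08", "1997-09-13",
])})

# Gap between each row and the following one, as in the answer above.
df["day_between"] = df["Date"].shift(-1) - df["Date"]

# Rows followed by a gap of at least 14 days (the threshold is an assumption).
gaps = df[df["day_between"] >= pd.Timedelta(days=14)]

# Expected date of the skipped release: 7 days after the earlier chapter.
missing = pd.DataFrame({"Date": gaps["Date"] + pd.Timedelta(days=7)})

# An outer merge on Date adds the placeholder rows to the original frame.
result = (df.drop(columns="day_between")
            .merge(missing, on="Date", how="outer")
            .sort_values("Date")
            .reset_index(drop=True))
print(result)
```

This inserts at most one row per gap, so a gap of three weeks or more would still need a loop or a second pass.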