Pandas - Groupby + Shift not working as expected

Time: 2018-01-03 17:38:31

Tags: python pandas group-by

I have a df on which I am trying to perform a groupby + shift. However, the output is not what I want.

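I want to shift the "next" DueDate back onto the earlier rows. So if the current DueDate is 1/1 and the next DueDate is 6/30, insert a new column NextDueDate with the value 6/30 for all rows where DueDate==1/1. Then, when the current DueDate is 6/30, insert the next DueDate for all rows where DueDate==6/30.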

I have tried many variations of groupby + shift, but none of them gets me where I want.

Original df

   ID Document  Date DueDate
    1      ABC  1/31     1/1
    1      ABC  2/28     1/1
    1      ABC  3/31     1/1
    1      ABC  4/30    6/30
    1      ABC  5/31    6/30
    1      ABC  6/30    7/31
    1      ABC  7/31    7/31
    1      ABC  8/31    9/30

Desired output df

   ID Document  Date DueDate NextDueDate
    1      ABC  1/31     1/1        6/30
    1      ABC  2/28     1/1        6/30
    1      ABC  3/31     1/1        6/30
    1      ABC  4/30    6/30        7/31
    1      ABC  5/31    6/30        7/31
    1      ABC  6/30    7/31        9/30
    1      ABC  7/31    7/31        9/30
    1      ABC  8/31    9/30       10/31
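For anyone trying to reproduce this, here is a minimal sketch that rebuilds the sample frame and shows why a plain groupby + shift falls short. The exact call the asker tried is not preserved in the post, so the `shift(-1)` line below is only an assumed, typical form; the name `naive` is made up for illustration.

```python
import pandas as pd

# Sample data from the question (dates kept as plain strings).
df = pd.DataFrame({
    "ID": [1] * 8,
    "Document": ["ABC"] * 8,
    "Date": ["1/31", "2/28", "3/31", "4/30", "5/31", "6/30", "7/31", "8/31"],
    "DueDate": ["1/1", "1/1", "1/1", "6/30", "6/30", "7/31", "7/31", "9/30"],
})

# Assumed attempt: shift moves values up by one ROW within each group,
# so only the last row of each run of equal DueDates sees the next date.
naive = df.groupby(["ID", "Document"])["DueDate"].shift(-1)
print(naive.tolist())
```

The first two rows still read 1/1 instead of 6/30, which is why the shift has to be applied to the distinct due dates rather than to every row.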

2 answers:

Answer 0 (score: 2)

IIUC

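import pandas as pd

frames = []
# Handle each (ID, Document) group separately.
for _, df1 in df.groupby(["ID", "Document"]):
    # Count rows per DueDate, keeping the order of first appearance.
    s = df1.groupby('DueDate', sort=False).size().rename('number').reset_index()
    # Replace each date by its successor; the last one gets the '10/31' placeholder.
    s['DueDate'] = s['DueDate'].shift(-1).fillna('10/31')
    # Repeat each successor once per row (assumes equal DueDates are contiguous).
    frames.append(df1.assign(NextDueDate=s['DueDate'].repeat(s['number']).values))

New_df = pd.concat(frames)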

If you have multiple groups, the loop above processes each (ID, Document) group separately.

Answer 1 (score: 2)

Define a function f that performs the replacement based on the shifted dates:

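def f(x):
    # Distinct due dates, in order of first appearance.
    i = x.drop_duplicates()
    # Each date's successor; the last date has none, so fill with a placeholder.
    j = i.shift(-1).fillna('10/30')

    # Map every row's DueDate to its successor.
    return x.map(dict(zip(i, j)))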
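To see what f builds internally, the following sketch runs the same three steps on a single group's DueDate values taken from the question. The variable names `x` and `mapping` are only for illustration.

```python
import pandas as pd

# One group's DueDate column, as in the question.
x = pd.Series(["1/1", "1/1", "1/1", "6/30", "6/30", "7/31", "7/31", "9/30"])

i = x.drop_duplicates()          # distinct dates, in order of first appearance
j = i.shift(-1).fillna("10/30")  # each date's successor; last gets the placeholder
mapping = dict(zip(i, j))        # lookup table: date -> next date

print(mapping)
print(x.map(mapping).tolist())
```

Because the lookup is by value, this relies on each due date appearing in a single contiguous run per group; a date that reappears later would be mapped to the successor of its first occurrence.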

Now call this function inside a groupby + apply on ID + Document:

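# group_keys=False keeps the original row index, so the result aligns with df on newer pandas.
df['NextDueDate'] = df.groupby(['ID', 'Document'], group_keys=False).DueDate.apply(f)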
df

   ID Document  Date DueDate NextDueDate
0   1      ABC  1/31     1/1        6/30
1   1      ABC  2/28     1/1        6/30
2   1      ABC  3/31     1/1        6/30
3   1      ABC  4/30    6/30        7/31
4   1      ABC  5/31    6/30        7/31
5   1      ABC  6/30    7/31        9/30
6   1      ABC  7/31    7/31        9/30
7   1      ABC  8/31    9/30       10/30
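As a quick check that the mapping is built per group, here is a small self-contained sketch with two (ID, Document) groups. The frame `df2`, the helper name `next_due` and its `fill` parameter are made up for illustration, and `group_keys=False` is an assumption about newer pandas versions, where it is needed to keep the original row index.

```python
import pandas as pd

def next_due(x, fill="10/30"):
    """Map each DueDate in one group to the next distinct DueDate."""
    i = x.drop_duplicates()
    j = i.shift(-1).fillna(fill)
    return x.map(dict(zip(i, j)))

# Hypothetical data with two groups.
df2 = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Document": ["ABC", "ABC", "ABC", "XYZ", "XYZ"],
    "DueDate": ["1/1", "1/1", "6/30", "3/31", "9/30"],
})

df2["NextDueDate"] = (
    df2.groupby(["ID", "Document"], group_keys=False)["DueDate"].apply(next_due)
)
print(df2)
```

The last due date of group 1 gets the placeholder rather than the first due date of group 2, so nothing leaks across groups.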