Question

I have a data table like this:

ID      DAYS    FREQUENCY
"ads"   20      3
"jwa"   45      2
"mno"   4       1
"ads"   13      3
"jwa"   60      2
"ads"   18      3

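I want to add a column that, within each ID, subtracts from each DAYS value the closest smaller DAYS value for the same ID. My new table would look like this:

ID      DAYS    FREQUENCY    DAYS DIFF
"ads"   20      3            2 (because 20-18)
"jwa"   45      2            NA (because no value smaller than 45 for that id)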
"mno"   4       1            NA
"ads"   13      3            NA
"jwa"   60      2            15
"ads"   18      3            5

Bonus: is there a way to do this using the merge function?

Answer 1

Here is an answer using dplyr:

library(dplyr)

mydata %>%
  mutate(row.order = row_number()) %>%     # remember the original row order
  arrange(DAYS) %>%                        # sort so lag() sees the next-smaller DAYS
  group_by(ID) %>%
  mutate(days.diff = DAYS - lag(DAYS)) %>% # NA for the smallest DAYS in each ID
  ungroup() %>%
  arrange(row.order) %>%                   # restore the original row order
  select(ID, DAYS, FREQUENCY, days.diff)

Output:

      ID  DAYS FREQUENCY days.diff
  <fctr> <int>     <int>     <int>
1    ads    20         3         2
2    jwa    45         2        NA
3    mno     4         1        NA
4    ads    13         3        NA
5    jwa    60         2        15
6    ads    18         3         5
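
On the bonus question: a merge-based version seems possible too, although it is more roundabout than lag(). The following is only a rough base R sketch, not something tested beyond the example data. It reuses the mydata and row.order names from above; others, pairs, closest, gap and result are names made up for illustration. The idea is to self-join on ID, keep the pairs where the other row has a strictly smaller DAYS, and take the smallest gap per row.

```r
# Example data copied from the question
mydata <- data.frame(
  ID = c("ads", "jwa", "mno", "ads", "jwa", "ads"),
  DAYS = c(20, 45, 4, 13, 60, 18),
  FREQUENCY = c(3, 2, 1, 3, 2, 3),
  stringsAsFactors = FALSE
)

# merge() reorders rows, so remember the original order first
mydata$row.order <- seq_len(nrow(mydata))

# Self-join on ID: each row is paired with every row that shares its ID
others <- data.frame(ID = mydata$ID, DAYS.other = mydata$DAYS,
                     stringsAsFactors = FALSE)
pairs <- merge(mydata, others, by = "ID")

# Keep only pairs where the other row has a strictly smaller DAYS value
pairs <- pairs[pairs$DAYS.other < pairs$DAYS, ]
pairs$gap <- pairs$DAYS - pairs$DAYS.other

# The smallest gap per original row is the distance to the closest smaller DAYS
closest <- aggregate(gap ~ row.order, data = pairs, FUN = min)
names(closest)[2] <- "days.diff"

# Left join back: rows with no smaller DAYS for their ID get NA
result <- merge(mydata, closest, by = "row.order", all.x = TRUE)
result <- result[order(result$row.order),
                 c("ID", "DAYS", "FREQUENCY", "days.diff")]
result
```

Note that the self-join grows quadratically with the number of rows per ID, so for large groups the lag() approach is probably the better choice. Also, because the comparison is strict, tied DAYS values within an ID would be handled differently from the lag() version.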

Answer 2

You can do this with dplyr and a quick loop:

library(dplyr)

# Rowwise data.frame creation because I'm too lazy not to copy-paste the example data
df <- tibble::tribble( # tribble() replaces the deprecated frame_data()
  ~ID,    ~DAYS,  ~FREQUENCY,
  "ads",   20,      3,
  "jwa",   45,      2,
  "mno",   4,       1,
  "ads",   13,      3,
  "jwa",   60,      2,
  "ads",   18,      3
)

# Subtract each number in a numeric vector with the one following it
rolling_subtraction <- function(x) {
  out <- vector('numeric', length(x))
  for (i in seq_along(out)) {
    out[[i]] <- x[i] - x[i + 1] # x[i + 1] is NA if the index is out of bounds
  }

  out
}

# Arrange data.frame in order of ID / Days and apply rolling subtraction
df %>% 
  arrange(ID, desc(DAYS)) %>% 
  group_by(ID) %>% 
  mutate(days_diff = rolling_subtraction(DAYS))
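
The loop can probably be avoided, since subtraction in R is already vectorised. Below is a minimal base R sketch of that idea; rolling_subtraction_vec is a hypothetical name, and the data frame is rebuilt from the question's table so the snippet runs on its own. It shifts the vector one position to the left, pads with NA, and subtracts element-wise.

```r
# Hypothetical vectorised counterpart of rolling_subtraction()
rolling_subtraction_vec <- function(x) {
  # c(x[-1], NA) is x shifted left by one; the index guards the empty-vector case
  x - c(x[-1], NA)[seq_along(x)]
}

# Example data copied from the question
df <- data.frame(
  ID = c("ads", "jwa", "mno", "ads", "jwa", "ads"),
  DAYS = c(20, 45, 4, 13, 60, 18),
  FREQUENCY = c(3, 2, 1, 3, 2, 3),
  stringsAsFactors = FALSE
)

# Sort by ID and descending DAYS, then apply the subtraction within each ID
df <- df[order(df$ID, -df$DAYS), ]
df$days_diff <- ave(df$DAYS, df$ID, FUN = rolling_subtraction_vec)
df
```

Inside the dplyr pipeline, mutate(days_diff = DAYS - lead(DAYS)) should give the same result without a helper function.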

Create a new column by subtracting values based on a key in R?
