Get the latest record, but if the latest record is blank, get the most recent non-blank one

Time: 2018-09-10 16:19:28

Tags: r dplyr

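I am having trouble converting rows to columns and then keeping only the latest record (by timestamp) that actually has an entry. Here is my dataset: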


df <- data.frame(id = c("123||wa", "123||wa", "123||wa", "223||sa", "223||sa", "223||sa", "123||wa"),
                 questions = c("dish", "car", "house", "dish", "house", "car", "dish"),
                 answers = c(" ", "bmw", "yes", "pizza", "yes", "audi", "ravioli"),
                 timestamp = c("24JUN2018:12:24:16", "07JAN2015:15:22:54", "24MAR2018:12:24:16", "24MAR2018:12:24:16",
                               "04AUG2014:12:40:30", "03JUL2014:15:38:11", "03JUL2014:15:38:11"))

The desired output is one row per id, with one column per question (car, dish, house) holding the latest non-blank answer.

So far I have the code below, which does most of the job.

library(lubridate)
library(dplyr)
library(tidyr)  # provides spread()

df %>%
  mutate(timestamp = ymd_hms(strptime(timestamp, "%d%b%Y:%H:%M:%S"))) %>%
  group_by(id, questions) %>%
  arrange(timestamp) %>%
  summarise(last = last(answers)) %>%
  spread(questions, last)

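The only thing missing is that it always picks the latest record, even when that record is blank. How can I change it so that it only picks the latest record that has an entry and is not blank? Thanks.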

With the code above, dish for 123||wa comes out blank, because the blank answer from 24JUN2018 is its most recent record.

2 Answers:

Answer 0: (score: 4)

Another approach, using slice. You can also combine the mutate and arrange steps:

library(lubridate)
library(dplyr)
library(tidyr)

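df %>%
  group_by(id, questions) %>%
  # sort newest first; the timestamp is parsed inside arrange(), so no mutate() is needed
  arrange(desc(ymd_hms(strptime(timestamp, "%d%b%Y:%H:%M:%S")))) %>%
  # keep the first (newest) row per group whose answer is neither NA nor blank
  slice(which.max(!is.na(answers) & trimws(as.character(answers)) != "")) %>%
  select(-timestamp) %>%
  spread(questions, answers)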

Output:

# A tibble: 2 x 4
# Groups:   id [2]
  id      car   dish    house
  <fct>   <fct> <fct>   <fct>
1 123||wa bmw   ravioli yes  
2 223||sa audi  pizza   yes 
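
The slice(which.max(...)) step relies on which.max() returning the position of the first TRUE in a logical vector. The following is a minimal base R sketch of that idea only; the answers vector is made up for illustration and is assumed to be sorted newest first. It also shows that a blank string such as " " is not NA, so a test for "has an entry" likely needs to check for blanks explicitly.

```r
# Hypothetical answers for one (id, question) group, assumed sorted newest first
answers <- c(" ", NA, "ravioli", "pizza")

# A blank string is not NA, so is.na() alone does not flag it
print(is.na(answers))

# Flag rows that hold a real entry: not NA and not blank after trimming
has_entry <- !is.na(answers) & trimws(answers) != ""

# which.max() on a logical vector returns the position of the first TRUE
newest_idx <- which.max(has_entry)
newest_answer <- answers[newest_idx]
print(newest_answer)
```

Note that which.max() returns 1 when every element is FALSE, so a group with no usable answer would still keep its first row.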

Answer 1: (score: 3)

We can first filter out the rows whose answers value is " ":

df %>% 
  filter(answers != " ") %>%
  mutate(timestamp = ymd_hms(strptime(timestamp, "%d%b%Y:%H:%M:%S"))) %>%
  group_by(id, questions) %>%
  arrange(timestamp) %>%
  summarise(last = last(answers)) %>%
  spread(questions, last)
# A tibble: 2 x 4
# Groups:   id [2]
#  id      car   dish    house
#  <fct>   <fct> <fct>   <fct>
#1 123||wa bmw   ravioli yes  
#2 223||sa audi  pizza   yes
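
For readers on newer package versions, here is a hedged sketch of the same filter-first idea. It is not from either answer: it assumes dplyr 1.0.0 or later (for slice_max()) and tidyr 1.0.0 or later (where pivot_wider() supersedes spread()), parses the timestamp with base R's as.POSIXct() instead of lubridate, and assumes an English locale so that %b matches month abbreviations such as JUN. The name result is introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, kept as character columns
df <- data.frame(
  id = c("123||wa", "123||wa", "123||wa", "223||sa", "223||sa", "223||sa", "123||wa"),
  questions = c("dish", "car", "house", "dish", "house", "car", "dish"),
  answers = c(" ", "bmw", "yes", "pizza", "yes", "audi", "ravioli"),
  timestamp = c("24JUN2018:12:24:16", "07JAN2015:15:22:54", "24MAR2018:12:24:16",
                "24MAR2018:12:24:16", "04AUG2014:12:40:30", "03JUL2014:15:38:11",
                "03JUL2014:15:38:11"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # drop blank answers before looking for the latest record
  filter(trimws(answers) != "") %>%
  # parse the timestamp (the UTC time zone is an assumption)
  mutate(timestamp = as.POSIXct(timestamp, format = "%d%b%Y:%H:%M:%S", tz = "UTC")) %>%
  group_by(id, questions) %>%
  # keep the single newest remaining row per id and question
  slice_max(timestamp, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-timestamp) %>%
  # one column per question
  pivot_wider(names_from = questions, values_from = answers)

print(result)
```

Filtering before grouping means a question whose answers are all blank disappears from its id entirely and shows up as NA after widening.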