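I'm having trouble converting rows to columns and then keeping only the latest record (by timestamp) that actually has an entry. Here is my dataset: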
df <- data.frame(id = c("123||wa", "123||wa", "123||wa", "223||sa", "223||sa", "223||sa", "123||wa"),
questions = c("dish", "car", "house", "dish", "house", "car", "dish"),
answers = c(" ", "bmw", "yes", "pizza", "yes", "audi","ravioli" ),
timestamp = c("24JUN2018:12:24:16", "07JAN2015:15:22:54", "24MAR2018:12:24:16", "24MAR2018:12:24:16",
"04AUG2014:12:40:30", "03JUL2014:15:38:11", "03JUL2014:15:38:11"))
The desired output is:
So far I have the code below, which does most of the job.
library(lubridate)
library(dplyr)
library(tidyr)  # spread() comes from tidyr
df %>%
  mutate(timestamp = ymd_hms(strptime(timestamp, "%d%b%Y:%H:%M:%S"))) %>%
  group_by(id, questions) %>%
  arrange(timestamp) %>%
  summarise(last = last(answers)) %>%
  spread(questions, last)
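The only problem is that it always picks the latest record, even when that record is blank. How can I change it so that it keeps only the latest record with a non-blank entry? Thanks.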
Answer 0 (score: 4)
Another approach, using slice. You can also combine the mutate and arrange steps:
library(lubridate)
library(dplyr)
library(tidyr)
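df %>%
  group_by(id, questions) %>%
  # sort newest first so the first non-blank row in each group is the latest one
  arrange(desc(ymd_hms(strptime(timestamp, "%d%b%Y:%H:%M:%S")))) %>%
  # blanks in answers are stored as " ", not NA, so compare against the blank string
  slice(which.max(answers != " ")) %>%
  select(-timestamp) %>%
  spread(questions, answers)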
Output:
# A tibble: 2 x 4
# Groups:   id [2]
  id      car   dish    house
  <fct>   <fct> <fct>   <fct>
1 123||wa bmw   ravioli yes
2 223||sa audi  pizza   yes
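A small sketch of what slice(which.max(...)) relies on: which.max() on a logical vector returns the position of the first TRUE, so if the rows of a group are sorted newest first, it lands on the newest answer that has content. The vector below is made up for illustration and stands in for a single (id, questions) group.

```r
# Hypothetical answers for one (id, questions) group, assumed sorted newest first
answers <- c(" ", "ravioli", "pasta")

# TRUE where the answer has content; trimws() also catches "" and runs of spaces
has_entry <- trimws(answers) != ""

# which.max() returns the position of the first TRUE (TRUE counts as 1, FALSE as 0)
first_filled <- which.max(has_entry)
newest_answer <- answers[first_filled]

# Caveat: when every answer is blank, which.max() still returns 1,
# so the group keeps a blank row instead of being dropped
all_blank <- which.max(trimws(c(" ", "")) != "")
```

If groups that contain only blanks should disappear from the result, filtering the blank rows out before grouping is probably the simpler route.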
Answer 1 (score: 3)
We can filter out the rows where answers is " " at the start of the pipeline:
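df %>%
  # drop blank answers first so they can never be picked as the latest record
  filter(answers != " ") %>%
  mutate(timestamp = ymd_hms(strptime(timestamp, "%d%b%Y:%H:%M:%S"))) %>%
  group_by(id, questions) %>%
  arrange(timestamp) %>%
  # rows are in ascending time order, so last() is the most recent answer
  summarise(last = last(answers)) %>%
  spread(questions, last)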
# A tibble: 2 x 4
# Groups:   id [2]
#   id      car   dish    house
#   <fct>   <fct> <fct>   <fct>
# 1 123||wa bmw   ravioli yes
# 2 223||sa audi  pizza   yes
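For readers on newer package versions, the same filter-first idea can be sketched with slice_max() and pivot_wider(). This is not from either answer; it is an alternative that assumes dplyr 1.0.0 or later and tidyr 1.0.0 or later, and an English locale so that %b matches month abbreviations such as JUN. The name result is only for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, kept as character columns
df <- data.frame(
  id = c("123||wa", "123||wa", "123||wa", "223||sa", "223||sa", "223||sa", "123||wa"),
  questions = c("dish", "car", "house", "dish", "house", "car", "dish"),
  answers = c(" ", "bmw", "yes", "pizza", "yes", "audi", "ravioli"),
  timestamp = c("24JUN2018:12:24:16", "07JAN2015:15:22:54", "24MAR2018:12:24:16",
                "24MAR2018:12:24:16", "04AUG2014:12:40:30", "03JUL2014:15:38:11",
                "03JUL2014:15:38:11"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # parse the timestamp; %b is locale dependent (assumes English month names)
  mutate(timestamp = as.POSIXct(timestamp, format = "%d%b%Y:%H:%M:%S", tz = "UTC")) %>%
  # drop blank answers before choosing the latest record
  filter(trimws(answers) != "") %>%
  group_by(id, questions) %>%
  # keep exactly one row per group: the one with the latest timestamp
  slice_max(timestamp, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-timestamp) %>%
  # one column per question, one row per id
  pivot_wider(names_from = questions, values_from = answers)
```

With this approach, an id whose answers to a question are all blank gets NA in that column rather than a blank string.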