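I am using R and dplyr with a dataset that has one column of date/time information, one column of phone numbers, and another column with two possible values, EGGS and CHEESE.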
Date Phone.Number Eggs.or.Cheese
1 14/09/15 1111111111 EGGS
2 14/09/15 2222222222 EGGS
3 14/09/15 3333333333 EGGS
4 15/09/15 4444444444 EGGS
5 15/09/15 5555555555 EGGS
6 16/09/15 1111111111 CHEESE
7 16/09/15 6666666666 EGGS
8 16/09/15 7777777777 EGGS
(Here is the `dput` output of the data:)
structure(list(Date = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L,
3L), .Label = c("14/09/15", "15/09/15", "16/09/15"), class = "factor"),
Phone.Number = c(1111111111, 2222222222, 3333333333, 4444444444,
5555555555, 1111111111, 6666666666, 7777777777), Eggs.or.Cheese = structure(c(2L,
2L, 2L, 2L, 2L, 1L, 2L, 2L), .Label = c("CHEESE", "EGGS"), class = "factor")), .Names = c("Date",
"Phone.Number", "Eggs.or.Cheese"), class = "data.frame", row.names = c(NA,
-8L))
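I am trying to create a subset containing all phone numbers that indicated EGGS in the past and later called back for CHEESE. The subset should include every observation for those phone numbers, and would look like this: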
Date Phone.Number Eggs.or.Cheese
1 14/09/15 1111111111 EGGS
2 16/09/15 1111111111 CHEESE
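I have been playing around with filter, but I am not sure how to use the date and time information in the command.
Also, I am new to R, to coding, and to Stack Overflow, so any feedback on how I asked this question would be much appreciated.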
Answer 0 (score: 3)
Here is an attempt using data.table (the code assumes your data frame is called DT).
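First, we convert Date to a proper date class so that we can sort by it. Then, for each phone number, we take the unique values of Eggs.or.Cheese in date order and check whether they match "EGGS, CHEESE". If they do, we print the whole group.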
library(data.table)
setDT(DT)[, Date := as.IDate(Date, "%d/%m/%y")]
DT[order(Date), if(toString(unique(Eggs.or.Cheese)) == "EGGS, CHEESE") .SD, by = Phone.Number]
# Phone.Number Date Eggs.or.Cheese
# 1: 1111111111 2015-09-14 EGGS
# 2: 1111111111 2015-09-16 CHEESE
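To see why the comparison against "EGGS, CHEESE" works, it may help to look at the string built for each phone number. The following is only a base R sketch on a small made-up subset of the data (the names `calls` and `keys` are hypothetical and not part of the answer above); it assumes the rows are already in date order.

```r
# Hypothetical four-row subset of the question's data, already sorted by date
calls <- data.frame(
  Date = as.Date(c("14/09/15", "14/09/15", "16/09/15", "16/09/15"), "%d/%m/%y"),
  Phone.Number = c(1111111111, 2222222222, 1111111111, 6666666666),
  Eggs.or.Cheese = c("EGGS", "EGGS", "CHEESE", "EGGS"),
  stringsAsFactors = FALSE
)

# One string per phone number: the distinct values, in order of first appearance
keys <- vapply(
  split(calls$Eggs.or.Cheese, calls$Phone.Number),
  function(x) toString(unique(x)),
  character(1)
)

# Phone numbers whose string is exactly "EGGS, CHEESE"
names(keys)[keys == "EGGS, CHEESE"]
```

Only a number whose first distinct value is EGGS and whose second is CHEESE produces that exact string, which is why sorting by Date first matters.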
The dplyr equivalent:
library(dplyr)
DT %>%
  mutate(Date = as.Date(Date, "%d/%m/%y")) %>%
  arrange(Date) %>% ## This is optional if your data is already sorted
  group_by(Phone.Number) %>%
  filter(toString(unique(Eggs.or.Cheese)) == "EGGS, CHEESE")
# Source: local data frame [2 x 3]
# Groups: Phone.Number [1]
#
# Date Phone.Number Eggs.or.Cheese
# (date) (dbl) (fctr)
# 1 2015-09-14 1111111111 EGGS
# 2 2015-09-16 1111111111 CHEESE
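As a possible variation, the same question can be answered by comparing dates directly instead of matching a string. This is a sketch rather than the method used above: the helper `said_eggs_before_cheese` and the data frame name `calls` are made up for illustration, and the rule it applies (some EGGS call strictly earlier than some CHEESE call) is an assumption about what "eggs in the past, then cheese" should mean. Unlike the string match, it would also keep a number whose very first call was CHEESE, as long as a later EGGS call is followed by another CHEESE call.

```r
library(dplyr)

# Data from the question, rebuilt by hand (the name `calls` is hypothetical)
calls <- data.frame(
  Date = c("14/09/15", "14/09/15", "14/09/15", "15/09/15",
           "15/09/15", "16/09/15", "16/09/15", "16/09/15"),
  Phone.Number = c(1111111111, 2222222222, 3333333333, 4444444444,
                   5555555555, 1111111111, 6666666666, 7777777777),
  Eggs.or.Cheese = c("EGGS", "EGGS", "EGGS", "EGGS",
                     "EGGS", "CHEESE", "EGGS", "EGGS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when the earliest EGGS date is strictly
# before the latest CHEESE date within one phone number's calls
said_eggs_before_cheese <- function(date, choice) {
  eggs <- date[choice == "EGGS"]
  cheese <- date[choice == "CHEESE"]
  length(eggs) > 0 && length(cheese) > 0 && min(eggs) < max(cheese)
}

eggs_then_cheese <- calls %>%
  mutate(Date = as.Date(Date, "%d/%m/%y")) %>%
  group_by(Phone.Number) %>%
  filter(said_eggs_before_cheese(Date, Eggs.or.Cheese)) %>%
  ungroup() %>%
  arrange(Phone.Number, Date)

eggs_then_cheese
```

Because the comparison uses the Date column itself, this version does not depend on the rows being sorted beforehand.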