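I want to identify the unique persons who received an apple within a specified time frame. I do this by creating a binary indicator "apples" as follows.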
names<-c("tom", "mary", "tom", "john", "mary", "tom", "john", "mary", "john", "mary", "tom", "mary", "john", "john")
dates<-as.Date(c("2010-02-01", "2010-05-01", "2010-03-01", "2010-07-01", "2010-07-01", "2010-06-01", "2010-09-01", "2010-07-01", "2010-11-01", "2010-09-01", "2010-08-01", "2010-11-01", "2010-12-01", "2011-01-01"))
fruit<-as.character(c("apple", "orange", "banana", "kiwi", "apple", "apple", "apple", "orange", "banana", "apple", "kiwi", "apple", "orange", "apple"))
age<-as.numeric(c(60,55,60,57,55,60,57,55,57,55,60,55, 57,57))
sex<-as.character(c("m","f","m","m","f","m","m", "f","m","f","m","f","m", "m"))
df<-data.frame(names,dates, age, sex, fruit)
df
df$apples<-ifelse(df$fruit=='apple' & df$dates>="2010-04-01" & df$dates<"2010-10-01",1,0)
df
   names      dates age sex  fruit apples
 1   tom 2010-02-01  60   m  apple      0
 2  mary 2010-05-01  55   f orange      0
 3   tom 2010-03-01  60   m banana      0
 4  john 2010-07-01  57   m   kiwi      0
 5  mary 2010-07-01  55   f  apple      1
 6   tom 2010-06-01  60   m  apple      1
 7  john 2010-09-01  57   m  apple      1
 8  mary 2010-07-01  55   f orange      0
 9  john 2010-11-01  57   m banana      0
10  mary 2010-09-01  55   f  apple      1
11   tom 2010-08-01  60   m   kiwi      0
12  mary 2010-11-01  55   f  apple      0
13  john 2010-12-01  57   m orange      0
14  john 2011-01-01  57   m  apple      0
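My problem is that mary appears twice. I only want the first date within the specified time frame on which each person received an apple (everyone will have such a first date in the real data). I would like a second column named "apples1" that flags, for each person, the initial date within the defined time frame on which they got an apple.
Desired output: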
   names      dates age sex  fruit apples apples1
 1   tom 2010-02-01  60   m  apple      0       0
 2  mary 2010-05-01  55   f orange      0       0
 3   tom 2010-03-01  60   m banana      0       0
 4  john 2010-07-01  57   m   kiwi      0       0
 5  mary 2010-07-01  55   f  apple      1       1
 6   tom 2010-06-01  60   m  apple      1       1
 7  john 2010-09-01  57   m  apple      1       1
 8  mary 2010-07-01  55   f orange      0       0
 9  john 2010-11-01  57   m banana      0       0
10  mary 2010-09-01  55   f  apple      1       0
11   tom 2010-08-01  60   m   kiwi      0       0
12  mary 2010-11-01  55   f  apple      0       0
13  john 2010-12-01  57   m orange      0       0
14  john 2011-01-01  57   m  apple      0       0
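I have been searching, and the closest I found is this - Select only the first rows for each unique value of a column in R. But it does not solve the uniqueness problem. I also came across !duplicated, but I don't want to remove mary's data, because I need her dates to keep following her up. I am probably missing something really fundamental here, so apologies in advance.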
Answer 0 (score: 1)
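library(plyr)
# Sort by date so that "first" means earliest within each person
df <- df[order(df$dates), ]
# Flag the first in-window apple per person, using the apples indicator from
# the question (!duplicated(fruit) would also pick apples outside the window)
ddply(df, "names", transform,
  apples1 = as.numeric(apples == 1 & !duplicated(apples))
)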
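Note: I am assuming that ddply preserves the row order of the data frame when it splits by the splitting variable. In my experience it does. You could modify this solution slightly by changing transform to an inline function that performs the ordering itself, but I don't think that is necessary.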
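The inline-function variant mentioned in the note could look roughly like the sketch below. It is only an illustration under stated assumptions: the toy data frame is a cut-down stand-in for the question's df, the anonymous per-group function is hypothetical, and it relies on the apples indicator from the question so that only dates inside the window count.

```r
library(plyr)

# Toy data (an assumption for illustration, not the question's full df)
df <- data.frame(
  names = c("tom", "mary", "tom", "mary", "mary"),
  dates = as.Date(c("2010-02-01", "2010-09-01", "2010-06-01",
                    "2010-07-01", "2010-11-01")),
  fruit = c("apple", "apple", "apple", "apple", "apple"),
  stringsAsFactors = FALSE
)

# Binary indicator as in the question: an apple inside the time window
df$apples <- ifelse(df$fruit == "apple" &
                      df$dates >= as.Date("2010-04-01") &
                      df$dates < as.Date("2010-10-01"), 1, 0)

# Sort inside each group instead of relying on the global row order
res <- ddply(df, "names", function(d) {
  d <- d[order(d$dates), ]
  # First row per person where apples == 1
  d$apples1 <- as.numeric(d$apples == 1 & !duplicated(d$apples))
  d
})
res
```

Because the sort happens inside the function, the result does not depend on whether ddply keeps the incoming row order.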
Answer 1 (score: 1)
Here is a data.table solution. I create the 2 columns at the same time.
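library(data.table)
DT <- data.table(df)
# Key by name, then date, so rows are sorted by date within each person
setkeyv(DT, c("names", "dates"))
# Rows that fail the filter are not touched, so newly created columns stay NA there
DT[fruit == "apple" &
     dates >= as.Date("2010-04-01") &
     dates < as.Date("2010-10-01"),
   c("apples", "apples1") := list(1, ifelse(!duplicated(names), 1, 0))]
DT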
    names      dates age sex  fruit apples apples1
 1:  john 2010-07-01  57   m   kiwi     NA      NA
 2:  john 2010-09-01  57   m  apple      1       1
 3:  john 2010-11-01  57   m banana     NA      NA
 4:  john 2010-12-01  57   m orange     NA      NA
 5:  john 2011-01-01  57   m  apple     NA      NA
 6:  mary 2010-05-01  55   f orange     NA      NA
 7:  mary 2010-07-01  55   f  apple      1       1
 8:  mary 2010-07-01  55   f orange     NA      NA
 9:  mary 2010-09-01  55   f  apple      1       0
10:  mary 2010-11-01  55   f  apple     NA      NA
11:   tom 2010-02-01  60   m  apple     NA      NA
12:   tom 2010-03-01  60   m banana     NA      NA
13:   tom 2010-06-01  60   m  apple      1       1
14:   tom 2010-08-01  60   m   kiwi     NA      NA
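The output above has NA wherever the filter did not match, while the desired output in the question uses 0. One possible way to get zeros, sketched here on a small made-up table rather than the question's data, is to initialise both columns to 0 before the filtered update. The toy DT and its values are assumptions for illustration only.

```r
library(data.table)

# Toy table (hypothetical), already in names/dates order
DT <- data.table(
  names = c("mary", "mary", "mary", "tom", "tom"),
  dates = as.Date(c("2010-07-01", "2010-09-01", "2010-11-01",
                    "2010-02-01", "2010-06-01")),
  fruit = "apple"
)
setkeyv(DT, c("names", "dates"))

# Start both indicators at 0 so unmatched rows end up 0 rather than NA
DT[, c("apples", "apples1") := list(0, 0)]

# Same filtered update as in the answer: only in-window apple rows change
DT[fruit == "apple" &
     dates >= as.Date("2010-04-01") &
     dates < as.Date("2010-10-01"),
   c("apples", "apples1") := list(1, as.numeric(!duplicated(names)))]
DT
```

Initialising first keeps the update itself unchanged; replacing the NA values afterwards would work as well.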