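I want to find people who had an apple or an orange on at least two distinct (unique) dates. I want to create a new column containing a binary indicator of whether the person had an orange or an apple on at least two dates (1 = yes, 0 = no).
The closest I have come is this plyr code.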
df1<- ddply(df, .(names, fruit), mutate, acne = ifelse(fruit=="apple" | fruit=="orange" & length(unique(dates))>=2,1,0))
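However, this is not the solution. Anne got an apple twice, but on the same date, so she should not get a 1 here. Likewise, Ted gets a 1 even though he only had an apple once.
This is closer, but still not right. It gives a 1 to any fruit that has occurred twice. I need the fruit to occur twice per person, on two separate dates for that person.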
df2<- ddply(df, .(fruit), mutate, acne = ifelse(length(unique(dates))>=2, 1, 0))
##this one gives a 1 to any fruit that has occurred twice. Need the fruit to occur twice per person on two individual dates per person.
If anyone could point me in the right direction, I would be very grateful.
Thanks in advance.
Sample DF
names<-as.character(c("john", "john", "philip", "ted", "john", "john", "anne", "john", "mary","anne", "mary","mary","philip","mary", "su","mary", "jim", "sylvia", "mary", "ted","ted","mary", "sylvia", "jim", "ted", "john", "ted"))
dates<-as.Date(c("2010-07-01", "2010-07-13", "2010-05-12","2010-02-14","2010-06-30","2010-08-15", "2010-03-21","2010-04-04","2010-09-01", "2010-03-21", "2010-12-01", "2011-01-01", "2010-08-12", "2010-11-11", "2010-05-12", "2010-12-03", "2010-07-12", "2010-12-21", "2010-02-18", "2010-10-29", "2010-08-13", "2010-11-11", "2010-05-12", "2010-04-01", "2010-05-06", "2010-09-28", "2010-11-28" ))
fruit<-as.character(c("kiwi","apple","mango", "banana","strawberry","orange","apple","raspberry", "orange","apple","orange", "apple", "strawberry", "apple", "pineapple", "peach", "orange", "nectarine", "grape","banana", "melon", "apricot", "plum", "lychee", "mango", "watermelon", "apple" ))
df<-data.frame(names,dates,fruit)
df
Desired output
names dates fruit v1
7 anne 2010-03-21 apple 0
10 anne 2010-03-21 apple 0
17 jim 2010-07-12 orange 0
24 jim 2010-04-01 lychee 0
1 john 2010-07-01 kiwi 1
2 john 2010-07-13 apple 1
5 john 2010-06-30 strawberry 1
6 john 2010-08-15 orange 1
8 john 2010-04-04 raspberry 1
26 john 2010-09-28 watermelon 1
9 mary 2010-09-01 orange 1
11 mary 2010-12-01 orange 1
12 mary 2011-01-01 apple 1
14 mary 2010-11-11 apple 1
16 mary 2010-12-03 peach 1
19 mary 2010-02-18 grape 1
22 mary 2010-11-11 apricot 1
3 philip 2010-05-12 mango 0
13 philip 2010-08-12 strawberry 0
15 su 2010-05-12 pineapple 0
18 sylvia 2010-12-21 nectarine 0
23 sylvia 2010-05-12 plum 0
4 ted 2010-02-14 banana 0
20 ted 2010-10-29 banana 0
21 ted 2010-08-13 melon 0
25 ted 2010-05-06 mango 0
27 ted 2010-11-28 apple 0
Answer 0 (score: 2)
This should do the trick:
# For each person, count the distinct dates among their apple/orange rows
v1 = ave(1:nrow(df), df$names, FUN = function(x)
  length(unique(df$dates[x[df$fruit[x] %in% c("orange", "apple")]])) > 1)
df$v1 = v1
df = df[order(df$names),]
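In case the one-liner is hard to read, here is a rough step-by-step sketch of the same idea using `tapply`. The data frame is a trimmed-down subset of the sample data from the question, and the helper names `is_ao` and `n_dates` are my own, not part of the answer above.

```r
# Trimmed-down subset of the question's sample data (illustration only)
df <- data.frame(
  names = c("anne", "anne", "john", "john", "john", "ted", "ted", "su"),
  dates = as.Date(c("2010-03-21", "2010-03-21", "2010-07-01", "2010-07-13",
                    "2010-08-15", "2010-02-14", "2010-11-28", "2010-05-12")),
  fruit = c("apple", "apple", "kiwi", "apple", "orange", "banana", "apple",
            "pineapple"),
  stringsAsFactors = FALSE
)

# TRUE for rows whose fruit is one of the two of interest
is_ao <- df$fruit %in% c("apple", "orange")

# For each person, count the distinct dates among their apple/orange rows
n_dates <- tapply(seq_len(nrow(df)), df$names, function(i) {
  length(unique(df$dates[i[is_ao[i]]]))
})

# Map the per-person count back onto every row of that person
df$v1 <- as.integer(n_dates[df$names] >= 2)
df
```

Anne's two apples fall on the same date, so her count is 1 and she stays at 0; Ted has a single apple row, so he stays at 0 as well.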
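Answer 1 (score: 2)
If I understand correctly, for the purposes of your question apple == orange. So the plan is to (a) create a smaller data frame in which the fruit is only orange or apple, since you don't care about the other fruits, (b) keep only the unique date/name rows, (c) aggregate by name, and (d) merge back into the original data.frame to get the result: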
ndf <- subset(df, fruit %in% c("apple", "orange"))
ndf <- ndf[!duplicated(ndf[, c("names", "dates")]), ]
Here you could use table, but I prefer aggregate:
v <- aggregate(rep(1, nrow(ndf)), by = ndf[, "names", drop = FALSE], sum)
v$x <- ifelse(v$x > 1, 1, 0)
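# all.x = TRUE keeps people who never had an apple or orange (they are absent from v)
rv <- merge(df, v, all.x = TRUE)
rv$x[is.na(rv$x)] <- 0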
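It is longer and more code-heavy than the other answer, but it is explicit and will definitely do the job. You could use aggregate without the first two steps, but if you have a huge data.frame with many names, aggregating per name would get quite expensive.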
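For comparison, this is roughly what skipping the first two steps might look like: aggregate over row indices and do the filtering and de-duplication inside the function. It is only a sketch on a trimmed-down subset of the question's sample data, and the column name `v1` is my choice.

```r
# Trimmed-down subset of the question's sample data (illustration only)
df <- data.frame(
  names = c("anne", "anne", "john", "john", "john", "ted", "ted", "su"),
  dates = as.Date(c("2010-03-21", "2010-03-21", "2010-07-01", "2010-07-13",
                    "2010-08-15", "2010-02-14", "2010-11-28", "2010-05-12")),
  fruit = c("apple", "apple", "kiwi", "apple", "orange", "banana", "apple",
            "pineapple"),
  stringsAsFactors = FALSE
)

# Aggregate row indices per name; the function sees one person's rows at a time
v <- aggregate(
  seq_len(nrow(df)),
  by = df[, "names", drop = FALSE],
  FUN = function(i) {
    ao <- df$fruit[i] %in% c("apple", "orange")
    # 1 if this person has apple/orange on at least two distinct dates
    as.integer(length(unique(df$dates[i][ao])) >= 2)
  }
)
names(v) <- c("names", "v1")

# Every name appears in v here, so a plain merge keeps all rows of df
rv <- merge(df, v)
rv
```

The function runs once per name over all of that person's rows, which is where the cost comes from on a large data.frame.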
Answer 2 (score: 1)
I did something similar to @amit's solution using by. The row names get mangled during do.call, but you can work around that.
result <- by(df, INDICES = df$names, FUN = function(x) {
if (length(unique(x$dates)) == 1) {
x$index <- 0
return(x)
}
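# Count distinct dates with apple or orange, not rows, so same-day repeats count once
ao.dates <- length(unique(x$dates[x$fruit %in% c("apple", "orange")]))
if (ao.dates < 2) x$index <- 0 else x$index <- 1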
x
})
do.call("rbind", result)
names dates fruit index
anne.7 anne 2010-03-21 apple 0
anne.10 anne 2010-03-21 apple 0
jim.17 jim 2010-07-12 orange 0
jim.24 jim 2010-04-01 lychee 0
john.1 john 2010-07-01 kiwi 1
john.2 john 2010-07-13 apple 1
john.5 john 2010-06-30 strawberry 1
john.6 john 2010-08-15 orange 1
john.8 john 2010-04-04 raspberry 1
john.26 john 2010-09-28 watermelon 1
mary.9 mary 2010-09-01 orange 1
mary.11 mary 2010-12-01 orange 1
mary.12 mary 2011-01-01 apple 1
mary.14 mary 2010-11-11 apple 1
mary.16 mary 2010-12-03 peach 1
mary.19 mary 2010-02-18 grape 1
mary.22 mary 2010-11-11 apricot 1
philip.3 philip 2010-05-12 mango 0
philip.13 philip 2010-08-12 strawberry 0
su su 2010-05-12 pineapple 0
sylvia.18 sylvia 2010-12-21 nectarine 0
sylvia.23 sylvia 2010-05-12 plum 0
ted.4 ted 2010-02-14 banana 0
ted.20 ted 2010-10-29 banana 0
ted.21 ted 2010-08-13 melon 0
ted.25 ted 2010-05-06 mango 0
ted.27 ted 2010-11-28 apple 0
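One possible way to tidy up the row names afterwards is to reset them once the pieces are bound together. This is a minimal sketch on a trimmed-down subset of the question's sample data; the names `pieces` and `out` are mine, and the per-person function is a compact variant that counts distinct apple/orange dates.

```r
# Trimmed-down subset of the question's sample data (illustration only)
df <- data.frame(
  names = c("anne", "anne", "john", "john", "john", "ted", "ted", "su"),
  dates = as.Date(c("2010-03-21", "2010-03-21", "2010-07-01", "2010-07-13",
                    "2010-08-15", "2010-02-14", "2010-11-28", "2010-05-12")),
  fruit = c("apple", "apple", "kiwi", "apple", "orange", "banana", "apple",
            "pineapple"),
  stringsAsFactors = FALSE
)

# Split by person and flag each piece
pieces <- by(df, INDICES = df$names, FUN = function(x) {
  ao_dates <- unique(x$dates[x$fruit %in% c("apple", "orange")])
  x$index <- as.integer(length(ao_dates) >= 2)
  x
})

out <- do.call("rbind", pieces)

# Drop the "anne.1"-style row names and renumber from 1
rownames(out) <- NULL
out
```

Resetting to `NULL` discards the original row numbers; if you need those, store them in a column before splitting.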