Suppose I have this input:
ID date_1 date_2 str
1 1 2010-07-04 2008-01-20 A
2 2 2015-07-01 2011-08-31 C
3 3 2015-03-06 2013-01-18 D
4 4 2013-01-10 2011-08-30 D
5 5 2014-06-04 2011-09-18 B
6 5 2014-06-04 2011-09-18 B
7 6 2012-11-22 2011-09-28 C
8 7 2014-06-17 2013-08-04 A
10 7 2014-06-17 2013-08-04 B
11 7 2014-06-17 2013-08-04 B
I want to progressively concatenate the values of the str column within each group defined by ID, as shown in the following output:
ID date_1 date_2 str
1 1 2010-07-04 2008-01-20 A
2 2 2015-07-01 2011-08-31 C
3 3 2015-03-06 2013-01-18 D
4 4 2013-01-10 2011-08-30 D
5 5 2014-06-04 2011-09-18 B
6 5 2014-06-04 2011-09-18 B,B
7 6 2012-11-22 2011-09-28 C
8 7 2014-06-17 2013-08-04 A
10 7 2014-06-17 2013-08-04 A,B
11 7 2014-06-17 2013-08-04 A,B,B
I tried using the ave() function with this code:
within(table, {
Emp_list <- ave(str, ID, FUN = function(x) paste(x, collapse = ","))
})
But it gives the following output, which is not what I want:
ID date_1 date_2 str
1 1 2010-07-04 2008-01-20 A
2 2 2015-07-01 2011-08-31 C
3 3 2015-03-06 2013-01-18 D
4 4 2013-01-10 2011-08-30 D
5 5 2014-06-04 2011-09-18 B,B
6 5 2014-06-04 2011-09-18 B,B
7 6 2012-11-22 2011-09-28 C
8 7 2014-06-17 2013-08-04 A,B,B
10 7 2014-06-17 2013-08-04 A,B,B
11 7 2014-06-17 2013-08-04 A,B,B
Naturally, I would like to avoid loops, since I am working with a large data set.
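For anyone who wants to reproduce this, here is a minimal sketch of the input as a data frame, together with the attempt above. The column types are an assumption (dates parsed with as.Date, str kept as character), the original row names (1 to 8, 10, 11) are not reproduced, and the column name all is only illustrative. It also shows why the attempt fails: paste(x, collapse = ",") returns a single string per group, and ave() recycles that one value across every row of the group.

```r
# Assumed reconstruction of the input shown in the question.
df <- data.frame(
  ID     = c(1, 2, 3, 4, 5, 5, 6, 7, 7, 7),
  date_1 = as.Date(c("2010-07-04", "2015-07-01", "2015-03-06", "2013-01-10",
                     "2014-06-04", "2014-06-04", "2012-11-22", "2014-06-17",
                     "2014-06-17", "2014-06-17")),
  date_2 = as.Date(c("2008-01-20", "2011-08-31", "2013-01-18", "2011-08-30",
                     "2011-09-18", "2011-09-18", "2011-09-28", "2013-08-04",
                     "2013-08-04", "2013-08-04")),
  str    = c("A", "C", "D", "D", "B", "B", "C", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# collapse = "," yields ONE string per group; ave() then repeats it
# for each row of that group, so every row shows the full group string.
df$all <- ave(df$str, df$ID, FUN = function(x) paste(x, collapse = ","))
df$all[df$ID == 7]   # "A,B,B" repeated three times
```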
Answer 0: (score: 9)
How about ave() with Reduce()? The Reduce() function lets us accumulate results as they are computed, so if we run it with paste(), we can accumulate the pasted strings.
f <- function(x) {
    Reduce(function(...) paste(..., sep = ", "), x, accumulate = TRUE)
}
df$str <- with(df, ave(as.character(str), ID, FUN = f))
This gives the updated data frame df:
ID date_1 date_2 str
1 1 2010-07-04 2008-01-20 A
2 2 2015-07-01 2011-08-31 C
3 3 2015-03-06 2013-01-18 D
4 4 2013-01-10 2011-08-30 D
5 5 2014-06-04 2011-09-18 B
6 5 2014-06-04 2011-09-18 B, B
7 6 2012-11-22 2011-09-28 C
8 7 2014-06-17 2013-08-04 A
10 7 2014-06-17 2013-08-04 A, B
11 7 2014-06-17 2013-08-04 A, B, B
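Note: function(...) paste(..., sep = ", ") could also be written with two explicit arguments, e.g. function(x, y) paste(x, y, sep = ", "). (Thanks to Pierre Lafortune.)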
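To see what accumulate = TRUE contributes, here is a small self-contained sketch on a single group and then on a toy data frame. The names x, acc, toy and cum are illustrative only and not part of the answer above.

```r
# Reduce() with accumulate = TRUE keeps every intermediate result,
# so each element is the running paste of everything up to that position.
x <- c("A", "B", "B")
acc <- Reduce(function(a, b) paste(a, b, sep = ", "), x, accumulate = TRUE)
acc   # "A", "A, B", "A, B, B"

# The same function applied per group through ave() on a toy data frame.
toy <- data.frame(ID = c(5, 5, 7, 7, 7),
                  str = c("B", "B", "A", "B", "B"),
                  stringsAsFactors = FALSE)
toy$cum <- ave(toy$str, toy$ID, FUN = function(v) {
  Reduce(function(a, b) paste(a, b, sep = ", "), v, accumulate = TRUE)
})
toy$cum   # "B", "B, B", "A", "A, B", "A, B, B"
```

Because the function returns one value per input row, ave() has nothing to recycle, which is the difference from the paste(x, collapse = ",") attempt in the question.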
Answer 1: (score: 8)
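Here is a possible solution that combines data.table with an inner tapply and seems to give what you need (you could use paste instead of toString if you prefer; toString just looked cleaner to me).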
library(data.table)
setDT(df)[, Str := tapply(str[sequence(1:.N)], rep(1:.N, 1:.N), toString), by = ID]
df
# ID date_1 date_2 str Str
# 1: 1 2010-07-04 2008-01-20 A A
# 2: 2 2015-07-01 2011-08-31 C C
# 3: 3 2015-03-06 2013-01-18 D D
# 4: 4 2013-01-10 2011-08-30 D D
# 5: 5 2014-06-04 2011-09-18 B B
# 6: 5 2014-06-04 2011-09-18 B B, B
# 7: 6 2012-11-22 2011-09-28 C C
# 8: 7 2014-06-17 2013-08-04 A A
# 9: 7 2014-06-17 2013-08-04 B A, B
# 10: 7 2014-06-17 2013-08-04 B A, B, B
You could improve it slightly by computing the index vector only once:
setDT(df)[, Str := {Len <- 1:.N ; tapply(str[sequence(Len)], rep(Len, Len), toString)}, by = ID]
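The sequence()/rep() indexing is easier to follow on a single group in plain base R. This is only a sketch; x, n, idx, grp and res are illustrative names. For a group of size n, sequence(1:n) repeats the prefix 1..k for each k, and rep(1:n, 1:n) labels each prefix with its length, so tapply() can collapse each prefix into one string.

```r
x <- c("A", "B", "B")        # the str values of one group
n <- length(x)

idx <- sequence(1:n)         # 1, 1 2, 1 2 3  ->  1 1 2 1 2 3
grp <- rep(1:n, 1:n)         # 1 2 2 3 3 3 (which prefix each index belongs to)

# x[idx] expands the group into all of its prefixes; tapply() then
# collapses each prefix with toString().
res <- tapply(x[idx], grp, toString)
as.vector(res)               # "A", "A, B", "A, B, B"
```

Note that this expands each group of size n into n(n+1)/2 elements before collapsing, so it may use noticeably more memory on very large groups.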