通过对变量进行分组来折叠列(在基础中)

时间:2012-03-25 03:49:18

标签: r

我有一个文本变量和一个分组变量。我想将text变量按行数折叠成每行一个字符串(合并)。因此,只要组列显示m我想将文本组合在一起,依此类推。我提供了之前和之后的样本数据集。我正在写这个包,并且到目前为止避免了对wordcloud以外的其他包的依赖,并希望保持这种方式。

我怀疑rle可能对cumsum有用,但无法解决这个问题。

提前谢谢。

数据是什么样的

                                 text group
1       Computer is fun. Not too fun.     m
2               No its not, its dumb.     m
3              How can we be certain?     f
4                    There is no way.     m
5                     I distrust you.     m
6         What are you talking about?     f
7       Shall we move on?  Good then.     f
8 Im hungry.  Lets eat.  You already?     m

我想要的数据是什么

                                                       text group
1       Computer is fun. Not too fun. No its not, its dumb.     m
2                                    How can we be certain?     f
3                          There is no way. I distrust you.     m
4 What are you talking about? Shall we move on?  Good then.     f
5                       Im hungry.  Lets eat.  You already?     m

数据

dat <- structure(list(text = c("Computer is fun. Not too fun.", "No its not, its dumb.", 
"How can we be certain?", "There is no way.", "I distrust you.", 
"What are you talking about?", "Shall we move on?  Good then.", 
"Im hungry.  Lets eat.  You already?"), group = structure(c(2L, 
2L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("f", "m"), class = "factor")), .Names = c("text", 
"group"), row.names = c(NA, 8L), class = "data.frame")

编辑:我发现我可以为每个组变量运行添加唯一列:

x <- rle(as.character(dat$group))[[1]]
dat$new <- as.factor(rep(1:length(x), x))

产量:

                                 text group new
1       Computer is fun. Not too fun.     m   1
2               No its not, its dumb.     m   1
3              How can we be certain?     f   2
4                    There is no way.     m   3
5                     I distrust you.     m   3
6         What are you talking about?     f   4
7       Shall we move on?  Good then.     f   4
8 Im hungry.  Lets eat.  You already?     m   5

2 个答案:

答案 0 :(得分:5)

这使用rle创建一个id来对句子进行分组。它使用tapply和paste一起输出

## Your example data
dat <- structure(list(text = c("Computer is fun. Not too fun.", "No its not, its dumb.", 
"How can we be certain?", "There is no way.", "I distrust you.", 
"What are you talking about?", "Shall we move on?  Good then.", 
"Im hungry.  Lets eat.  You already?"), group = structure(c(2L, 
2L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("f", "m"), class = "factor")), .Names = c("text", 
"group"), row.names = c(NA, 8L), class = "data.frame")


# Needed for later
k <- rle(as.numeric(dat$group))
# Create a grouping vector
id <- rep(seq_along(k$len), k$len)
# Combine the text in the desired manner
out <- tapply(dat$text, id, paste, collapse = " ")
# Bring it together into a data frame
answer <- data.frame(text = out, group = levels(dat$group)[k$val])

答案 1 :(得分:1)

我得到了答案并回来发帖但是Dason打败了我,比我自己更容易理解。

x <- rle(as.character(dat$group))[[1]]
dat$new <- as.factor(rep(1:length(x), x))

Paste <- function(x) paste(x, collapse=" ")
aggregate(text~new, dat, Paste)

修改 如何使用聚合以及从响应中学到的内容(尽管tapply是更好的解决方案):

y <- rle(as.character(dat$group))
x <- y[[1]]
dat$new <- as.factor(rep(1:length(x), x))

text <- aggregate(text~new, dat, paste, collapse = " ")[, 2]
data.frame(text, group = y[[2]])