How to generate strings from a word-frequency data frame

Asked: 2016-01-28 17:04:55

Tags: r dataframe word-frequency

Suppose I have the following data frame of word frequencies:

     Bob Joe Go Eat Run
doc1   2   0  0   1   2
doc2   0   1  1   2   0

I need to generate a character vector like this:

chr[1:2] "Bob Bob Eat Run Run"
         "Joe Go Eat Eat"

2 Answers:

Answer 0: (score: 2)

You can try the following:

df <- data.frame(Bob = c(2, 0), Joe = c(0, 1), Go = c(0, 1), Eat = c(1, 2), Run = c(2, 0))
row.names(df) <- c('doc1', 'doc2')
df
     Bob Joe Go Eat Run
doc1   2   0  0   1   2
doc2   0   1  1   2   0

apply(df, 1, function(x) paste(rep(names(df), x), collapse = ' '))
                 doc1                  doc2 
"Bob Bob Eat Run Run"      "Joe Go Eat Eat" 

If you don't want a named vector like the one above and would rather have a plain character vector, you can do this:

as.character(apply(df, 1, function(x) paste(rep(names(df), x), collapse = ' ')))
[1] "Bob Bob Eat Run Run" "Joe Go Eat Eat"    
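The approach above relies on rep() accepting a vector of counts: each column name is repeated as many times as its frequency, and paste() then collapses the result into one string. A minimal sketch of the same idea wrapped in a function follows. The name freq_to_strings and its sep argument are hypothetical and not part of the original answer; it assumes every column holds a non-negative integer count.

```r
# Hypothetical helper (not from the original answer): turn each row of a
# word-frequency data frame into one string of repeated words.
freq_to_strings <- function(freq, sep = " ") {
  words <- names(freq)
  vapply(
    seq_len(nrow(freq)),
    function(i) {
      counts <- unlist(freq[i, ], use.names = FALSE)
      # rep() repeats words[k] counts[k] times; zero counts drop the word
      paste(rep(words, counts), collapse = sep)
    },
    character(1)
  )
}

df <- data.frame(Bob = c(2, 0), Joe = c(0, 1), Go = c(0, 1),
                 Eat = c(1, 2), Run = c(2, 0))

# The core step on a single row of counts
rep(names(df), c(2, 0, 0, 1, 2))
# [1] "Bob" "Bob" "Eat" "Run" "Run"

freq_to_strings(df)
# [1] "Bob Bob Eat Run Run" "Joe Go Eat Eat"
```

Because vapply() iterates over row indices rather than row names, the result is an unnamed character vector, so no as.character() step is needed.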

Answer 1: (score: 1)

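Here is an option using data.table. Convert the 'data.frame' to a 'data.table', group by the row sequence, unlist the Subset of Data.table (.SD), replicate the column names of df by those counts, and then paste the result together. Note that toString joins the words with ", " rather than a single space.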

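library(data.table)
# setDT() converts df to a data.table by reference (row names are dropped);
# grouping by row number makes .SD hold one row at a time
setDT(df)[, toString(rep(names(df), unlist(.SD))), by = 1:nrow(df)]$V1
#[1] "Bob, Bob, Eat, Run, Run" "Joe, Go, Eat, Eat"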
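The question asks for space-separated words, so a variant of the data.table approach might swap toString() for paste(..., collapse = " "). The sketch below is an adaptation, not the answerer's code; it assumes the data.table package is installed and uses as.data.table() to work on a copy (the name dt is illustrative), leaving df unchanged.

```r
library(data.table)

df <- data.frame(Bob = c(2, 0), Joe = c(0, 1), Go = c(0, 1),
                 Eat = c(1, 2), Run = c(2, 0))

# as.data.table() returns a copy, so df keeps its class
dt <- as.data.table(df)

# Group by row number; within each group .SD is a one-row table of counts
out <- dt[, paste(rep(names(dt), unlist(.SD)), collapse = " "),
          by = seq_len(nrow(dt))]$V1
out
# [1] "Bob Bob Eat Run Run" "Joe Go Eat Eat"
```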

Or, in base R, using tapply:

# unlist(df) flattens the counts column by column; row(df) gives each value's row index
tapply(unlist(df), row(df),
       FUN = function(x) toString(rep(names(df), x)))
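tapply() returns a one-dimensional array labelled "1", "2" rather than by document name, and toString() inserts commas. A possible way to get space-separated strings named after the original rows is sketched below. This is an adaptation rather than the answerer's code, and the variable names res and out are illustrative.

```r
df <- data.frame(Bob = c(2, 0), Joe = c(0, 1), Go = c(0, 1),
                 Eat = c(1, 2), Run = c(2, 0),
                 row.names = c("doc1", "doc2"))

# Same grouping as above, but joined with single spaces
res <- tapply(unlist(df), row(df),
              FUN = function(x) paste(rep(names(df), x), collapse = " "))

# Drop the array attributes, then reattach the document names
out <- setNames(as.vector(res), rownames(df))
out
#                  doc1                  doc2 
# "Bob Bob Eat Run Run"      "Joe Go Eat Eat" 
```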