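Concatenate column names with non-zero column values in R

Time: 2019-09-19 15:57:48

Tags: r

I have a dataset with the columns name, score1, score2, score3, score4, score5 and score6. I want to create a new column, "rule", by concatenating each column name with its value, but only for columns whose value is non-zero.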

name    score1  score2  score3  score4  score5  score6  rule
name1   0       0       0       0       0       0        NA
name2   0       1       0       0       0       0        score2:1
name3   0       1       1       0       1       0        score2:1,score3:1,score5:1
name4   1       1       1       1       1       1        score1:1,score2:1,score3:1,score4:1,score5:1,score6:1

I wrote the following code for the concatenation, but I can't get it to exclude the names of columns whose value is zero.

cols <- colnames(data)[-1]
data <- data[, rules := do.call(paste, c(lapply(cols, function(x) paste(x, get(x), sep=":")),              
                                          sep=","))]

Any help would be greatly appreciated. TIA.

1 Answer:

Answer 0: (score: 1)

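It looks like you are using data.table, but I don't know of a way to do this by reference. I'm also not sure whether your data is large enough that it needs to be done by reference.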

Here is a fairly tidy approach using tidyr and dplyr:

df <- read.table(text = "
name    score1 score2 score3 score4 score5 score6
name1   0       0       0       0       0       0
name2   0       1       0       0       0       0
name3   0       1       1       0       1       0
name4   1       1       1       1       1       1",
header = TRUE,
stringsAsFactors = FALSE
)

library(tidyr)
library(dplyr)

rule <- df %>%
  # gather data to long format with one row for each column/value pair
  gather(key = "colname",value = "value",score1:score6) %>%
  # remove zeroes for rule generation
  filter(value != 0) %>%
  # For each single value join the name and quantity
  mutate(value = paste(colname,value,sep = ": ")) %>%
  # For each name pull things together
  group_by(name) %>%
  # Collapse the results for that name into a single character vector
  summarise(value = paste(value,collapse = ", "))

# Join the rules back to the data frame; names with no non-zero scores get NA
df <- df %>%
  left_join(rule, by = "name")

Result:

> df
   name score1 score2 score3 score4 score5 score6
1 name1      0      0      0      0      0      0
2 name2      0      1      0      0      0      0
3 name3      0      1      1      0      1      0
4 name4      1      1      1      1      1      1
                                                             value
1                                                             <NA>
2                                                        score2: 1
3                                  score2: 1, score3: 1, score5: 1
4 score1: 1, score2: 1, score3: 1, score4: 1, score5: 1, score6: 1
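
If you would rather avoid the reshape and join, the same idea can be sketched in base R. This is only a sketch under some assumptions: `make_rule` is a hypothetical helper name (it does not come from any package), the score columns are numeric with no missing values, and the separators follow the question's desired output ("score2:1,score3:1") rather than the spaced form above.

```r
# Hypothetical helper (not from the original post): builds one rule string
# per row from the non-zero columns, or NA when every value in the row is zero.
# Assumes the columns are numeric and contain no NA values.
make_rule <- function(scores) {
  m <- as.matrix(scores)
  vapply(seq_len(nrow(m)), function(i) {
    keep <- m[i, ] != 0                      # which columns are non-zero in row i
    if (!any(keep)) return(NA_character_)    # nothing to report for this row
    paste(colnames(m)[keep], m[i, keep], sep = ":", collapse = ",")
  }, character(1))
}

# Same example data as in the question
df <- data.frame(
  name   = c("name1", "name2", "name3", "name4"),
  score1 = c(0, 0, 0, 1),
  score2 = c(0, 1, 1, 1),
  score3 = c(0, 0, 1, 1),
  score4 = c(0, 0, 0, 1),
  score5 = c(0, 0, 1, 1),
  score6 = c(0, 0, 0, 1),
  stringsAsFactors = FALSE
)

cols <- setdiff(colnames(df), "name")   # every column except the name
df$rule <- make_rule(df[cols])
print(df[, c("name", "rule")])
```

Because the helper only needs something that `as.matrix()` can convert, it could presumably also be used from data.table to add the column by reference, along the lines of `data[, rule := make_rule(.SD), .SDcols = cols]`. I have not tried that on a large table, so treat it as a starting point.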