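I have a dataset with the columns name, score1, score2, score3, score4, score5 and score6. I want to create a new column "rule" by concatenating the column names with their non-zero values.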
name score1 score2 score3 score4 score5 score6 rule
name1 0 0 0 0 0 0 NA
name2 0 1 0 0 0 0 score2:1
name3 0 1 1 0 1 0 score2:1,score3:1,score5:1
name4 1 1 1 1 1 1 score1:1,score2:1,score3:1,score4:1,score5:1,score6:1
I wrote the following code for the concatenation, but I can't exclude the names of the columns whose value is zero.
cols <- colnames(data)[-1]
data <- data[, rules := do.call(paste, c(lapply(cols, function(x) paste(x, get(x), sep=":")),
sep=","))]
Any help would be greatly appreciated. TIA.
Answer 0 (score: 1)
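It looks like you're using data.table, but I don't know of a way to do this by reference. I'm also not sure whether your data is large enough that it needs to be done by reference.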
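For what it's worth, data.table's `:=` combined with `.SDcols` looks like it could handle the by-reference case. The following is only a minimal sketch, assuming every score column is numeric; the `dt` object and the row-wise `apply()` call are illustrative choices, not something taken from the question, and `apply()` may be slow on very large tables.

```r
library(data.table)

# Illustrative copy of the sample data from the question
dt <- data.table(
  name   = c("name1", "name2", "name3", "name4"),
  score1 = c(0, 0, 0, 1),
  score2 = c(0, 1, 1, 1),
  score3 = c(0, 0, 1, 1),
  score4 = c(0, 0, 0, 1),
  score5 = c(0, 0, 1, 1),
  score6 = c(0, 0, 0, 1)
)

# Every column except the identifier is treated as a score column (assumption)
cols <- setdiff(names(dt), "name")

# Add `rule` by reference: for each row keep only the non-zero values
# and paste them as "colname:value", separated by commas
dt[, rule := apply(.SD, 1, function(v) {
  keep <- v != 0
  if (!any(keep)) {
    NA_character_
  } else {
    paste(cols[keep], v[keep], sep = ":", collapse = ",")
  }
}), .SDcols = cols]
```

Because `:=` modifies `dt` in place, no reassignment with `<-` is needed.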
Here is a fairly tidy approach using tidyr and dplyr:
df <- read.table(text = "
name score1 score2 score3 score4 score5 score6
name1 0 0 0 0 0 0
name2 0 1 0 0 0 0
name3 0 1 1 0 1 0
name4 1 1 1 1 1 1",
header = TRUE,
stringsAsFactors = FALSE
)
library(tidyr)
library(dplyr)
rule <- df %>%
  # gather data to long format with one row for each column/value pair
  gather(key = "colname", value = "value", score1:score6) %>%
  # remove zeroes for rule generation
  filter(value != 0) %>%
  # For each single value join the name and quantity
  mutate(value = paste(colname, value, sep = ": ")) %>%
  # For each name pull things together
  group_by(name) %>%
  # Collapse the results for that name into a single character vector
  summarise(value = paste(value, collapse = ", "))
# Join the rules back to the dataframe (names without rules get NA)
df <- df %>%
  left_join(rule, by = "name")
Result:
> df
name score1 score2 score3 score4 score5 score6
1 name1 0 0 0 0 0 0
2 name2 0 1 0 0 0 0
3 name3 0 1 1 0 1 0
4 name4 1 1 1 1 1 1
value
1 <NA>
2 score2: 1
3 score2: 1, score3: 1, score5: 1
4 score1: 1, score2: 1, score3: 1, score4: 1, score5: 1, score6: 1
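Note that the new column above is called `value` and uses ": " and ", " as separators, whereas the question asks for a `rule` column in the compact form `score2:1,score3:1`. A possible variant that matches that format is sketched below. It assumes a tidyr version that provides `pivot_longer()` (the newer replacement for `gather()`) and dplyr 1.0 or later for the `.groups` argument; the data frame is rebuilt here only to keep the example self-contained.

```r
library(dplyr)
library(tidyr)

# Illustrative copy of the sample data from the question
df <- data.frame(
  name   = c("name1", "name2", "name3", "name4"),
  score1 = c(0, 0, 0, 1),
  score2 = c(0, 1, 1, 1),
  score3 = c(0, 0, 1, 1),
  score4 = c(0, 0, 0, 1),
  score5 = c(0, 0, 1, 1),
  score6 = c(0, 0, 0, 1),
  stringsAsFactors = FALSE
)

rule <- df %>%
  # One row per name/column pair
  pivot_longer(cols = score1:score6, names_to = "colname", values_to = "value") %>%
  # Drop the zero values so they never reach the rule string
  filter(value != 0) %>%
  group_by(name) %>%
  # Build "colname:value" pieces and collapse them per name
  summarise(rule = paste(colname, value, sep = ":", collapse = ","),
            .groups = "drop")

# Names with no non-zero values have no row in `rule` and therefore get NA
df <- left_join(df, rule, by = "name")
```

Rows such as name1, where every score is zero, are filtered out before grouping, so the left join is what fills in NA for them.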