Suppose I have a dataset:
Col1
Mon,Tues,Wed,Thurs,Fri
Mon,Tues,Wed,Thurs
Mon,Tues,Wed
Mon,Tues
Thurs
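I want to give each row a score by counting how many words from a given set it contains. Say my set of words is: Mon, Tues, Wed
How can I create a column with the corresponding scores? The result should be: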
Scores
3
3
3
2
0
Thanks in advance!
Answer 0 (score: 3)
Here is a base R solution:
words <- c("Mon", "Tues", "Wed")
# Split each row on commas, then count the tokens that appear in `words`
sapply(strsplit(as.character(df$Col1), ","), function(x) sum(x %in% words))
#[1] 3 3 3 2 0
Or store the result in a Scores column:
df$Scores <- sapply(strsplit(as.character(df$Col1), ","), function(x) sum(x %in% words))
df
# Col1 Scores
#1 Mon,Tues,Wed,Thurs,Fri 3
#2 Mon,Tues,Wed,Thurs 3
#3 Mon,Tues,Wed 3
#4 Mon,Tues 2
#5 Thurs 0
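One detail worth knowing: `sum(x %in% words)` counts every matching token, so a row that repeats a word (the hypothetical row "Mon,Mon,Tues" below is made up for illustration) scores higher than its number of distinct matches. If each word from the set should count at most once per row, swapping the operands to `sum(words %in% x)` does that. This is a small sketch using `vapply()` so the result is guaranteed to be an integer vector:

```r
words <- c("Mon", "Tues", "Wed")
# Illustrative input; the second row repeats "Mon" on purpose
x <- c("Mon,Tues,Wed,Thurs,Fri", "Mon,Mon,Tues", "Thurs")

tokens <- strsplit(x, ",")

# Counts every occurrence, so the repeated "Mon" is counted twice
occurrences <- vapply(tokens, function(t) sum(t %in% words), integer(1))
occurrences
#[1] 3 3 0

# Counts each word from `words` at most once per row
distinct <- vapply(tokens, function(t) sum(words %in% t), integer(1))
distinct
#[1] 3 2 0
```

For the data in the question both versions give the same scores, since no row repeats a word.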
Or use transform and purrr::map_int:
library(purrr)
transform(df, Scores = map_int(Col1, function(x)
  sum(unlist(strsplit(as.character(x), ",")) %in% words)))
# Col1 Scores
#1 Mon,Tues,Wed,Thurs,Fri 3
#2 Mon,Tues,Wed,Thurs 3
#3 Mon,Tues,Wed 3
#4 Mon,Tues 2
#5 Thurs 0
Sample data:
df <- read.table(text =
"Col1
Mon,Tues,Wed,Thurs,Fri
Mon,Tues,Wed,Thurs
Mon,Tues,Wed
Mon,Tues
Thurs", header = TRUE)
Answer 1 (score: 2)
We can use str_count after pasting the words vector into a single pattern:
library(stringr)
df1$Scores <- str_count(df1$Col1, paste(words, collapse="|"))
df1$Scores
#[1] 3 3 3 2 0
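Or another option is gregexpr from base R:
# gregexpr() returns -1 for a row with no match, so those rows are zeroed out
res <- gregexpr(paste0(words, collapse="|"), df1$Col1)
df1$Scores <- lengths(res) * !sapply(res, function(x) -1 %in% x)
df1$Scores
#[1] 3 3 3 2 0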
Sample data:
words <- c("Mon", "Tues", "Wed")
df1 <- structure(list(Col1 = c("Mon,Tues,Wed,Thurs,Fri", "Mon,Tues,Wed,Thurs",
"Mon,Tues,Wed", "Mon,Tues", "Thurs")), .Names = "Col1",
class = "data.frame", row.names = c(NA,
-5L))
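A caveat for both regex-based approaches: the pattern "Mon|Tues|Wed" matches substrings, so a longer token such as "Wednesday" (a hypothetical value, not in the question's data) would still be counted once. Wrapping the alternation in word boundaries restricts matches to whole tokens. The sketch below assumes the words contain no regex metacharacters (otherwise they would need escaping first), and uses `lengths(regmatches(...))`, which returns 0 for non-matching rows without special-casing the -1 from `gregexpr()`:

```r
words <- c("Mon", "Tues", "Wed")
# Illustrative input; "Wednesday" is made up to show the substring problem
x <- c("Mon,Tues,Wed,Thurs,Fri", "Mon,Tues", "Thurs", "Wednesday")

# Unanchored alternation: also matches "Wed" inside "Wednesday"
loose <- paste(words, collapse = "|")

# Word boundaries on both sides: only whole comma-separated tokens match
strict <- paste0("\\b(?:", paste(words, collapse = "|"), ")\\b")

count_matches <- function(x, pattern) {
  # regmatches() yields character(0) for rows with no match,
  # so lengths() reports 0 for them
  lengths(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
}

count_matches(x, loose)
#[1] 3 2 0 1
count_matches(x, strict)
#[1] 3 2 0 0
```

The same `strict` pattern can be passed to `str_count()` if you prefer the stringr version.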