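I have a data frame that looks like the one below. I want to compare book_id1 and book_id2 and count the number of strings that are enclosed in " " and separated by commas.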
id1 id2 book_id1 numberofbook_id1 book_id2 numberofbook_id2
1 2 ["19167120","237494310","195166798"] 3 ["19167120","237494310"] 2
1 3 ["19167120","237494310","195166798"] 3 [] 0
2 3 ["19167120","237494310"] 2 [] 0
The output I want looks like this:
id1 id2 book_id1 numberofbook_id1 book_id2 numberofbook_id2 count
1 2 ["19167120","237494310","195166798"] 3 ["19167120","237494310"] 2 2
1 3 ["19167120","237494310","195166798"] 3 [] 0 0
2 3 ["19167120","237494310"] 2 [] 0 0
Thanks in advance.
Answer 0 (score: 0)
If you want the number of matching strings:
library(stringr)
count <- sapply(Map(intersect,str_extract_all(df$book_id1, '\\d+'),
str_extract_all(df$book_id2, '\\d+')), length)
count
#[1] 2 0 0
transform(df, count=count)
Or, if you only need the number of IDs in a column:
nchar(gsub('[^,]+', '',df$book_id1))+1
#[1] 3 3 2
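# Count the IDs in book_id2 directly. Counting commas miscounts "[]" and
# single-element lists, so extract the digit runs and count those instead.
count <- lengths(regmatches(df$book_id2, gregexpr('[0-9]+', df$book_id2)))
transform(df, count=count)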
# id1 id2 book_id1 numberofbook_id1
#1 1 2 ["19167120","237494310","195166798"] 3
#2 1 3 ["19167120","237494310","195166798"] 3
#3 2 3 ["19167120","237494310"] 2
# book_id2 numberofbook_id2 count
#1 ["19167120","237494310"] 2 2
#2 [] 0 0
#3 [] 0 0
Data:
df <- structure(list(id1 = c(1L, 1L, 2L), id2 = c(2L, 3L, 3L), book_id1 =
c("[\"19167120\",\"237494310\",\"195166798\"]",
"[\"19167120\",\"237494310\",\"195166798\"]", "[\"19167120\",\"237494310\"]"
), numberofbook_id1 = c(3L, 3L, 2L), book_id2 = c("[\"19167120\",\"237494310\"]",
"[]", "[]"), numberofbook_id2 = c(2L, 0L, 0L)), .Names = c("id1",
"id2", "book_id1", "numberofbook_id1", "book_id2", "numberofbook_id2"
), class = "data.frame", row.names = c(NA, -3L))
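The matching count can also be sketched in base R, without stringr. This is only a minimal illustration, assuming every ID is a run of digits as in the sample data. The `books` data frame and the `extract_ids` helper are hypothetical names introduced here, not part of the original answer.

```r
# Sample data mirroring the question (only the two ID columns, for brevity)
books <- data.frame(
  book_id1 = c('["19167120","237494310","195166798"]',
               '["19167120","237494310","195166798"]',
               '["19167120","237494310"]'),
  book_id2 = c('["19167120","237494310"]', '[]', '[]'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pull every run of digits out of each string.
# For "[]" there is no match, so the result is character(0).
extract_ids <- function(x) regmatches(x, gregexpr("[0-9]+", x))

# Intersect the two ID lists row by row and count the shared IDs
count <- lengths(Map(intersect,
                     extract_ids(books$book_id1),
                     extract_ids(books$book_id2)))
count
#[1] 2 0 0
```

`lengths()` needs R 3.2.0 or later. On older versions, `sapply(..., length)` as used in the answer does the same job.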