I have a dataset with 20,000 rows which, in its simplest form, looks like this:
  v1              v2
1 Case 1 (A v. B) A v. B
2 Case 2 (A v. C) A v. B
3 Case 2 (A v. C) C v. B
4 Case 4 (X v. Z) X v. Z
5 Case 5 (B v. A) A v. B
6 Case 6 (X v. A) X v. A
7 Case 6 (X v. A) A v. X
...
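...except with n variants of v1 and v2 (in practice about 150, which is still too many to list).
I want to return a third column, v3, containing a logical indicator of whether any substring of v1 matches the string in v2.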
  v1              v2     v3
1 Case 1 (A v. B) A v. B TRUE
2 Case 2 (A v. C) A v. B FALSE
3 Case 2 (A v. C) C v. B FALSE
4 Case 4 (X v. Z) X v. Z TRUE
5 Case 5 (B v. A) A v. B FALSE
6 Case 6 (X v. A) X v. A TRUE
7 Case 6 (X v. A) A v. X FALSE
I have been experimenting with something like this, which I think is on the right track:
library(stringr)
x$v3 <- with(x, str_detect(v1, v2))
If anyone could point me to a proper solution or workaround, I would be very grateful.
An MWE showing that my str_detect() approach does not work:
x <- structure(list(v1 = c("Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation"
), v2 = c("Georgia v Russian Federation", " Ethiopia v South Africa Liberia v South Africa",
" Cameroon v United Kingdom", " New Zealand v France", " Australia v France",
" Nicaragua v United States of America", " Nicaragua v Honduras",
" Nauru v Anustralia", " Nnew Zealand v France", " Islamic Republic of Iran v United States of America",
" Bosnia and Herzegovina v Serbia and Montenegro", " Spain v Cananda",
" Libyan Arab Jamahiriya v United States of America", " Libyan Arab Jamahiriya v United Kingdom",
" Democratic Republic of the Congo v Burundi", " Germany v United States of America",
" Democratic Republic of the Congo v Belgium", " Liechtenstein v Germany",
" Democratic Republic of the Congo v Ugandan", " Democratic Republic of the Congo v Rwandan",
" Nicaragua v Colombia", " Djibouti v France", " Georgia v Russian Federation",
" Croatia v Serbia", " Mexico v United States of American", " Democratic Republic of the Congo v Rwanda",
" Spain v Canada", " Australia v France", " New Zealand v France",
" New Zealand v France")), .Names = c("v1", "v2"
), row.names = c(NA, 30L), class = "data.frame")
Answer 0 (score: 1)
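grepl can be used to compare a single value from v2 against the possible substrings of v1.
It has to be applied to each row separately, so a quick solution could be:
apply(x[, c("v1", "v2")], MARGIN=1, FUN=function(x) {grepl(x[2],x[1])})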
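One caveat worth keeping in mind: by default grepl treats its pattern argument as a regular expression, so a character such as "." in "A v. B" matches any single character. The sketch below uses made-up strings (not taken from the question's data) to show how that differs from literal matching with fixed = TRUE:

```r
# Made-up strings for illustration only; not from the question's data
pattern <- "A v. B"
text    <- "Case 9 (A vs B)"

# Default: the pattern is a regex, so "." matches any character ("s" here)
regex_hit <- grepl(pattern, text)

# fixed = TRUE: the pattern is matched literally, so "." must be a real dot
literal_hit <- grepl(pattern, text, fixed = TRUE)

print(c(regex = regex_hit, literal = literal_hit))
```

Whether literal or regex matching is the better fit depends on the data; literal matching rules out the regex-based space handling discussed next.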
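If you want to ignore differences in the number of spaces (as in row 1), you can use gsub to turn the value in x[2] into a corresponding regular expression in which each " " is replaced by " *", so that any number of spaces (including none) is accepted at that position.
In that case, this call will work: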
apply(x,MARGIN=1, FUN=function(x) {grepl(gsub(" "," *",x[2]),x[1])})
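As an alternative sketch (not the approach given above), the whitespace can be normalized on both columns first and the comparison done literally, row by row, with mapply. The toy data frame and the helper name squish are assumptions made for illustration; they only mimic the shape of the question's first table.

```r
# Toy data modeled on the question's first table (hypothetical values,
# with extra spaces added to the last row on purpose)
x <- data.frame(
  v1 = c("Case 1 (A v. B)", "Case 2 (A v. C)", "Case 5 (B v. A)", "Case 6 (X  v.  A)"),
  v2 = c("A v. B", "A v. B", "A v. B", " X v. A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse runs of whitespace and trim both ends
squish <- function(s) trimws(gsub("\\s+", " ", s))

# Row-wise literal (non-regex) substring test: is v2[i] inside v1[i]?
x$v3 <- unname(mapply(
  function(pattern, text) grepl(pattern, text, fixed = TRUE),
  squish(x$v2), squish(x$v1)
))

print(x)
```

Because the match is literal, regex metacharacters in v2 (such as "." or parentheses) need no escaping; the trade-off is that any tolerance for spacing has to come from the normalization step rather than from the pattern.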