Match similar characters/words

Time: 2017-06-04 05:29:14

Tags: python r

I have the following data frame, with columns X and Y:

    X                                   Y
1   SAN DIEGO                           FOND DU LAC
2   THE RIO GRANDE                      RIO GRANDE
3   RIO GRANDE                          RIO GRANDE
4   WEST TENNESSEE                      TENNESSEE
5   EP De SAN JOAQUIN                   De SAN JOAQUIN
6   SOUTHERN VIRGINIA                   VIRGINIA
7   SOUTHERN VIRGINIA                   SOUTHWESTERN VIRGINIA
8   EN COLOMBIA                         COLOMBIA
9   THE EP De NORTHERN CALIFORNIA       De NORTHERN CALIFORNIA
10  FLORIDA                             NEW JERSY

I want to get the rows that do not match, 1 and 10. Rows 2-9 are full or partial matches, which is fine. My expected data frame is

    X                                   Y
1   SAN DIEGO                           FOND DU LAC
10  FLORIDA                             NEW JERSY

1 answer:

Answer 0: (score: 0)

In R, we split the strings in each column on whitespace, check whether the two sets of words intersect, take the lengths of the resulting list, and subset the dataset to the rows where that length is 0

df1[!lengths(Map(intersect, strsplit(df1$X, "\\s+"), strsplit(df1$Y, "\\s+"))),]
#          X           Y
#1  SAN DIEGO FOND DU LAC
#10   FLORIDA   NEW JERSY
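Since the question is also tagged python, here is a rough sketch of the same idea without any libraries: split both strings on whitespace, intersect the word sets, and keep the rows where the intersection is empty. The rows list and the shares_no_word helper are made-up names for illustration, and the matching is assumed to be exact and case-sensitive, as with intersect in R.

```python
# Hypothetical sample data mirroring the X and Y columns from the question.
rows = [
    (1, "SAN DIEGO", "FOND DU LAC"),
    (2, "THE RIO GRANDE", "RIO GRANDE"),
    (3, "RIO GRANDE", "RIO GRANDE"),
    (4, "WEST TENNESSEE", "TENNESSEE"),
    (5, "EP De SAN JOAQUIN", "De SAN JOAQUIN"),
    (6, "SOUTHERN VIRGINIA", "VIRGINIA"),
    (7, "SOUTHERN VIRGINIA", "SOUTHWESTERN VIRGINIA"),
    (8, "EN COLOMBIA", "COLOMBIA"),
    (9, "THE EP De NORTHERN CALIFORNIA", "De NORTHERN CALIFORNIA"),
    (10, "FLORIDA", "NEW JERSY"),
]


def shares_no_word(x, y):
    """Return True when x and y have no whitespace-separated word in common."""
    return not (set(x.split()) & set(y.split()))


# Keep only the rows whose X and Y share no word at all.
unmatched = [row for row in rows if shares_no_word(row[1], row[2])]

for idx, x, y in unmatched:
    print(idx, x, y, sep="\t")
```

With the data from the question this keeps rows 1 and 10. Row 7 counts as a match because both sides contain VIRGINIA, even though SOUTHERN and SOUTHWESTERN differ.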

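Or, as another option, instead of splitting each column by name, we can loop over the columns with lapply, apply strsplit to each, and pass the results to Map through do.call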

df1[!lengths(do.call(Map, c(intersect, unname(lapply(df1, strsplit, split="\\s+"))))),]
#          X           Y
#1  SAN DIEGO FOND DU LAC
#10   FLORIDA   NEW JERSY
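The column-looping variant might look like this in Python. This is only a sketch: rows_without_common_word, the table layout (a list of dicts), and the "id" key are assumptions made for the example, and it expects at least one column name. Unlike the two-argument intersect in R, set.intersection accepts any number of sets, so the same function would also work for more than two columns.

```python
def rows_without_common_word(table, columns):
    """Keep the rows whose listed columns have no word common to all of them.

    Hypothetical helper: `table` is a list of dicts, `columns` a non-empty
    list of keys holding whitespace-separated strings.
    """
    kept = []
    for row in table:
        # One set of words per column, built in a loop over the column names.
        word_sets = [set(row[col].split()) for col in columns]
        # An empty intersection means the columns share no word.
        if not set.intersection(*word_sets):
            kept.append(row)
    return kept


# A few rows taken from the question's data frame.
table = [
    {"id": 1, "X": "SAN DIEGO", "Y": "FOND DU LAC"},
    {"id": 2, "X": "THE RIO GRANDE", "Y": "RIO GRANDE"},
    {"id": 7, "X": "SOUTHERN VIRGINIA", "Y": "SOUTHWESTERN VIRGINIA"},
    {"id": 10, "X": "FLORIDA", "Y": "NEW JERSY"},
]

result = rows_without_common_word(table, ["X", "Y"])
for row in result:
    print(row["id"], row["X"], row["Y"], sep="\t")
```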