Using dplyr to return rows where the first character of two columns matches but the two values differ

Asked: 2016-08-17 17:43:03

Tags: r dplyr data-manipulation

I have the following data frame:

df <- structure(list(traffic_Count_Street = c("16th St", "17th St", 
                                        "Agnes St", "Ayers St", "Ayers St", "Ayers St", "Ayers St", "Baldwin Blvd", 
                                        "Baldwin Blvd", "Baldwin Blvd","S Brahma Blvd"), 
                     unit_Street = c("Baldwin Blvd", "Baldwin Blvd", "Baldwin Blvd", "Baldwin Blvd", "Baldwin Blvd", 
                      "Baldwin Blvd", "Baldwin Blvd", "Baldwin Blvd", "Baldwin Blvd", 
                     "Baldwin Blvd","S 14th St")), .Names = c("traffic_Count_Street", "unit_Street"
                      ), row.names = c(NA, 11L), class = "data.frame")

   traffic_Count_Street  unit_Street
1               16th St Baldwin Blvd
2               17th St Baldwin Blvd
3              Agnes St Baldwin Blvd
4              Ayers St Baldwin Blvd
5              Ayers St Baldwin Blvd
6              Ayers St Baldwin Blvd
7              Ayers St Baldwin Blvd
8          Baldwin Blvd Baldwin Blvd
9          Baldwin Blvd Baldwin Blvd
10         Baldwin Blvd Baldwin Blvd
11        S Brahma Blvd    S 14th St

I want to return the rows where the two columns differ from each other but the first character of each column matches.

The result would be:

  traffic_Count_Street unit_Street
1        S Brahma Blvd   S 14th St

I have the following, but I am not sure whether it is correct.

require(dplyr)
result = df%>% 
  filter(traffic_Count_Street != unit_Street & traffic_Count_Street[1] == unit_Street[1])

2 answers:

Answer 0: (score: 2)

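We can use substr to extract the first character of each column, compare the two with ==, and filter the rows on that condition together with the inequality comparison from the OP's code.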

df %>% 
    filter(substr(traffic_Count_Street, 1, 1) == substr(unit_Street, 1, 1) & 
            traffic_Count_Street != unit_Street)
#  traffic_Count_Street unit_Street
#1        S Brahma Blvd   S 14th St
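
As a minimal sketch of why the OP's version behaves differently: `x[1]` indexes the vector and returns its first element, whereas `substr(x, 1, 1)` returns the first character of every element. The two short vectors below are illustrative stand-ins for rows 8 and 11 of the question's data, not part of the original post.

```r
# Toy vectors mirroring rows 8 and 11 of the question's data (illustrative only).
traffic <- c("Baldwin Blvd", "S Brahma Blvd")
unit    <- c("Baldwin Blvd", "S 14th St")

# `[1]` picks the first ELEMENT of the vector, not the first character.
traffic[1]

# substr() is vectorised: it returns one first character per element.
first_traffic <- substr(traffic, 1, 1)
first_unit    <- substr(unit, 1, 1)

# Keep rows whose first characters agree while the full values differ.
keep <- first_traffic == first_unit & traffic != unit
keep
```

On the full data frame, `traffic_Count_Street[1] == unit_Street[1]` compares "16th St" with "Baldwin Blvd", which is a single FALSE that gets recycled across all rows, so the OP's filter would return no rows.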

Or using data.table

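library(data.table)
# setDT() converts df to a data.table by reference; the inner call builds the logical row index
setDT(df)[df[, Reduce(`!=`, .SD) & substr(.SD[[1]], 1, 1) == substr(.SD[[2]], 1, 1)]]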
#   traffic_Count_Street unit_Street
#1:        S Brahma Blvd   S 14th St

Or using base R

subset(df, substr(traffic_Count_Street, 1, 1) == substr(unit_Street, 1, 1) &
           traffic_Count_Street != unit_Street)

Answer 1: (score: 2)

Using data.table for its syntactic sugar:

library(data.table)
setDT(df)[substr(traffic_Count_Street, 1, 1) == substr(unit_Street, 1, 1) & 
      traffic_Count_Street != unit_Street]

#    traffic_Count_Street unit_Street
# 1:        S Brahma Blvd   S 14th St