Extracting a specific word from a string vector

Time: 2014-05-10 13:29:53

Tags: string r

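I want to remove the word " COUNTY" from the following string vector. I would like the solution to handle different letter case (upper and lower) as well as any spacing variations that may occur. I have the following vector: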

COUNTY=c("LAWRENCE COUNTY", "SALT LAKE", "OCEAN COUNTY", "JASPER COUNTY", 
"PIMA", "JACKSON COUNTY", "PORTAGE COUNTY", "SEBASTIAN COUNTY", 
"ORANGE", "BERGEN COUNTY")

             COUNTY
1   LAWRENCE COUNTY
2         SALT LAKE
3      OCEAN COUNTY
4     JASPER COUNTY
5              PIMA
6    JACKSON COUNTY
7    PORTAGE COUNTY
8  SEBASTIAN COUNTY
9            ORANGE
10    BERGEN COUNTY

I would like the vector to look like this:

      COUNTY
1     LAWRENCE
2     SALT LAKE
3     OCEAN
4     JASPER
5     PIMA
6     JACKSON
7     PORTAGE
8     SEBASTIAN
9     ORANGE
10    BERGEN 

Basically, I want to strip out the " COUNTY" part.

1 Answer:

Answer 0: (score: 2)

Use gsub if both the case and the spacing are known:

> gsub(' COUNTY', '', COUNTY, fixed = TRUE)
## [1] "LAWRENCE"  "SALT LAKE" "OCEAN"     "JASPER"    "PIMA"      "JACKSON"  
## [7] "PORTAGE"   "SEBASTIAN" "ORANGE"    "BERGEN"

If the case is unknown:

> gsub(' county', '', COUNTY, ignore.case = TRUE)
## [1] "LAWRENCE"  "SALT LAKE" "OCEAN"     "JASPER"    "PIMA"      "JACKSON"  
## [7] "PORTAGE"   "SEBASTIAN" "ORANGE"    "BERGEN" 

If both the spacing and the case are unknown:

> gsub('\\s+(county)', '', COUNTY, ignore.case = TRUE)
## [1] "LAWRENCE"  "SALT LAKE" "OCEAN"     "JASPER"    "PIMA"      "JACKSON"  
## [7] "PORTAGE"   "SEBASTIAN" "ORANGE"    "BERGEN"
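
As a rough check of that last pattern, here is a small sketch run against input with mixed case and irregular spacing. The vector `messy` is made up for illustration and is not from the question. The trailing `\\s*$` anchor is also an addition here, on the assumption that "county" only needs removing at the end of a name.

```r
# Hypothetical input with mixed case and irregular spacing (not from the question)
messy <- c("Lawrence County", "SALT LAKE", "ocean   county", "Pima", "Bergen\tCOUNTY  ")

# \\s+ matches one or more whitespace characters before "county";
# \\s*$ allows trailing blanks and anchors the match to the end of the string
cleaned <- sub("\\s+county\\s*$", "", messy, ignore.case = TRUE)
print(cleaned)
```

Since the pattern is anchored, it can match at most once per string, so `sub` is enough here; `gsub` would give the same result.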

Alternatively, this can be done with strsplit:

> unlist(strsplit(COUNTY, ' COUNTY'))
## [1] "LAWRENCE"  "SALT LAKE" "OCEAN"     "JASPER"    "PIMA"      "JACKSON"  
## [7] "PORTAGE"   "SEBASTIAN" "ORANGE"    "BERGEN"
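
One thing to watch with the strsplit approach: `unlist` flattens every piece, so the result can end up longer than the input if any text follows " COUNTY". The sketch below uses a made-up vector `x` (not from the question) to show this, along with one possible way to keep exactly one value per element by taking only the first piece.

```r
# Hypothetical input where text follows " COUNTY" in one element
x <- c("LAWRENCE COUNTY", "SALT LAKE", "ORANGE COUNTY EAST")

# strsplit returns a list with one character vector per input element
parts <- strsplit(x, " COUNTY", fixed = TRUE)

# Flattening yields 4 pieces for 3 inputs, so positions no longer line up
flattened <- unlist(parts)
print(length(flattened))

# Taking the first piece of each element keeps the result aligned with x
first_piece <- vapply(parts, `[`, character(1), 1)
print(first_piece)
```

For the vector in the question this makes no difference, since " COUNTY" only ever appears at the end of an element.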