Delete similar characters from a field

Date: 2012-11-01 23:32:36

Tags: r

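I have File1.csv containing 3000 records, and I need to remove the characters that are not related to the address.

Each record starts with "&" or "A/O". I need to clean my "Address1" field, and if the field contains no address-related information, the record should be left blank.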
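A minimal base-R sketch of the read, clean and write round trip, assuming File1.csv has a header row with an `Address1` column. The temporary files and the two sample rows are stand-ins so the snippet runs on its own, and the cleaning line is only the simple marker-stripping `gsub()` pattern from the answer below, so it does not yet blank junk such as "careof".

```r
# Assumption: File1.csv has a header row and an "Address1" column.
# To keep the sketch self-contained, a tiny sample is written to a temp file;
# with real data, point in_path at File1.csv instead.
in_path  <- tempfile(fileext = ".csv")
out_path <- tempfile(fileext = ".csv")
writeLines(c("Address1", "&&564 7th Street", "A/Ocareof"), in_path)

df <- read.csv(in_path, stringsAsFactors = FALSE)

# Placeholder cleaning step: strips "&", "A/O" and "/O" wherever they occur,
# but does not blank junk-only records such as "careof".
df$Address1 <- gsub("&|A/O|/O", "", df$Address1)

# Write to a separate file so the original data is not overwritten.
write.csv(df, out_path, row.names = FALSE)
read.csv(out_path, stringsAsFactors = FALSE)$Address1
```

Writing to a separate output file is a design choice here: it keeps File1.csv intact while the cleaning rule is still being tuned.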

Example:

File1.csv:

Address1
&&2340 Clemb Street
&&564 7th Street
&&&10th Street
A/O11th Street
A/ONorth Street
A/O/OSouth Street
A/Ocareof
A/Otttt
A/Oyuyuyu
A/Ouiuiuiuiui
A/O/yuyyuyuyuyugggh 4510th Street
&uhhhhhello 56 11th Street

This is the result I expect in File1, without the A/O, A/O/O, A/Ouiuiuiui, etc. prefixes:

File1.csv:

Address1
2340 Clemb Street
564 7th Street
10th Street
11th Street
North Street
South Street
<blank record>
<blank record>
<blank record>
<blank record>
4510th Street
56 11th Street
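The expected output above can be reproduced with a two-step rule, sketched below under one stated assumption: after the leading markers ("&", "A/O", "/O", "/") are stripped, the real address is assumed to begin at the first digit or upper-case letter, and anything before that is treated as junk. `clean_address()` is a hypothetical helper name. The rule fits these twelve rows but is only a heuristic: a junk prefix that starts with a capital or a digit (say "Careof") would survive, and a genuine address that starts with a lower-case word would be blanked.

```r
# Hypothetical helper, not from the original thread.
# Assumption: the address starts at the first digit or upper-case letter
# once the leading "&", "A/O", "/O" and "/" markers are removed.
clean_address <- function(x) {
  # 1. Strip the run of known markers at the start of each record.
  x <- sub("^(&|A/O|/O|/)+", "", x, perl = TRUE)
  # 2. Locate the first digit or upper-case letter and keep the rest.
  m <- regexpr("[0-9A-Z].*$", x, perl = TRUE)
  out <- rep("", length(x))      # records with no match stay blank
  out[m > 0] <- regmatches(x, m) # regmatches() returns only the matches
  out
}

x_all <- c("&&2340 Clemb Street", "&&564 7th Street", "&&&10th Street",
           "A/O11th Street", "A/ONorth Street", "A/O/OSouth Street",
           "A/Ocareof", "A/Otttt", "A/Oyuyuyu", "A/Ouiuiuiuiui",
           "A/O/yuyyuyuyuyugggh 4510th Street", "&uhhhhhello 56 11th Street")
clean_address(x_all)
```

`perl = TRUE` is used so that `[0-9A-Z]` is a plain ASCII range regardless of the session locale.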

Thanks for any help!

1 Answer:

Answer 0 (score: 1)

There is almost certainly a better matching pattern you could use, but gsub() with the following seems to do the job for this data set:

x <- c('&&2340 Clemb Street',
       '&&564 7th Street',
       '&&&10th Street',
       'A/O11th Street',
       'A/ONorth Street',
       'A/O/OSouth Street')

gsub("&|A/O|/O", "", x)
#-----
## [1] "2340 Clemb Street" "564 7th Street"    "10th Street"       "11th Street"      
## [5] "North Street"      "South Street"  
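One property of the pattern above worth noting: `gsub()` with an unanchored alternation removes "&", "A/O" and "/O" wherever they occur, not only at the start of the field. If real addresses can contain those characters, anchoring the pattern with `^` and switching to `sub()` limits the change to the leading markers. The record below is made up for illustration and is not from the question. Neither version blanks junk-only records such as "A/Ocareof"; that needs an extra rule.

```r
# Hypothetical record (not from the question) with an "&" inside the address.
y <- "A/O12 Smith & Jones Road"

# Unanchored: also deletes the inner "&", leaving a double space.
gsub("&|A/O|/O", "", y)

# Anchored at the start: only the leading run of markers is removed.
sub("^(&|A/O|/O)+", "", y)
```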

A primer on regular expressions can be found here