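I have File1.csv containing 3000 records, and I need to remove the characters that are not address-related.
Each record starts with "&" or "A/O". I need to clean my "Address1" field, and if the field contains no address-related information, I need a blank record.
Example: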
File1.csv:
Address1
&&2340 Clemb Street
&&564 7th Street
&&&10th Street
A/O11th Street
A/ONorth Street
A/O/OSouth Street
A/Ocareof
A/Otttt
A/Oyuyuyu
A/Ouiuiuiuiui
A/O/yuyyuyuyuyugggh 4510th Street
&uhhhhhello 56 11th Street
I expect the result for File1 to have no A/O, A/O/O, A/Ouiuiuiui, etc.:
File1.csv:
Address1
2340 Clemb Street
564 7th Street
10th Street
11th Street
North Street
South Street
<blank record>
<blank record>
<blank record>
<blank record>
4510th Street
56 11th Street
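Thanks in advance for any help!
Answer 0 (score: 1)
There is almost certainly a better pattern to match on, but gsub()
with the following pattern seems to do the job for these sample records: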
x <- c('&&2340 Clemb Street',
'&&564 7th Street',
'&&&10th Street',
'A/O11th Street',
'A/ONorth Street',
'A/O/OSouth Street')
gsub("&|A/O|/O", "", x)
#-----
[1] "2340 Clemb Street" "564 7th Street" "10th Street" "11th Street"
[5] "North Street" "South Street"
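The pattern above only removes the markers, so a record such as A/Ocareof would still leave "careof" behind rather than a blank. A minimal sketch of one way to cover the remaining sample records follows. It rests on assumptions drawn only from the sample data: that a real address always contains the word "Street", and that any text before the first digit is junk. The function name clean_address is hypothetical, and the keyword test would need widening (Avenue, Road, and so on) for real data.

```r
# Hypothetical helper; the rules below are heuristics inferred from the sample.
clean_address <- function(x) {
  # Strip the leading run of markers: "&", "A/O", "/O", or a stray "/".
  x <- sub("^(&|A/O|/O|/)+", "", x)
  # If the record contains a digit, drop everything before the first digit.
  x <- sub("^[^0-9]*([0-9].*)$", "\\1", x)
  # Assumption: a record without the keyword "Street" holds no address.
  x[!grepl("Street", x, fixed = TRUE)] <- ""
  x
}

x <- c("&&2340 Clemb Street", "A/ONorth Street", "A/Ocareof",
       "A/O/yuyyuyuyuyugggh 4510th Street", "&uhhhhhello 56 11th Street")
clean_address(x)
```

Records with no digit, such as "North Street", pass through the second step unchanged, and records without the keyword come back as empty strings.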
An introduction to regular expressions can be found here.
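Since the question is about a CSV file rather than a character vector, here is a rough sketch of applying the same gsub() call to the Address1 column and writing the result back. A temporary file stands in for File1.csv, and the three sample rows are illustrative only; the real file may have more columns or a different layout.

```r
# A temporary file stands in for File1.csv (assumption for this sketch).
path <- tempfile(fileext = ".csv")
writeLines(c("Address1",
             "&&2340 Clemb Street",
             "A/O11th Street",
             "A/O/OSouth Street"), path)

# Read the file, clean the Address1 column in place, and write it back.
df <- read.csv(path, stringsAsFactors = FALSE)
df$Address1 <- gsub("&|A/O|/O", "", df$Address1)
write.csv(df, path, row.names = FALSE)

# Re-read the cleaned file to inspect the result.
cleaned <- read.csv(path, stringsAsFactors = FALSE)
```

Passing stringsAsFactors = FALSE keeps Address1 as a character column on older versions of R, where read.csv would otherwise return a factor.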