I have a list of sentences:
[ 'home twn cafe nr link rd',
'taj lands ends hotel..',
'SILVER PALACE705BPALI MALA ROADBANDRA WEST',
'turner rd lemon rd 4 fountain pali rd junctio...',
' FLAT 657 FLOOR AIR INDIA APTS 61B PALI HILL',
'bungalow 9 Mt Mary Bandra West',
'shabbir apt charklie rajan rd abv icici ban...',
'st peters church backyard loun hill rd',
'Union Park Road ',
'Flat 32 Building No 8',
'mehboob studio',
'ONGC Colony',
'Nargis Dutt Road Grand Canyon Building Appa']
I need to use re.findall to find every word 'rd' and replace it with 'road'. I tried this:
data2 = [nltk.sent_tokenize(lines) for lines in data]
c = [re.findall('nr',sent) for sent in data2]
I got this error:
TypeError: expected string or buffer
How do I use re.findall while iterating over the sentences? I don't know how to convert them to strings. Please help.
Answer 0 (score: 2)
I would use a simple regex and a list comprehension, like this:
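import re

# \b marks a word boundary, so only the standalone word "rd" is matched
pattern = re.compile(r"\brd\b")
# data is the list of strings from the question
print([pattern.sub("road", line) for line in data])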
Output
['home twn cafe nr link road',
'taj lands ends hotel..',
'SILVER PALACE705BPALI MALA ROADBANDRA WEST',
'turner road lemon road 4 fountain pali road junctio...',
' FLAT 657 FLOOR AIR INDIA APTS 61B PALI HILL',
'bungalow 9 Mt Mary Bandra West',
'shabbir apt charklie rajan road abv icici ban...',
'st peters church backyard loun hill road',
'Union Park Road ',
'Flat 32 Building No 8',
'mehboob studio',
'ONGC Colony',
'Nargis Dutt Road Grand Canyon Building Appa']
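As for the TypeError in the question: nltk.sent_tokenize returns a list of sentence strings, so every element of data2 is itself a list, and re.findall was handed a list instead of a string. Note also that re.findall only reports matches; the replacement is done by sub. Below is a minimal sketch of both points. It assumes a shortened copy of data, and the nested list is a hand-built stand-in for data2 so the example runs without nltk.

```python
import re

# Shortened sample taken from the question's list (assumption: three lines are enough to illustrate).
data = ['home twn cafe nr link rd',
        'turner rd lemon rd 4 fountain pali rd junctio...',
        'Union Park Road ']

# Hand-built stand-in for data2: one inner list of sentence strings per line.
nested = [[line] for line in data]

pattern = re.compile(r"\brd\b")

# Passing a list (not a string) to a regex function raises TypeError.
try:
    re.findall('rd', nested[0])
    raised = False
except TypeError:
    raised = True

# findall only finds matches; here it counts the standalone "rd" words per line.
counts = [len(pattern.findall(line)) for line in data]

# To work on the nested structure, iterate over the inner strings and use sub to replace.
fixed = [pattern.sub("road", sent) for sents in nested for sent in sents]

print(raised)
print(counts)
print(fixed)
```

If the sentence splitting is not actually needed, applying the pattern directly to data, as in the answer above, avoids the nested list entirely.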