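The KML file I'm parsing: http://pastebin.com/kU5rPssk
I'm looking for all <name> tags that match the regular expression \<name\>(\d+ \@.*)\<\/name\> and then want to manipulate the text of those tags.
Here is the code I used to try to test the regex: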
import re
from bs4 import BeautifulSoup
#Open the KML file.
xmldoc = open('doc.kml', "r+")
soup = BeautifulSoup(xmldoc, "xml")
p = re.compile(r"\<name\>(\d+ \@.*)\<\/name\>")
result = re.findall(p, soup)
print result
I get the following error:
Traceback (most recent call last):
  File ".\regex_test.py", line 10, in <module>
    result = re.findall(p, soup)
  File "C:\Python27\lib\re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer
What am I doing wrong?
Answer 0 (score: 2)
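re.findall() expects a string, but soup is a BeautifulSoup object, which is why you get the TypeError. Instead, pass a compiled regular expression as the text argument of find_all():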
import re
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('doc.kml'), 'xml')
for name in soup.find_all('name', text=re.compile(r"\d+ @.*")):
    print name
This prints:
<kml:name>13233 @ 2014-05-19 21:35:30 GMT (ACPU)</kml:name>
<kml:name>13233 @ 2014-05-19 21:36:30 GMT (ACPU)</kml:name>
<kml:name>13233 @ 2014-05-19 21:37:30 GMT (ACPU)</kml:name>
...
<kml:name>13233 @ 2014-05-19 22:28:30 GMT (ACPU)</kml:name>
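If you would rather stay with a plain regex, the key point is that re.findall() has to be given the markup as a string rather than a soup object. Below is a minimal sketch of that route. The sample_kml fragment is a made-up stand-in modelled on the output above, not the contents of the linked file, and the optional namespace prefix in the pattern is an assumption based on the kml: prefix in that output.

```python
import re

# Hypothetical KML fragment modelled on the output above (not the real file).
sample_kml = """\
<kml:Placemark>
  <kml:name>13233 @ 2014-05-19 21:35:30 GMT (ACPU)</kml:name>
</kml:Placemark>
<kml:Placemark>
  <kml:name>Some other placemark</kml:name>
</kml:Placemark>
<kml:Placemark>
  <kml:name>13233 @ 2014-05-19 21:36:30 GMT (ACPU)</kml:name>
</kml:Placemark>
"""

# Allow an optional namespace prefix (e.g. "kml:") before "name".
# The single capture group holds only the text inside the tag.
pattern = re.compile(r"<(?:\w+:)?name>(\d+ @[^<]*)</(?:\w+:)?name>")

# findall needs a string, so pass the markup text, not a soup object.
names = pattern.findall(sample_kml)

for text in names:
    print(text)
```

With the real file, the same pattern could presumably be applied to open('doc.kml').read(). The find_all() approach above is likely the more robust of the two, since it does not depend on how the tags happen to be written out.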