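I'm new to XML and REST, but I have some basic Python knowledge. I ran into a problem while trying to parse the XML file below.
I'm using the BeautifulSoup library to parse the file and, for some reason, I can access the various fields of entries 2 and 3 but not those of entry 1, even though all three have the same format. Can anyone tell me what I'm doing wrong? The code and its output are included below.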
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">News</title>
<id>1</id>
<link href="" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/entries" rel="self" />
<updated>2014-11-26T10:41:12.424Z</updated>
<author />
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<summary type="html">Test PUT Entry 3</summary>
<id>7</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-24T09:55:31.000Z</updated>
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/7" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/7/editEntry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/7/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<summary type="html">Test PUT Entry 8</summary>
<id>8</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-24T13:47:09.000Z</updated>
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/8" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/8/editEntry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/8/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<summary type="html">Test POST</summary>
<id>12</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-25T14:29:02.000Z</updated>
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/12" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/12/editEntry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/12/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
</feed>
Python code:
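#!/usr/bin/python
from BeautifulSoup import BeautifulSoup

# Read the whole file and parse it (BeautifulSoup 3, Python 2).
handler = open("/tmp/test.xml").read()
soup = BeautifulSoup(handler)

# Print each entry followed by a few of its fields.
results = soup.findAll('entry')
for r in results:
    print r
    print r.find('title').text
    print r.find('content').text
    print r.find('georss:point')
    print r.find('id')
    print r.find('updated')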
The output is as follows:
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
</entry>
TEST REST
1
None
None
None
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<summary type="html">Test PUT Entry 8</summary>
<id>8</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-24T13:47:09.000Z</updated>
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/8" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/8/editEntry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/8/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
TEST REST
1
<georss:point>21.94420760726878 17.44</georss:point>
<id>8</id>
<updated>2014-11-24T13:47:09.000Z</updated>
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<summary type="html">Test POST</summary>
<id>12</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-25T14:29:02.000Z</updated>
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/12" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/12/editEntry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/myWebApp/rest/listOfEntries/1/12/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
TEST REST
1
<georss:point>21.94420760726878 17.44</georss:point>
<id>12</id>
<updated>2014-11-25T14:29:02.000Z</updated>
Answer 0 (score: 1)
I tested with the following code:
#!/usr/bin/python
from BeautifulSoup import BeautifulSoup
handler = open("./test.xml").read()
soup = BeautifulSoup(handler)
print soup.prettify()
The output looks like this:
<?xml version='1.0' encoding='utf-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">
News
</title>
<id>
1
</id>
<link href="" />
<link href="http://192.168.1.12:8083/myWebApp/rest/listOfEntries/1/entries" rel="self" />
<updated>
2014-11-26T10:41:12.424Z
</updated>
<author>
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">
TEST REST
</title>
<content type="html">
1
</content>
</entry>
</author>
<author>
<name>
User213
</name>
</author>
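If you look closely, you will see that BeautifulSoup treats the <author /> in your XML as an opening tag rather than as an empty element.
As the prettified output shows, the first <entry> ends up nested inside that <author> and is cut off right after <content>.
That is why id, georss:point and updated cannot be found in the first entry: as far as BeautifulSoup is concerned, they are not inside that tag.
Hope this helps.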
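One way to sidestep the problem is to use a strict XML parser, which reads <author /> as an empty element. The sketch below is not the approach used in this answer: it assumes Python 3 and the standard library's xml.etree.ElementTree instead of BeautifulSoup. The names NS, SAMPLE_FEED and parse_entries are made up for illustration, and the sample is a shortened copy of the feed from the question.

```python
import xml.etree.ElementTree as ET

# Prefixes used in the queries below; the URIs are the ones declared in the feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
}

# Shortened copy of the feed from the question, including the empty <author />.
SAMPLE_FEED = """<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">News</title>
<id>1</id>
<author />
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<id>7</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-24T09:55:31.000Z</updated>
</entry>
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">TEST REST</title>
<content type="html">1</content>
<author>
<name>User213</name>
</author>
<id>8</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-24T13:47:09.000Z</updated>
</entry>
</feed>
"""


def parse_entries(xml_bytes):
    """Return one dict per <entry> holding the fields printed in the question."""
    root = ET.fromstring(xml_bytes)
    entries = []
    # Every Atom element lives in the default namespace, so each lookup
    # needs the "atom:" prefix; <georss:point> has its own namespace.
    for entry in root.findall("atom:entry", NS):
        entries.append({
            "title": entry.findtext("atom:title", namespaces=NS),
            "content": entry.findtext("atom:content", namespaces=NS),
            "point": entry.findtext("georss:point", namespaces=NS),
            "id": entry.findtext("atom:id", namespaces=NS),
            "updated": entry.findtext("atom:updated", namespaces=NS),
        })
    return entries


# Passing bytes lets the parser honour the encoding named in the XML declaration.
for e in parse_entries(SAMPLE_FEED.encode("utf-8")):
    print(e["id"], e["title"], e["point"], e["updated"])
```

To parse the real file, the same function can be given the bytes read from it, for example open("/tmp/test.xml", "rb").read(). With a strict parser, the first entry should expose the same fields as the other two.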