Question

I have a file like this:

<a>
    <b>1</b>
</a>
<a>
    <b>2</b>
</a>
<a>
    <b>3</b>
</a>

I want all the information in it, so I wrote this code:

from bs4 import BeautifulSoup
infile = open("testA.xml",'r')
contents = infile.read()
soup=BeautifulSoup(contents,'xml')
result = soup.find_all('a')
print(result)

Output:

[<a>
<b>1</b>
</a>]

I don't understand why I can't retrieve all the information from the file. I want all three <a> elements, not just the first one.

Thanks, all

Answer 1

If your file really is an XML file, it should contain an XML header.

If it isn't, you can use 'lxml' (lxml's HTML parser) as the parser instead:

from bs4 import BeautifulSoup
infile = open("testA.xml",'r')
contents = infile.read()
soup=BeautifulSoup(contents,'lxml')
result = soup.find_all('a')
print(result)
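To illustrate why switching to an HTML parser helps, here is a rough standard-library analogue. It does not use BeautifulSoup or lxml; it swaps in Python's built-in `html.parser`, which, like HTML parsers in general, does not insist on a single root element and therefore sees all three `<a>` elements. The class name `BValueCollector` is hypothetical and exists only for this sketch.

```python
from html.parser import HTMLParser

# Same content as testA.xml in the question: three top-level <a> elements, no root.
CONTENTS = """<a>
    <b>1</b>
</a>
<a>
    <b>2</b>
</a>
<a>
    <b>3</b>
</a>
"""


class BValueCollector(HTMLParser):
    """Hypothetical helper: collect the text of every <b> nested inside an <a>."""

    def __init__(self):
        super().__init__()
        self.a_depth = 0   # number of <a> elements currently open
        self.in_b = False  # True while inside a <b> that is inside an <a>
        self.values = []   # text found in each such <b>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.a_depth += 1
        elif tag == "b" and self.a_depth > 0:
            self.in_b = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.a_depth -= 1
        elif tag == "b":
            self.in_b = False

    def handle_data(self, data):
        # Whitespace between tags arrives here too; only keep text inside <b>.
        if self.in_b:
            self.values.append(data.strip())


collector = BValueCollector()
collector.feed(CONTENTS)
collector.close()
print(collector.values)  # ['1', '2', '3']
```

All three values come back, which matches what the answer reports for the 'lxml' option.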

Note that when reading from a file it is better to use a context manager (with), which also makes the code more elegant:

from bs4 import BeautifulSoup
with open("testA.xml",'r') as infile:
    contents = infile.read()
    soup=BeautifulSoup(contents,'lxml')
    result = soup.find_all('a')
    print(result)

This forces Python to close the file once execution leaves the with block.
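A small sketch of that behaviour, assuming a throwaway copy of testA.xml written to a temporary directory so the example is self-contained: the file object reports `closed` as False inside the with block and True right after it, while the data already read stays available.

```python
import os
import tempfile

# Create a sample testA.xml in a fresh temporary directory (assumption: any writable temp dir).
path = os.path.join(tempfile.mkdtemp(), "testA.xml")
with open(path, "w", encoding="utf-8") as outfile:
    outfile.write("<a>\n    <b>1</b>\n</a>\n")

with open(path, "r", encoding="utf-8") as infile:
    contents = infile.read()
    print(infile.closed)  # False: still inside the with block

print(infile.closed)      # True: closed automatically on leaving the with block
print(contents)           # the text read inside the block is still usable
```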

Running it in Python 3 gives:

$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> infile = open("testA.xml",'r')
>>> contents = infile.read()
>>> soup=BeautifulSoup(contents,'lxml')
>>> result = soup.find_all('a')
>>> result
[<a>
<b>1</b>
</a>, <a>
<b>2</b>
</a>, <a>
<b>3</b>
</a>]

Answer 2

The main problem is that you have no root tag. Change your XML file to:

`<?xml version="1.0" encoding="utf-8"?>
<content>
    <a>
        <b>1</b>
    </a>
    <a>
        <b>2</b>
    </a>
    <a>
        <b>3</b>
    </a>
</content>`

You can adjust the content accordingly.
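If editing the file is not an option, the same idea can be applied in code by wrapping the text in a root tag before parsing. This is a minimal sketch using the standard library's `xml.etree.ElementTree` rather than BeautifulSoup; `<content>` is the root name from the answer, and the rest is illustrative.

```python
import xml.etree.ElementTree as ET

# Same content as the question's testA.xml: three <a> elements and no root tag.
contents = """<a>
    <b>1</b>
</a>
<a>
    <b>2</b>
</a>
<a>
    <b>3</b>
</a>
"""

# Without a single root element, a strict XML parser rejects the text.
try:
    ET.fromstring(contents)
    parsed_without_root = True
except ET.ParseError as err:
    parsed_without_root = False
    print("ParseError:", err)

# Wrapped in one root tag, the document is well-formed and all <a> elements are found.
root = ET.fromstring("<content>" + contents + "</content>")
values = [a.find("b").text for a in root.findall("a")]
print(values)  # ['1', '2', '3']
```

String wrapping like this only works when the file has no `<?xml ...?>` declaration, because the declaration must be the very first thing in the document.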

Python - Beautiful Soup "find_all" only returns one result
