Question

I am trying to parse a simple XML document that begins with an XML declaration. Here is the code:

str(BeautifulSoup("""
<?xml version="1.0" encoding="UTF-8"?>
<data/>
""", features='xml'))

The output is:

<?xml version="1.0" encoding="utf-8"?>
<?xml version="1.0" encoding="UTF-8"><data/>

As we can see, there is an extra declaration, and the output is also malformed. Is this a bug, or am I doing something wrong?

Answer 1

When you pass 'xml' as the features argument, lxml builds the XML tree itself, so you do not need to supply the declaration yourself.

>>> str(BeautifulSoup("""
... <data/>
... """, features='xml'))
'<?xml version="1.0" encoding="utf-8"?>\n<data/>'
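If the input comes from a file or a service and already carries a declaration, one option is to drop it before parsing. The following is only a minimal sketch: strip_xml_declaration is a hypothetical helper, not part of Beautiful Soup or lxml, and it assumes the document is a str whose declaration, if present, is the first non-whitespace content.

```python
import re

# Hypothetical helper (not a Beautiful Soup API): matches a leading
# <?xml ...?> declaration together with the whitespace around it.
_XML_DECL = re.compile(r'^\s*<\?xml\b[^>]*\?>\s*')


def strip_xml_declaration(text):
    """Return text without its leading XML declaration, if it has one."""
    return _XML_DECL.sub('', text, count=1)


doc = """
<?xml version="1.0" encoding="UTF-8"?>
<data/>
"""

cleaned = strip_xml_declaration(doc)
print(repr(cleaned))  # '<data/>\n'
```

The cleaned string can then be passed to BeautifulSoup in the same way as the bare <data/> document above.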

>>>

Answer 2

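Is this a bug, or am I doing something wrong?

Short answer: yes, you are doing something wrong.

How?

You get two XML declarations because you are passing the features argument, which Beautiful Soup uses to build the tree.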

if builder is None:
    if isinstance(features, basestring):
        features = [features]
    if features is None or len(features) == 0:
        features = self.DEFAULT_BUILDER_FEATURES
    builder_class = builder_registry.lookup(*features)
    if builder_class is None:
        raise FeatureNotFound(
            "Couldn't find a tree builder with the features you "
            "requested: %s. Do you need to install a parser library?"
            % ",".join(features))
    builder = builder_class()
self.builder = builder
self.is_xml = builder.is_xml
self.builder.soup = self

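But that is not the whole story. self.is_xml is used in .decode(), which returns the string or Unicode representation of the document and, when self.is_xml is true, prepends an XML declaration to the output.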

if self.is_xml:
    # Print the XML declaration
    encoding_part = ''
    if eventual_encoding != None:
        encoding_part = ' encoding="%s"' % eventual_encoding
    prefix = u'<?xml version="1.0"%s?>\n' % encoding_part
    ...

As a result, you end up with two XML declarations.
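The effect of that prefix step can be reproduced in isolation. The sketch below is a simplified stand-in, not Beautiful Soup's own code: render_document is a hypothetical function that mirrors only the prefix logic quoted above, and it assumes, as this answer describes, that the declaration from the input is still present in the serialized body.

```python
def render_document(body, is_xml, eventual_encoding="utf-8"):
    """Hypothetical model of the prefix step: prepend a declaration when is_xml is true."""
    prefix = ''
    if is_xml:
        encoding_part = ''
        if eventual_encoding is not None:
            encoding_part = ' encoding="%s"' % eventual_encoding
        prefix = '<?xml version="1.0"%s?>\n' % encoding_part
    return prefix + body


# A body that still carries its own declaration ends up with two.
with_decl = render_document(
    '<?xml version="1.0" encoding="UTF-8"?><data/>', is_xml=True)

# A body without a declaration ends up with exactly one.
without_decl = render_document('<data/>', is_xml=True)

print(with_decl.count('<?xml'))     # 2
print(without_decl.count('<?xml'))  # 1
```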

How do I fix this?

You need to pass the parser, 'xml', as the second argument to the BeautifulSoup constructor, as described in the documentation.

>>> from bs4 import BeautifulSoup
>>> doc = '''<?xml version="1.0" encoding="UTF-8"?>
... <data/>'''
>>> soup = BeautifulSoup(doc, 'xml')
>>> str(soup)
'<?xml version="1.0" encoding="utf-8"?>\n<data/>'

Why does Beautiful Soup add an extra XML declaration to the document, and how do I remove it?
