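I am using BeautifulSoup to read XML data and put it into a list of dicts. However, it isn't working as expected: the same dict is added to the list over and over. How do I append the correct dict to the list at the right stage of the nested for loops?
The printed list should look like this: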
[OrderedDict([('name', 'dogs'), ('type', 'housed'), ('value', '123')]),
 OrderedDict([('name', 'cats'), ('type', 'wild'), ('value', '456')]),
 OrderedDict([('name', 'mice'), ('type', 'housed'), ('value', '789')])]
Would it be better to put this in a dict rather than a list?
Here is the XML:
<window>
    <window class="Obj" name="ray" type="housed">
        <animal name="dogs" value="123" />
        <species name="sdogs" value="s123" />
    </window>
    <window class="Obj" name="james" type="wild">
        <animal name="cats" type="wild" value="456" />
        <species name="scats" type="swild" value="s456" />
    </window>
    <window class="Obj" name="bob" type="housed">
        <animal name="mice" value="789" />
        <species name="smice" value="s789" />
    </window>
</window>
Here is the code (sorry if there are some mistakes; I can correct them, as this is an example taken from a larger program):
import sys
import pprint
from bs4 import BeautifulSoup as bs
from collections import OrderedDict

soup = bs(open("test.xml"), "lxml")
dicty = OrderedDict()
listy = []
Objs = soup.findAll('window', {"class": "Obj"})
# print(Objs)
for Obj in Objs:
    Objarr = OrderedDict()  #### move this down
    # I want to add data to the array here:
    # print(Obj)
    for child in Obj.children:
        Objarr.update({"namesss": Obj['name']})
        if child.name is not None:
            if child.name == "species":
                print(Obj['name'])
                print(child['value'])
                # Also, adding data to the array here:
                Objarr.update({"name": Obj['name']})
                Objarr.update({"type": Obj['type']})
                Objarr.update({"value": child['name']})
                listy.append(Objarr)  #### dedent this
pprint.pprint(listy)
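Answer 0 (score: 1)
You are updating one dictionary and appending it to the list, so you end up with the same dictionary in the list again and again. You should create a new dictionary before the loop over the children starts, and append it after that loop has finished, not inside it.
Something like this, I guess: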
import sys
import pprint
from bs4 import BeautifulSoup as bs
from collections import OrderedDict

soup = bs(open("my.xml"), "lxml")
dicty = OrderedDict()
listy = []
Objs = soup.findAll('window', {"class": "Obj"})
# print(Objs)
for Obj in Objs:
    Objarr = OrderedDict()  #### moved down: a new dict for every Obj ####
    # I want to add data to the array here:
    for child in Obj.children:
        # Whitespace between tags has child.name == None, so it is skipped.
        if child.name is not None:
            if child.name == "variable":
                # Also, adding data to the array here:
                Objarr.update({"name": Obj['text']})
                Objarr.update({"type": " matrix"})
                Objarr.update({"value": child['name']})
    listy.append(Objarr)  #### dedented: runs once per Obj, after the child loop ####
pprint.pprint(listy)
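The effect described in this answer can be reproduced without any XML. The following is only a minimal sketch, not the asker's actual code: the `rows` data and the function names `build_shared` and `build_fresh` are made up for illustration. It contrasts appending one shared dictionary with creating a new dictionary on each pass of the loop.

```python
from collections import OrderedDict

# Hypothetical stand-in data for the values pulled out of the XML.
rows = [("dogs", "housed", "123"),
        ("cats", "wild", "456"),
        ("mice", "housed", "789")]


def build_shared(rows):
    """Buggy pattern: one dict is created once and appended repeatedly."""
    record = OrderedDict()
    result = []
    for name, kind, value in rows:
        record.update({"name": name, "type": kind, "value": value})
        result.append(record)  # appends another reference to the same object
    return result


def build_fresh(rows):
    """Fixed pattern: a new dict is created on every pass of the loop."""
    result = []
    for name, kind, value in rows:
        record = OrderedDict()
        record["name"] = name
        record["type"] = kind
        record["value"] = value
        result.append(record)
    return result


shared = build_shared(rows)
fresh = build_fresh(rows)

print([r["name"] for r in shared])  # every entry shows the last row's name
print([r["name"] for r in fresh])   # one entry per row
```

In the buggy version every list element is the same object, so a later `update` changes what all earlier entries appear to contain.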
Answer 1 (score: 1)
Take a look at the following to see what your objs contain:
>>> soup = bs(open("my_xml.xml"), "lxml")
>>>
>>> objs = soup.findAll('window', {"class": "Obj"})
>>>
>>> for obj in objs:
...     for child in obj.children:
...         print(child)
...
<animal name="dogs" type="housed" value="123"></animal>
<animal name="cats" type="wild" value="456"></animal>
<animal name="mice" type="housed" value="789"></animal>
<window>
</window>
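This shows that the first element of obj.children is \n, the last element is <window>\n</window>, and there is a \n between every two elements, separating them.
To fix this, you need to convert the listiterator (obj.children) into a normal list, e.g. list(obj.children), and then slice that list with start: 1, end: -2, step: 2, like this: list(obj.children)[1:-2:2]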
Here is the output in this case:
>>> for obj in objs:
...     for child in list(obj.children)[1:-2:2]:
...         print(child)
...
<animal name="dogs" type="housed" value="123"></animal>
<animal name="cats" type="wild" value="456"></animal>
<animal name="mice" type="housed" value="789"></animal>
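Slicing by position depends on the exact whitespace layout of the file, so an alternative worth considering is to filter the children by node type instead. The sketch below uses the standard library's `xml.dom.minidom` as a stand-in for BeautifulSoup, chosen only so that the example runs without third-party packages. Like `obj.children`, its `childNodes` include the whitespace text nodes. The XML string is a shortened, cleaned-up copy of the question's data, and the helper name `element_children` is hypothetical.

```python
from xml.dom import minidom

# Shortened copy of the question's XML (assumed layout: one tag per line).
XML = """<window>
<window class="Obj" name="ray" type="housed">
<animal name="dogs" value="123" />
<species name="sdogs" value="s123" />
</window>
<window class="Obj" name="james" type="wild">
<animal name="cats" type="wild" value="456" />
<species name="scats" type="swild" value="s456" />
</window>
</window>"""


def element_children(node):
    """Return only the element children, skipping whitespace text nodes."""
    return [child for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE]


root = minidom.parseString(XML).documentElement

for obj in element_children(root):
    # childNodes alternates between "\n" text nodes and real elements.
    print(len(obj.childNodes), "raw children,",
          len(element_children(obj)), "elements")
    for child in element_children(obj):
        print(child.tagName, child.getAttribute("name"))
```

With BeautifulSoup, the same idea corresponds to the `child.name is not None` check already present in the question's code, since the whitespace strings between tags have no tag name.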