I have a file that I need to parse to get some information out of it. The order in which things are parsed is important.
<MSR-QUERY-ARG SI="HtmlAnchor">
By the way: where can I upload an .arxml file?
File download: ARXML-FILE
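```python
from xml.etree import ElementTree as ET
import csv

fpath = "test.arxml"
tree = ET.parse(fpath)
root = tree.getroot()
# ARXML elements live in the AUTOSAR r4.0 namespace, so every tag in a path needs the ns: prefix
ns = {'ns': 'http://autosar.org/schema/r4.0'}

# Print the first SHORT-NAME found below each CHAPTER/TRACE element
for arpackage in tree.findall('.//ns:CHAPTER/ns:TRACE', namespaces=ns):
    print(arpackage.findall('.//ns:SHORT-NAME', namespaces=ns)[0].text)

# Print the first MSR-QUERY-ARG found below each CHAPTER/MSR-QUERY-P-1 element
# (note: this does not check the SI attribute)
for arpackage in tree.findall('.//ns:CHAPTER/ns:MSR-QUERY-P-1', namespaces=ns):
    print(arpackage.findall('.//ns:MSR-QUERY-ARG', namespaces=ns)[0].text)
```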
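The loops above take the first MSR-QUERY-ARG of each MSR-QUERY-P-1 regardless of its SI attribute. ElementTree's limited XPath subset does support `[@attr='value']` predicates, so a minimal sketch could filter for `SI="HtmlAnchor"` directly in the path. The inline XML sample and the helper name `html_anchor_args` are hypothetical; the real file's nesting may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed ARXML fragment; the real file's structure may differ.
SAMPLE = """<AUTOSAR xmlns="http://autosar.org/schema/r4.0">
  <CHAPTER>
    <TRACE><SHORT-NAME>S_001</SHORT-NAME></TRACE>
    <MSR-QUERY-P-1>
      <MSR-QUERY-ARG SI="HtmlAnchor">AAA_001</MSR-QUERY-ARG>
      <MSR-QUERY-ARG SI="Other">ignored</MSR-QUERY-ARG>
    </MSR-QUERY-P-1>
  </CHAPTER>
</AUTOSAR>"""

NS = {"ns": "http://autosar.org/schema/r4.0"}


def html_anchor_args(root):
    """Hypothetical helper: text of every MSR-QUERY-ARG with SI='HtmlAnchor', in document order."""
    # The [@SI='HtmlAnchor'] predicate keeps only elements whose SI attribute matches exactly.
    path = ".//ns:CHAPTER/ns:MSR-QUERY-P-1//ns:MSR-QUERY-ARG[@SI='HtmlAnchor']"
    return [el.text for el in root.findall(path, NS)]


root = ET.fromstring(SAMPLE)
print(html_anchor_args(root))  # ['AAA_001']
```

With a real file, `ET.parse("test.arxml").getroot()` would replace `ET.fromstring(SAMPLE)`.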
Answer (score: 0)
Another approach:
```python
from simplified_scrapy import SimplifiedDoc, utils

html = utils.getFileContent('test.arxml')
doc = SimplifiedDoc(html)
# One list of SHORT-NAME texts per TRACE element
names = doc.selects('TRACE').selects('SHORT-NAME>text()')
# Text of the MSR-QUERY-ARG with SI="HtmlAnchor" from each MSR-QUERY-P-1
msrs = doc.selects('MSR-QUERY-P-1').select('MSR-QUERY-ARG@SI="HtmlAnchor">text()')
print(names)
print(msrs)
```
Result:
```
[['S_001'], ['S_002'], ['S_003'], ['S_004'], ['S_005'], ['S_006'], ['S_007'], ['S_008'], ['S_009'], ['S_010'], ['S_011'], ['S_012'], ['S_013'], ['S_014'], ['S_015'], ['S_016'], ['S_017'], ['S_018'], ['S_019'], ['S_020'], ['S_021'], ['S_022'], ['S_023'], ['S_024'], ['S_025'], ['S_026'], ['S_027'], ['S_028'], ['S_029'], ['S_030'], ['S_031'], ['S_032'], ['S_033'], ['S_034'], ['S_035'], ['S_036'], ['S_037'], ['S_038'], ['S_039']]
['AAA_001', 'AAA_002', 'AAA_003']
```
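The two lists above are collected separately, so the relative order of TRACE names and HtmlAnchor arguments in the file is lost. Since the question says parsing order matters, one option is a single pass with the standard library's `Element.iter()`, which visits elements in document order. This is a sketch, not how simplified_scrapy works: the sample XML and the helper `ordered_items` are hypothetical, and the real nesting may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical ARXML fragment (structure assumed, not taken from the real file).
SAMPLE = """<AUTOSAR xmlns="http://autosar.org/schema/r4.0">
  <CHAPTER>
    <TRACE><SHORT-NAME>S_001</SHORT-NAME></TRACE>
    <MSR-QUERY-P-1><MSR-QUERY-ARG SI="HtmlAnchor">AAA_001</MSR-QUERY-ARG></MSR-QUERY-P-1>
    <TRACE><SHORT-NAME>S_002</SHORT-NAME></TRACE>
  </CHAPTER>
</AUTOSAR>"""

# Fully qualified tag names in ElementTree's {namespace}tag notation
NS_URI = "http://autosar.org/schema/r4.0"
TRACE = f"{{{NS_URI}}}TRACE"
SHORT_NAME = f"{{{NS_URI}}}SHORT-NAME"
QUERY_ARG = f"{{{NS_URI}}}MSR-QUERY-ARG"


def ordered_items(root):
    """Hypothetical helper: yield ('TRACE', name) and ('HtmlAnchor', text) in document order."""
    # Element.iter() walks the tree depth-first, i.e. in the order elements appear in the file.
    for el in root.iter():
        if el.tag == TRACE:
            name = el.find(f".//{SHORT_NAME}")  # first SHORT-NAME below this TRACE
            if name is not None:
                yield ("TRACE", name.text)
        elif el.tag == QUERY_ARG and el.get("SI") == "HtmlAnchor":
            yield ("HtmlAnchor", el.text)


root = ET.fromstring(SAMPLE)
print(list(ordered_items(root)))
# [('TRACE', 'S_001'), ('HtmlAnchor', 'AAA_001'), ('TRACE', 'S_002')]
```

Because both kinds of item come out of one traversal, each HtmlAnchor value stays next to the TRACE entries that surround it in the file.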
More examples, including parsing and updating, are available here: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples