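I have an XML schema that describes the structure of an electronic invoice. I used the schema with generateDS to create a parser for the format. Parsing invoices seems to work, but in sections that contain xs:any content, the parser stops processing the children of those elements.
The part of the schema that describes the any element: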
<!-- Elements to describe the invoice extensions -->
<xs:complexType name="ExtensionRecord">
  <xs:sequence>
    <xs:element name="InformationName" type="NormalTextType" minOccurs="0"/>
    <xs:element name="InformationContent" type="LongTextType"/>
    <xs:element name="CustomContent" minOccurs="0">
      <xs:complexType>
        <xs:sequence>
          <xs:any processContents="skip"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
  <xs:attribute name="extensionId" type="ShortTextType" use="optional"/>
</xs:complexType>
The relevant part of the implementation that uses the parser:
E_Invoice = einvoice111.parseString(xmlString, silence=True)
for ai in E_Invoice.Invoice.AdditionalInformation:
    print(dir(ai))
    print(dir(ai.CustomContent))
    print(ai.CustomContent.export(sys.stdout, 0, name_='CustomContent'))
Part of the payload XML:
<AdditionalInformation extensionId="invoicePDFFormat">
<InformationContent/>
<CustomContent>
<any>
<Content>JVBERi0xLjQ........
<BASE64 coded binary>
.....</Content>
And the output of that code:
['CustomContent', 'InformationContent', 'InformationName', 'Tag_strip_pattern_', '_FixedOffsetTZ', '__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'build', 'buildAttributes', 'buildChildren', 'convert_unicode', 'export', 'exportAttributes', 'exportChildren', 'extensionId', 'factory', 'gds_build_any', 'gds_encode', 'gds_format_base64', 'gds_format_boolean', 'gds_format_boolean_list', 'gds_format_date', 'gds_format_datetime', 'gds_format_double', 'gds_format_double_list', 'gds_format_float', 'gds_format_float_list', 'gds_format_integer', 'gds_format_integer_list', 'gds_format_string', 'gds_format_time', 'gds_parse_date', 'gds_parse_datetime', 'gds_parse_time', 'gds_reverse_node_mapping', 'gds_str_lower', 'gds_validate_base64', 'gds_validate_boolean', 'gds_validate_boolean_list', 'gds_validate_date', 'gds_validate_datetime', 'gds_validate_double', 'gds_validate_double_list', 'gds_validate_float', 'gds_validate_float_list', 'gds_validate_integer', 'gds_validate_integer_list', 'gds_validate_simple_patterns', 'gds_validate_string', 'gds_validate_time', 'get_CustomContent', 'get_InformationContent', 'get_InformationName', 'get_class_obj_', 'get_extensionId', 'get_path_', 'get_path_list_', 'hasContent_', 'original_tagname_', 'set_CustomContent', 'set_InformationContent', 'set_InformationName', 'set_extensionId', 'subclass', 'superclass', 'tzoff_pattern', 'validate_LongTextType', 'validate_NormalTextType', 'validate_ShortTextType']
['Tag_strip_pattern_', '_FixedOffsetTZ', '__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'anytypeobjs_', 'build', 'buildAttributes', 'buildChildren', 'convert_unicode', 'export', 'exportAttributes', 'exportChildren', 'factory', 'gds_build_any', 'gds_encode', 'gds_format_base64', 'gds_format_boolean', 'gds_format_boolean_list', 'gds_format_date', 'gds_format_datetime', 'gds_format_double', 'gds_format_double_list', 'gds_format_float', 'gds_format_float_list', 'gds_format_integer', 'gds_format_integer_list', 'gds_format_string', 'gds_format_time', 'gds_parse_date', 'gds_parse_datetime', 'gds_parse_time', 'gds_reverse_node_mapping', 'gds_str_lower', 'gds_validate_base64', 'gds_validate_boolean', 'gds_validate_boolean_list', 'gds_validate_date', 'gds_validate_datetime', 'gds_validate_double', 'gds_validate_double_list', 'gds_validate_float', 'gds_validate_float_list', 'gds_validate_integer', 'gds_validate_integer_list', 'gds_validate_simple_patterns', 'gds_validate_string', 'gds_validate_time', 'get_anytypeobjs_', 'get_class_obj_', 'get_path_', 'get_path_list_', 'hasContent_', 'original_tagname_', 'set_anytypeobjs_', 'subclass', 'superclass', 'tzoff_pattern']
<CustomContent/>
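CustomContent is emitted as an empty tag, which shows that the object structure ends there. I have also tried export() on the whole document, and the result is the same.
So this part: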
<xs:complexType>
  <xs:sequence>
    <xs:any processContents="skip"/>
  </xs:sequence>
</xs:complexType>
does not appear in the Python object tree.
When I look at the library generated from the schema, this is the relevant part of the CustomContent class:
def buildChildren(self, child_, node, nodeName_, fromsubclass_=False):
    obj_ = self.gds_build_any(child_, 'CustomContentType')
    if obj_ is not None:
        self.set_anytypeobjs_(obj_)
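It uses the gds_build_any() method rather than creating a new instance of a class generated from the schema (no such class exists either).
With Suds I can access the any element and its contents, but it breaks elsewhere.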
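For comparison, the content under CustomContent is reachable with the standard library alone. This is only a minimal sketch, not a generateDS fix: the payload is a trimmed, hypothetical version of the one above (the closing tags and the short base64 string are made up for illustration), and custom_content_children is a helper name invented here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed payload; the real one carries a long base64-encoded PDF.
SAMPLE = """\
<AdditionalInformation extensionId="invoicePDFFormat">
  <InformationContent/>
  <CustomContent>
    <any>
      <Content>SGVsbG8=</Content>
    </any>
  </CustomContent>
</AdditionalInformation>
"""


def custom_content_children(xml_text):
    """Return (tag, text) pairs for every element nested under CustomContent."""
    root = ET.fromstring(xml_text)
    custom = root.find("CustomContent")
    if custom is None:
        return []
    # iter() walks the whole subtree and yields CustomContent itself first; skip it.
    return [
        (el.tag, (el.text or "").strip())
        for el in custom.iter()
        if el is not custom
    ]


pairs = custom_content_children(SAMPLE)
print(pairs)
```

This at least confirms that the nested elements are present in the document and that the loss happens in the generated bindings, not in the XML itself.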
Is there a way to configure generateDS so that it can:
Answer 0 (score: 0)
Well, at least around line 1652 of generateDS.py, the SAX handler XschemaHandler contains this test:
if name == AnyType:
    element = XschemaElement(attrs)
    element.type = AnyTypeIdentifier
    self.stack.append(element)
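and something similar for the end tag (AnyType holds "xs:any"). So GDS treats the any tag as special XML Schema syntax, like 'sequence', 'simpleType', 'complexType' and so on.
That is probably why it does not try to create a class for it and map it to "any".
I made some modifications: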
elif name == AnyType:
    print('opening AnyType')
    element = XschemaElement(attrs)
    element.name = 'any'
    #element.type = 'NormalTextType'
    element.type = 'StringType'
    self.inAnyType = 1
    self.stack.append(element)
and did the same for the SAX end tag, then added this to the part that handles tag content:
def characters(self, chrs):
    if self.inDocumentationType:
        # If there is an annotation/documentation element, save it.
        if len(self.stack) > 1 and len(chrs) > 0:
            self.stack[-1].documentation += chrs
    elif self.inAnyType:
        if len(self.stack) > 1 and len(chrs) > 0:
            self.stack[-1].any += chrs
    elif self.inElement:
        pass
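The flag-and-accumulate pattern used in that handler can be tried in isolation with the standard library's xml.sax. This is a sketch, not generateDS code: the class AnyTextHandler and the sample document are made up for illustration. Note that characters() may be called several times for a single text node, which is why the chunks are appended and joined at the end, not assigned.

```python
import xml.sax


class AnyTextHandler(xml.sax.ContentHandler):
    """Collect all character data that appears inside an <any> element."""

    def __init__(self):
        super().__init__()
        self.in_any = 0      # nesting depth of <any>; 0 means "outside"
        self.chunks = []     # text pieces delivered by characters()

    def startElement(self, name, attrs):
        if name == "any":
            self.in_any += 1

    def endElement(self, name):
        if name == "any":
            self.in_any -= 1

    def characters(self, content):
        # The parser is free to split one text node into several calls.
        if self.in_any:
            self.chunks.append(content)


handler = AnyTextHandler()
xml.sax.parseString(
    b"<CustomContent><any><Content>abc\ndef</Content></any></CustomContent>",
    handler,
)
text = "".join(handler.chunks)
print(repr(text))
```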
Now the output I get is:
<CustomContent>
<any>
</any>
</CustomContent>
Close, but no cigar.
Answer 1 (score: 0)
OK, I tried various things and made some changes so that I can read the any content (not sure I know what I'm doing):
$ hg diff generateDS.py | wc -l
78
The only problem is that my XML contains base64-encoded text. It looks like this:
<AdditionalInformation extensionId="invoicePDFFormat">
<InformationContent/>
<CustomContent>
<any>
<Content>JVBERi0xLjQNCiXi48/TDQoxIDAgb2JqDQo8PC9UeXBlIC9Gb250IC9TdWJ0eXBlIC9UeXBlMQ0K
L0VuY29kaW5nIC9XaW5BbnNpRW5jb2RpbmcgL0Jhc2VGb250IC9Db3VyaWVyID4+DQplbmRvYmoN
CjIgMCBvYmoNCjw8IC9GaWx0ZXIgL0ZsYXRlRGVjb2RlIC9MZW5ndGggOTIgPj4NCnN0cmVhbQ0K
eJwz0DNVMIDionQFpxAuAwVDBV1DBQMFUwUTAwOFkFwu/WAPUyAvJA0oF1IMlAkpAhHJIKKcS8NV
but after GDS has processed it, it looks like this:
JVBERi0xLjQNCiXi48/TDQoxIDAgb2JqDQo8PC9UeXBlIC9Gb250IC9TdWJ0eXBlIC9UeXBlMQ0K
L0VuY29kaW5nIC9XaW5BbnNpRW5jb2RpbmcgL0Jhc2VGb250IC9Db3VyaWVyID4+DQplbmRvYmoN
CjIgMCBvYmoNCjw8IC9GaWx0ZXIgL0ZsYXRlRGVjb2RlIC9MZW5ndGggOTIgPj4NCnN0cmVhbQ0K
I get the text content inside the element with:
self.any = ''.join(node.itertext())
and it makes no difference if I use just node.text: there are still blank lines in it. This remains an unsolved mystery.
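One thing worth noting, as an observation and not a diagnosis of where the blank lines come from: whitespace inside base64 text should not affect the decoded bytes. Python's base64.b64decode discards characters outside the base64 alphabet by default, and stripping whitespace explicitly before decoding is also cheap. A minimal sketch, assuming a hypothetical helper decode_wrapped_base64 and a short made-up sample in place of the real PDF data:

```python
import base64


def decode_wrapped_base64(text):
    """Decode base64 text that may contain newlines or blank lines."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces removes spaces, CR, LF and blank lines.
    compact = "".join(text.split())
    # validate=True makes the decoder reject anything that is still
    # not part of the base64 alphabet.
    return base64.b64decode(compact, validate=True)


# Made-up sample: a short base64 string broken up by a blank line and CRLF.
wrapped = "JVBERi0x\n\nLjQ=\r\n"
data = decode_wrapped_base64(wrapped)
print(data)
```

If the decoded bytes are what matters (for example when writing the PDF to disk), the extra blank lines in the extracted text can simply be ignored or normalized this way.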