Question

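I have written a Python script that parses XML (format given below) with xml.dom.minidom. It then sends an email alert to the email IDs defined in the XML file, together with other data defined in the XML such as the subject, page, and so on. When the subject contains special characters like '&#@%*', I get the error "xml.parsers.expat.ExpatError: not well-formed (invalid token): line 14, column 36". How can I fix this?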

from xml.dom.minidom import parse, parseString
import os
import glob


path = r'C:\Users\sachin\Desktop\xmlwatcher'

for xml in glob.glob(os.path.join(path, '*.xml')):
    xmldoc = parse(xml)
    Subject = xmldoc.getElementsByTagName('FromName')[0].firstChild.data
    print(Subject)

Sample script

    <service
        android:name=".MyFirebaseMessagingService">
        <intent-filter>
            <action android:name="com.google.firebase.MESSAGING_EVENT"/>
        </intent-filter>
    </service>
    <service
        android:name=".MyFirebaseInstanceIDService">
        <intent-filter>
            <action android:name="com.google.firebase.INSTANCE_ID_EVENT"/>
        </intent-filter>
    </service>

Answer 1

Unfortunately, xml.dom.minidom is right. Well-formed XML text must not contain a raw & character. In XML, & introduces an entity reference, so a literal ampersand has to be written as &amp;.

Any strict XML parser should therefore choke on that line, because it is illegal.
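To make the failure concrete, here is a minimal sketch that feeds the same content to minidom twice, once with a raw & and once with &amp;. The two short documents and the helper name `try_parse` are made up for illustration and are not taken from the question's file.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Hypothetical one-line documents, reduced from the <Fax>/<FromName> example.
bad = "<Fax><FromName>Test Email & Transaction</FromName></Fax>"
good = "<Fax><FromName>Test Email &amp; Transaction</FromName></Fax>"


def try_parse(text):
    """Return the FromName text, or the parser's error message on failure."""
    try:
        doc = parseString(text)
    except ExpatError as exc:
        return "ExpatError: " + str(exc)
    return doc.getElementsByTagName("FromName")[0].firstChild.data


print(try_parse(bad))   # reports the "not well-formed" error
print(try_parse(good))  # the entity is decoded back to a plain "&"
```

The second call shows that the escaped form loses nothing: the parser hands back the literal ampersand in the text node.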

What can be done?

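The best approach is to fix the bug in the producer so that you process correct XML files. If that is not possible, you can try to repair the input yourself by replacing every raw & with &amp;.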
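If the producer cannot be fixed, the replacement can be done in code before parsing. The sketch below is one possible way to do it, not the answer's own method: it escapes only those ampersands that do not already start an entity reference, so an existing &amp; is left alone. The regular expression, the helper name `escape_bare_ampersands`, and the `<Subject>` element are assumptions added for illustration.

```python
import re
from xml.dom.minidom import parseString

# "&" that is NOT followed by a named (&name;), decimal (&#39;) or
# hexadecimal (&#x27;) reference. Only these bare ampersands get escaped.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.\-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Replace raw '&' with '&amp;' while keeping existing references intact."""
    return BARE_AMP.sub("&amp;", text)


# Hypothetical input: <Subject> is invented to show an already-escaped
# ampersand next to the special characters mentioned in the question.
raw = """<?xml version="1.0" encoding="utf-8" ?>
<Fax>
<FromName>Test Email & Transaction from Test Branch</FromName>
<Subject>Fish &amp; chips &#@%*</Subject>
</Fax>"""

doc = parseString(escape_bare_ampersands(raw))
from_name = doc.getElementsByTagName("FromName")[0].firstChild.data
subject = doc.getElementsByTagName("Subject")[0].firstChild.data
print(from_name)
print(subject)
```

This is a text-level patch with known limits: it would also rewrite ampersands inside CDATA sections, and it leaves references to undefined named entities untouched, so the parser can still reject those.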

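A simpler and possibly more robust approach is to use BeautifulSoup. It is well suited to parsing malformed input and automatically finds the best interpretation of a faulty input file. The following repairs the offending & and prints the prettified document: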

t = """<?xml version="1.0" encoding="utf-8" ?>
<Fax>
...
<FromName>Test Email & Transaction from Test Branch</FromName>
...
</Fax>"""

import bs4

soup = bs4.BeautifulSoup(t, 'html.parser')
print(soup.prettify())

How do I use xml.dom.minidom to parse an XML file that contains special characters such as '%$#*^'?
