Question

I have an XML file that I am trying to process with Python.

I get errors because some of the text inside the XML strings sometimes contains forced carriage returns.

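How can I remove these carriage returns from within the XML text on Unix without removing every carriage return, since that would join all the XML records together?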

Example of an XML script that I can parse:

<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute= 'hello world, i am not going to add a cariage return right now'></message></script>

Example of an XML script that cannot be parsed because of the carriage returns:

<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz">
<message attribute= 'hello world, i am going to add a cariage return
right now
even though
i do not have to'></message></script>

The final output I want after processing looks like this:

<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute = 'hello world, i am not going to add a cariage return right now'></message></script>
<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute= 'hello world, i am going to add a cariage return right now even though i do not have to'></message></script>

What I do not want is to remove all carriage returns, because then my final output would look like this:

<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute= 'hello world, i am not going to add a cariage return right now'></message></script><?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute = 'hello world, i am going to add a cariage return right now even though i do not have to'></message></script>

Answer 1

First, the example is not valid XML. It could be either this:

<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz">
<message attribute = 'hello world, i am going to add a cariage return
right now
even though
i do not have to'/></script>

Or this:

<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz">
<message>hello world, i am going to add a cariage return
right now
even though
i do not have to</message></script>

I am also assuming that you want to remove \n (line feeds) rather than carriage returns.

Try this function:

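import re
from lxml import etree

def removeEndl(xml):
   # Parse a single XML document from a string
   root = etree.XML(xml)

   for element in root.xpath('//*'):
      if element.text is not None:
         # Replace each line break with a space so adjacent words stay separated
         element.text = re.sub(r'\r?\n', ' ', element.text)
      # items() returns a list copy, so updating attrib inside the loop is safe
      for key, value in element.attrib.items():
         element.attrib[key] = re.sub(r'\r?\n', ' ', value)

   # Serialize the cleaned tree back to bytes
   return etree.tostring(root)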
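
Note that etree.XML parses one document at a time, so a file holding several records, each beginning with its own <?xml ...?> declaration as in the question, would need to be split first. Here is a minimal sketch of that step using only the standard library. It assumes every record starts with an XML declaration, and the name split_records is hypothetical:

```python
import re

def split_records(text):
    """Split concatenated XML documents and put each one on a single line.

    Assumes every record begins with an '<?xml' declaration, as in the
    question's sample data.
    """
    # Zero-width split immediately before each declaration (Python 3.7+).
    chunks = re.split(r'(?=<\?xml\b)', text)
    records = []
    for chunk in chunks:
        # Collapse each run of line breaks (plus surrounding whitespace)
        # into a single space, then trim the ends of the record.
        flat = re.sub(r'\s*[\r\n]+\s*', ' ', chunk).strip()
        if flat:
            records.append(flat)
    return records
```

Each returned string can then be written on its own line, or passed to a parser, which keeps the records separate while removing the line breaks inside them.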

Answer 2

When opening the XML file, you may also be able to use Python's support for universal newlines. This makes Python replace \r\n and \r with \n.

To use it, simply add U to the file open mode.
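
For illustration, a minimal sketch of reading a file with universal newlines. It assumes Python 3, where text mode already translates \r\n and \r to \n by default (newline=None) and the U flag is no longer accepted as of Python 3.11. The helper name read_normalized is hypothetical:

```python
import os
import tempfile

def read_normalized(path):
    # newline=None (the text-mode default) enables universal newlines:
    # '\r\n' and '\r' are both returned as '\n'.
    with open(path, 'r', newline=None) as handle:
        return handle.read()

# Demo: write mixed line endings as raw bytes, then read them back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"one\r\ntwo\rthree\n")
demo_text = read_normalized(tmp.name)
os.remove(tmp.name)
print(repr(demo_text))
```

This only normalizes the line endings to \n. The line breaks inside the attribute values are still present afterwards and would still need to be removed separately.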
