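Removing carriage returns embedded in XML text using Unix

Time: 2016-04-05 20:21:02

Tags: python xml unix

I have an XML file that I am trying to process with Python.

I am getting errors because some of the text inside the XML strings sometimes has carriage returns forced into it.

How can I remove these carriage returns from within the XML text on Unix without removing all carriage returns? Removing all of them would join every XML record together.

Example of an XML script that I can parse: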

<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute= 'hello world, i am not going to add a cariage return right now'></message></script>

Example of an XML script that I cannot parse because of the carriage returns:

<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz">
<message attribute= 'hello world, i am going to add a cariage return
right now
even though
i do not have to'></message></script>

After processing, the final output should look like this:

<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute = 'hello world, i am not going to add a cariage return right now'></message></script>
<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute= 'hello world, i am going to add a cariage return right now even though i do not have to'></message></script>

What I do not want is to remove all carriage returns, because then my final output would look like this:

<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute= 'hello world, i am not going to add a cariage return right now'></message></script><?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute = 'hello world, i am going to add a cariage return right now even though i do not have to'></message></script>

2 answers:

Answer 0: (score: 0)

First, the sample is not valid XML. It could be like this:

<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz">
<message attribute = 'hello world, i am going to add a cariage return
right now
even though
i do not have to'/></script>

Or this:

<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz">
<message>hello world, i am going to add a cariage return
right now
even though
i do not have to</message></script>

I am also assuming that you want to remove \n rather than carriage returns.

Try this function:

import re
from lxml import etree

def removeEndl(xml):
   root = etree.XML(xml)

   for element in root.xpath('//*'):
      if element.text is not None:
         element.text = re.sub(r'\r?\n', '', element.text)
      for key, value in element.attrib.items():  # .iteritems() exists only on Python 2
         element.attrib[key] = re.sub(r'\r?\n', '', value)

   return etree.tostring(root)
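
For the file-level goal in the question (one record per line), a text-based sketch that uses only the standard library might look like the following. It assumes every record starts with an `<?xml` declaration, as in the question's samples. The names `split_records` and `flatten_record` are hypothetical and not part of the answer above. Because it works with regular expressions rather than a parser, it can mishandle CDATA sections or a literal `>` at the end of a line of text.

```python
import re

def split_records(text):
    """Split concatenated XML documents at each '<?xml' declaration."""
    # The lookahead keeps the declaration attached to its own record.
    parts = re.split(r'(?=<\?xml\b)', text)
    return [p.strip() for p in parts if p.strip()]

def flatten_record(record):
    """Collapse the line breaks inside a single record."""
    # A line break right after a tag's closing '>' is dropped entirely.
    record = re.sub(r'>\s*\r?\n\s*', '>', record)
    # Any remaining line break (inside text or attributes) becomes one space.
    record = re.sub(r'\s*\r?\n\s*', ' ', record)
    return record

sample = (
    "<?xml version=\"1.0\"?><a x='1'></a>\n"
    "<?xml version=\"1.0\"?><a>\n"
    "<b x='p\nq\nr'></b></a>\n"
)
flattened = [flatten_record(r) for r in split_records(sample)]
print("\n".join(flattened))
```

Each element of `flattened` is one record on a single line, so joining them with `"\n"` keeps the records separate instead of running them together.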

Answer 1: (score: 0)

When opening the XML file, you may also be able to use Python's support for universal newlines. This makes Python replace \r\n and \r with \n.

To use it, just add U to the file open mode.
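
On Python 3 the `U` flag is no longer needed (it was deprecated and then removed in Python 3.11); text mode with the default `newline=None` already performs this translation. A minimal sketch, in which the helper name `read_normalized` and the temporary demo file are assumptions made for illustration:

```python
import os
import tempfile

def read_normalized(path):
    # newline=None is the text-mode default and enables universal newlines:
    # on read, "\r\n" and a lone "\r" are both translated to "\n".
    with open(path, 'r', newline=None, encoding='utf-8') as fh:
        return fh.read()

# Write a small file with mixed line endings in binary mode so that the
# bytes land on disk exactly as given.
with tempfile.NamedTemporaryFile('wb', suffix='.xml', delete=False) as tmp:
    tmp.write(b"<a>one\r\ntwo\rthree\n</a>")
    demo_path = tmp.name

normalized = read_normalized(demo_path)
os.remove(demo_path)
print(repr(normalized))
```

Note that this only normalizes the line endings to `\n`; it does not remove the line breaks embedded in the XML text, so a step that joins the lines of each record is still required.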