我有一个xml文件,我正在尝试使用python
进行处理我收到错误,因为有时xml字符串中的某些文本强制在其中返回回车
如何在xml文本中的unix中删除这些回车而不删除所有回车,因为这意味着将所有xml记录连接在一起
我可以解析的xml脚本示例:
<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute= 'hello world, i am not going to add a cariage return right now'></message></script>
由于回车而无法解析的xml脚本示例:
<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz">
<message attribute= 'hello world, i am going to add a cariage return
right now
even though
i do not have to'></message></script>
解析后的最终输出结果如下:
<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute = 'hello world, i am not going to add a cariage return right now'></message></script>
<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute= 'hello world, i am going to add a cariage return right now even though i do not have to'></message></script>
我不想要的是删除所有回车,因为我的最终输出看起来像是:
<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute= 'hello world, i am not going to add a cariage return right now'></message></script><?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz"><message attribute = 'hello world, i am going to add a cariage return right now even though i do not have to'></message></script>
答案 0 :(得分:0)
首先,该示例不是有效的xml。它可能是这样的:
<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz">
<message attribute = 'hello world, i am going to add a cariage return
right now
even though
i do not have to'/></script>
或者这个:
<?xml version="1.0"?><script startAt="2015-03-25T20:59:38Z" sessionId="xyz">
<message>hello world, i am going to add a cariage return
right now
even though
i do not have to</message></script>
我还假设您要删除\n
而不是回车。
尝试此功能:
import re
from lxml import etree
def removeEndl(xml):
root = etree.XML(xml)
for element in root.xpath('//*'):
if element.text is not None:
element.text = re.sub(r'\r?\n', '', element.text)
for key, value in element.attrib.iteritems():
element.attrib[key] = re.sub(r'\r?\n', '', value)
return etree.tostring(root)
答案 1 :(得分:0)
打开xml文件时,您可能还可以使用python对universal new lines的支持。这将使python被<plugins>
<plugin>
<groupId>org.codehaus.cargo</groupId>
<artifactId>cargo-maven2-plugin</artifactId>
<version>1.4.19</version>
<configuration>
<container>
<containerId>tomcat8x</containerId>
</container>
<configuration>
<type>standalone</type>
<properties>
<cargo.servlet.port>8080</cargo.servlet.port>
<cargo.jvmargs>
-Xmx2048m
-Xms512m
-Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=63342
-Xnoagent
-Djava.compiler=NONE
</cargo.jvmargs>
</properties>
</configuration>
<deployer>
</deployer>
<deployables>
<deployable type="war" file="target/spa.war"></deployable>
</deployables>
</configuration>
</plugin>
替换为\r\n
和\r
。
要使用它,只需将\n
添加到file open mode:
U