How to edit an XML file in batch / Python

Time: 2015-05-29 09:36:38

Tags: python xml batch-file vbscript fetch

I am trying to edit XML files in a batch / Python script.

Here is my XML file:

<?xml version="1.0" encoding="UTF-8"?>
<task name="analyse">
   <taskInfo taskId="21a09311-ade3-4e9a-af21-d13be8b7ba45" runAt="2015-05-20 13:48:50" runTime="5 minutes, 53 seconds">
      <project name="13955 - HMI Volvo Truck PA15" number="e20d51c0-71dc-4572-8f9b-4c150bf35222" />
      <language lcid="1031" name="German (Germany)" />
      <tm name="ENG-DEU_en-GB_de-DE.sdltm" />
      <settings reportInternalFuzzyLeverage="yes" reportLockedSegments="no" reportCrossFileRepetitions="yes" minimumMatchScore="70" searchMode="bestWins" missingFormattingPenalty="1" differentFormattingPenalty="1" multipleTranslationsPenalty="1" autoLocalizationPenalty="0" textReplacementPenalty="0" />
   </taskInfo>
   <file name="VT MAIN TRACK_PA15_Default_DE-DE_20150520_102527.xlf.sdlxliff" guid="111f9ba6-82f6-45fb-ac49-8bf6cf57c169">
      <analyse>
         <perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
         <inContextExact segments="60" words="55" characters="755" placeables="3" tags="0" />
         <!-- Replace the value words="55" with "0" -->
         <exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
         <locked segments="0" words="0" characters="0" placeables="0" tags="0" />
         <crossFileRepeated segments="2" words="20" characters="0" placeables="0" tags="0" />
         <!-- Cut the value words="20" and replace it with 0 -->
         <repeated segments="17" words="34" characters="293" placeables="2" tags="0" />
         <!-- Add the cut value 20 to the current value 34, so the new value is words="54" -->
         <total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
         <new segments="126" words="434" characters="2384" placeables="18" tags="5" />
         <fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
         <fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
         <fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
         <internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
         <internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
         <internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
      </analyse>
   </file>
   <file name="VT MAIN TRACK_PA15_Default_DE-DE_20150523_254796.xlf.sdlxliff" guid="111f9ba6-82f6-45fb-ac49-8bf6cf57c169">
      <analyse>
         <perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
         <inContextExact segments="60" words="67" characters="755" placeables="3" tags="0" />
         <!-- Replace the value words="67" with "0" -->
         <exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
         <locked segments="0" words="0" characters="0" placeables="0" tags="0" />
         <crossFileRepeated segments="2" words="35" characters="0" placeables="0" tags="0" />
         <!-- Cut the value words="35" and replace it with 0 -->
         <repeated segments="17" words="54" characters="293" placeables="2" tags="0" />
         <!-- Add the cut value 35 to the current value 54, so the new value is words="89" -->
         <total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
         <new segments="126" words="434" characters="2384" placeables="18" tags="5" />
         <fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
         <fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
         <fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
         <internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
         <internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
         <internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
      </analyse>
   </file>
   <batchTotal>
      <analyse>
         <perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
         <inContextExact segments="60" words="139" characters="755" placeables="3" tags="0" />
         <exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
         <locked segments="0" words="0" characters="0" placeables="0" tags="0" />
         <crossFileRepeated segments="0" words="0" characters="0" placeables="0" tags="0" />
         <repeated segments="17" words="54" characters="293" placeables="2" tags="0" />
         <total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
         <new segments="126" words="434" characters="2384" placeables="18" tags="5" />
         <fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
         <fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
         <fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
         <internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
         <internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
         <internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
      </analyse>
   </batchTotal>
</task>
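The annotations inside the XML above describe a slightly different rule from the list further down. The crossFileRepeated word count is moved into repeated (20 + 34 = 54 for the first file, 35 + 54 = 89 for the second) and then set to 0, and inContextExact is set to 0. Here is a minimal ElementTree sketch of that arithmetic. It assumes that every <file>/<analyse> block contains exactly one inContextExact, one crossFileRepeated and one repeated element. The helper name apply_inline_notes is hypothetical, and the input is a trimmed-down sample rather than the full report:

```python
import xml.etree.ElementTree as ET

def apply_inline_notes(root):
    """Hypothetical helper: apply the inline-note rules to every <file>/<analyse> block."""
    for analyse in root.findall('file/analyse'):  # <batchTotal> is not matched
        in_context = analyse.find('inContextExact')
        cross = analyse.find('crossFileRepeated')
        repeated = analyse.find('repeated')
        # Move the cross-file repetition words into the repeated count
        moved = int(cross.get('words'))
        repeated.set('words', str(int(repeated.get('words')) + moved))
        # Zero the two categories that should no longer be counted
        cross.set('words', '0')
        in_context.set('words', '0')
    return root

xml = '''<task name="analyse">
  <file name="a"><analyse>
    <inContextExact words="55" />
    <crossFileRepeated words="20" />
    <repeated words="34" />
  </analyse></file>
  <batchTotal><analyse><repeated words="99" /></analyse></batchTotal>
</task>'''

root = apply_inline_notes(ET.fromstring(xml))
print(root.find('file/analyse/repeated').get('words'))  # 54
```

Because the path 'file/analyse' only matches <analyse> elements under <file>, the <batchTotal> block keeps its original values.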

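General notes:

  • <task> is the root element (end tag </task>)
  • What matters here is modifying some of the tags inside the section named file (start tag <file>, end tag </file>)
  • <file>*</file> may occur X times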

What I need:

For each <file> element, I want to:

  • In <inContextExact>, set the value of the words attribute to 0

    <inContextExact ... words="55" ... /> => <inContextExact ... words="0" ... />

  • In <crossFileRepeated>, set the value of the words attribute to 0

    <crossFileRepeated ... words="20" ... /> => <crossFileRepeated ... words="0" ... />

  • In <total>, set the words attribute to a value calculated by my own logic

    <total ... words="1462" ... /> => <total ... words="??" ... />
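The three rules above can be applied with ElementTree while leaving the <total> rule open. In this sketch, compute_total is a hypothetical callback that stands in for "my own logic". It receives the original words values of the children of <analyse>, read before anything is set to 0. Tags that repeat, such as <fuzzy>, have their values summed. The example rule my_total is also hypothetical:

```python
import xml.etree.ElementTree as ET

def apply_rules(root, compute_total):
    """Zero inContextExact/crossFileRepeated words and recompute total for each <file>."""
    for analyse in root.findall('file/analyse'):
        # Snapshot the original word counts per tag (summed for tags that repeat)
        counts = {}
        for child in analyse:
            counts[child.tag] = counts.get(child.tag, 0) + int(child.get('words', 0))
        analyse.find('inContextExact').set('words', '0')
        analyse.find('crossFileRepeated').set('words', '0')
        analyse.find('total').set('words', str(compute_total(counts)))
    return root

# Hypothetical rule: subtract the two zeroed categories from the original total
def my_total(counts):
    return counts['total'] - counts['inContextExact'] - counts['crossFileRepeated']

xml = '''<task><file name="a"><analyse>
  <inContextExact words="55" /><crossFileRepeated words="20" /><total words="1462" />
</analyse></file></task>'''

root = apply_rules(ET.fromstring(xml), my_total)
print(root.find('file/analyse/total').get('words'))  # 1387
```

Passing the rule in as a function keeps the traversal separate from the business logic, so you can swap the rule without touching the XML handling.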

I would really appreciate an example of processing XML files in batch / Python.

2 answers:

Answer 0 (score: 1)

Let's use Python!

This is easy to do in Python. Since you said a Python solution is acceptable, take a look at the script below.

Here is how to iterate over a directory containing XML files, process each file as requested, and save the changes back to the file.

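from xml.etree import ElementTree
import os

def edit_xml_file(data):
    # Parse the raw bytes of one report; e is the root <task> element
    e = ElementTree.fromstring(data)

    # Visit every <file> directly under <task>; <batchTotal> is not touched
    for file_element in e.findall('file'):

        analyse_element = file_element.find('analyse')

        # Remember the inContextExact word count, then set it to 0
        in_context_exact_element = analyse_element.find('inContextExact')
        in_context_exact_words = int(in_context_exact_element.get('words'))
        in_context_exact_element.set('words', '0')

        # Remember the crossFileRepeated word count, then set it to 0
        cross_file_repeated_element = analyse_element.find('crossFileRepeated')
        cross_file_repeated_words = int(cross_file_repeated_element.get('words'))
        cross_file_repeated_element.set('words', '0')

        # Placeholder for "my own logic": total = sum of the two original counts
        total_element = analyse_element.find('total')
        total_element.set('words', str(in_context_exact_words + cross_file_repeated_words))

    # Serialize back to bytes (the <?xml ...?> declaration is not written)
    xmlstr = ElementTree.tostring(e)
    return xmlstr


def main():

    source_directory = 'xmlfiles'

    for filename in os.listdir(source_directory):

        # Only process .xml files
        if not filename.endswith('.xml'):
            continue

        xml_file_path = os.path.join(source_directory, filename)
        # Open for reading and writing in binary mode, then overwrite in place
        with open(xml_file_path, 'r+b') as f:
            data = f.read()
            fixed_data = edit_xml_file(data)
            f.seek(0)
            f.write(fixed_data)
            f.truncate()  # drop leftover bytes if the new content is shorter


if __name__ == '__main__':
    main()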

In this solution I've used the built-in ElementTree utility.
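ElementTree.tostring() with its default settings, as used in the script above, does not write the <?xml version="1.0" encoding="UTF-8"?> declaration. It also emits any non-ASCII characters (for example, in German file names) as numeric character references. If the tool that reads these reports cares about either point, one option is to parse the file and write it back with ElementTree.write(..., encoding='UTF-8', xml_declaration=True). This is a sketch under that assumption. The names zero_words and process_file are hypothetical, and the edit callback is assumed to modify the tree in place:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def zero_words(root):
    """Hypothetical edit: zero inContextExact and crossFileRepeated words in each <file>."""
    for analyse in root.findall('file/analyse'):
        for tag in ('inContextExact', 'crossFileRepeated'):
            analyse.find(tag).set('words', '0')

def process_file(path, edit):
    tree = ET.parse(path)   # parse the whole document from disk
    edit(tree.getroot())    # modify elements in place
    # Write back as UTF-8 with an XML declaration at the top
    tree.write(path, encoding='UTF-8', xml_declaration=True)

# Usage on a throwaway file
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'report.xml')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                '<task><file name="ä"><analyse>'
                '<inContextExact words="55" /><crossFileRepeated words="20" />'
                '</analyse></file></task>')
    process_file(path, zero_words)
    with open(path, encoding='utf-8') as f:
        result = f.read()
print(result)
```

ElementTree writes the declaration with single quotes, which is equivalent XML. XML comments in the input are dropped by the default parser.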

Answer 1 (score: 0)

Necessary tools

Here are the tools you need to create the script in Excel VBA or VBScript:

Looping through text files in a directory: link

Reading text files: link

Writing text files: link

Replacing with RegExp: link

An example regex to get you going:

<exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
->
<exact segments="114" words="0" characters="1687" placeables="14" tags="3" />

Use this regex: (words="[0-9]+?") or, even better, words="([0-9]+?)"
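The same regex can be checked with Python's re module on the example line above. Because the whole words="..." match is replaced, the replacement string must include the attribute name and quotes, not just 0:

```python
import re

line = '<exact segments="114" words="334" characters="1687" placeables="14" tags="3" />'
# words="([0-9]+?)" captures the digits; the whole match is replaced
fixed = re.sub(r'words="([0-9]+?)"', 'words="0"', line)
print(fixed)
# <exact segments="114" words="0" characters="1687" placeables="14" tags="3" />
```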

Here is an example of processing a single line:

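' VBA: needs a reference to "Microsoft VBScript Regular Expressions 5.5"
' (in VBScript, write "Dim re" without "As RegExp")
Dim re As RegExp
Set re = New RegExp
re.Pattern = "words=""([0-9]+?)"""
newTextRow = re.Replace(textRow, "words=""0""") ' Replace the words value with 0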
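The regex above replaces words on any line it is applied to, including <exact>, <total> and the rest. Here is a sketch of a stricter pattern, written in Python for brevity, that only touches the two elements named in the question. The element list comes from the question. The pattern assumes that each element's start tag sits on one line, as in the sample:

```python
import re

# Match words="N" only inside <inContextExact ...> or <crossFileRepeated ...> tags
PATTERN = re.compile(r'(<(?:inContextExact|crossFileRepeated)\b[^>]*?\bwords=")[0-9]+(")')

def zero_line(line):
    # Keep the captured prefix and closing quote, replace only the digits.
    # \g<1> is used instead of \1 so that "\10" is not read as group 10.
    return PATTERN.sub(r'\g<1>0\g<2>', line)

print(zero_line('<inContextExact segments="60" words="55" characters="755" />'))
print(zero_line('<exact segments="114" words="334" characters="1687" />'))
```

The first call zeroes words, while the <exact> line comes back unchanged.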

Approach

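  1. Loop through the XML files using the Dir function
  2. Read the file contents (see the link above on how to read text files in VBA)
  3. Loop through all lines and replace the necessary words attributes using the RegExp function
  4. Save the output back to the XML file (see the link above on how to write text files in VBA)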
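The steps above also translate directly to Python. This is a minimal line-based sketch, not a full XML parser. It assumes one element per line, as in the sample. It only edits lines between a <file ...> line and the matching </file> line, so the <batchTotal> block is left alone. The directory name xmlfiles is borrowed from the first answer, and fix_lines and fix_directory are hypothetical names:

```python
import os
import re

WORDS = re.compile(r'(<(?:inContextExact|crossFileRepeated)\b[^>]*?\bwords=")[0-9]+(")')

def fix_lines(lines):
    """Zero the words attribute of the target elements, but only inside <file> blocks."""
    inside_file = False
    out = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('<file'):
            inside_file = True
        elif stripped.startswith('</file>'):
            inside_file = False
        elif inside_file:
            line = WORDS.sub(r'\g<1>0\g<2>', line)
        out.append(line)
    return out

def fix_directory(directory='xmlfiles'):
    for name in os.listdir(directory):               # loop through the XML files
        if not name.endswith('.xml'):
            continue
        path = os.path.join(directory, name)
        with open(path, encoding='utf-8') as f:       # read the file contents
            lines = f.readlines()
        fixed = fix_lines(lines)                      # replace the words values
        with open(path, 'w', encoding='utf-8') as f:  # save the output back
            f.writelines(fixed)

sample = [
    '<file name="a">\n',
    '   <inContextExact segments="60" words="55" />\n',
    '</file>\n',
    '<batchTotal>\n',
    '   <inContextExact segments="60" words="139" />\n',
    '</batchTotal>\n',
]
print(''.join(fix_lines(sample)))
```

A line-based approach is fragile if the report is ever reformatted (for example, several elements on one line). The ElementTree script in the first answer does not have that limitation.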