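Replace multiple strings in XML using key->value pairs from a CSV file

Asked: 2018-11-20 05:41:47

Tags: python regex xml perl csv

I have a dump from an application server that contains XML with multiple strings. I am interested in the userID, which is embedded in the XML tags in the form (lasfir1), as shown in the XML sample below: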

<row>
  <string></string>
  <integer>2177</integer>
  <string>assignee =lasfir1 </string>
  <string>Firstname Lastname</string>
  <integer>10</integer>
  <string xsi:nil="true"/>
  <integer>450</integer>
</row>

<row>
  <string>#ffd600</string>
  <integer>2199</integer>
  <integer>23</integer>
  <integer>474</integer>
  <string>assignee</string>
  <string>lasfir1</string>
</row>

<row>
  <integer>1536</integer>
  <string>lasfir1</string>
  <integer>235</integer>
  <string>USER</string>
</row>

<row>
  <string>#ffd610</string>
  <integer>2200</integer>
  <integer>25</integer>
  <integer>464</integer>
  <string>assignee</string>
  <string>lisfar1</string>
</row>

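The requirement is to convert only the string "lasfir1" to its equivalent email ID. The mapping is available in a separate CSV (text) file that holds the email ID / user ID pairs, one per line with the email first: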

FirstName.LastName@abc.com,lasfir1
FarstName.ListName@abc.com,lisfar1
LastName.FirstName@abc.com,firlas1

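The XML will not always have the same structure, so the user ID string itself is what should be searched for, not a pattern before or after it.

Is there a simple way to read the key->value pairs (from the CSV file), check whether the key (user ID) appears in the XML file, and then replace it with the value (email ID)?

This is needed for a set of 300+ user ID and email ID combinations, not all of which may be present in the XML.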

2 answers:

Answer 0 (score: 1)

Check out this Perl one-liner solution:

$ cat gagneet.csv
FirstName.LastName@abc.com,lasfir1
FarstName.ListName@abc.com,lisfar1
LastName.FirstName@abc.com,firlas1

$ cat gagneet.xml
<row>
  <string></string>
  <integer>2177</integer>
  <string>assignee =lasfir1 </string>
  <string>Firstname Lastname</string>
  <integer>10</integer>
  <string xsi:nil="true"/>
  <integer>450</integer>
</row>

. . . . 
. . . . 

$ perl -ne 'BEGIN { %kv=map{chomp;(split(",",$_))[1,0] } qx(cat gagneet.csv) ; $content=qx(cat gagneet.xml);while($content=~/(<row>)(.*?)(<\/row>)/smg) { $xml=$2;foreach $y (keys %kv) { $xml=~s/${y}/$kv{$y}/gm; } print "$1$xml$3\n"; } exit } '
<row>
  <string></string>
  <integer>2177</integer>
  <string>assignee =FirstName.LastName@abc.com </string>
  <string>Firstname Lastname</string>
  <integer>10</integer>
  <string xsi:nil="true"/>
  <integer>450</integer>
</row>
<row>
  <string>#ffd600</string>
  <integer>2199</integer>
  <integer>23</integer>
  <integer>474</integer>
  <string>assignee</string>
  <string>FirstName.LastName@abc.com</string>
</row>
<row>
  <integer>1536</integer>
  <string>FirstName.LastName@abc.com</string>
  <integer>235</integer>
  <string>USER</string>
</row>
<row>
  <string>#ffd610</string>
  <integer>2200</integer>
  <integer>25</integer>
  <integer>464</integer>
  <string>assignee</string>
  <string>FarstName.ListName@abc.com</string>
</row>

If you only want to replace IDs that sit directly between <string> tags, use:

$ perl -ne 'BEGIN { %kv=map{chomp;(split(",",$_))[1,0] } qx(cat gagneet.csv) ; $content=qx(cat gagneet.xml);while($content=~/(<row>)(.*?)(<\/row>)/smg) { $xml=$2;foreach $y (keys %kv) { $xml=~s/<string>${y}<\/string>/<string>$kv{$y}<\/string>/gm; } print "$1$xml$3\n"; } exit } '
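
For readers who prefer Python, the same tag-restricted replacement might be sketched roughly as below. This is only an illustration, not part of the Perl answer: the function name `replace_whole_string_tags` and the in-memory `mapping` dict are assumptions made for the example, and the XML is treated as plain text, just as the one-liner does.

```python
import re


def replace_whole_string_tags(xml_text, mapping):
    """Replace a user ID only where it is the entire content of a <string> tag."""
    if not mapping:
        return xml_text
    # One alternation of all IDs, escaped so they are matched literally.
    ids = "|".join(re.escape(user_id) for user_id in mapping)
    pattern = re.compile(r"<string>(" + ids + r")</string>")
    # The callback looks up the email for whichever ID matched.
    return pattern.sub(
        lambda m: "<string>" + mapping[m.group(1)] + "</string>", xml_text
    )


# Hypothetical sample data, taken from the question.
mapping = {"lasfir1": "FirstName.LastName@abc.com"}
sample = "<string>assignee =lasfir1 </string>\n<string>lasfir1</string>\n"
print(replace_whole_string_tags(sample, mapping))
```

With this variant, the `assignee =lasfir1 ` line is left alone because the ID is not the whole tag content, which mirrors what the second Perl command does.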

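Answer 1 (score: 0)

I created a script in Python 3 that takes a CSV file and an XML file as input and writes an XML file with the changes. The command is:

python xml_converter.py --csvfile file.csv --xmlfile file.xml --outfile output_file.xml

It is not as optimized as I would like: it runs on a single thread and assumes the files are UTF-8 encoded.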

usage: Replace username to user email of a given xml file
       [-h] --csvfile CSVFILE --xmlfile XMLFILE --outfile OUTFILE

optional arguments:
  -h, --help         show this help message and exit
  --csvfile CSVFILE  csv file that provide user name and email pair
  --xmlfile XMLFILE  xml file that to be searched and replaced
  --outfile OUTFILE  output file name

The basic script is:

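import argparse
import csv


def parse_args():
    # Reconstructed from the usage text above; the original helper was not shown.
    parser = argparse.ArgumentParser(
        'Replace username to user email of a given xml file')
    parser.add_argument('--csvfile', required=True,
                        help='csv file that provide user name and email pair')
    parser.add_argument('--xmlfile', required=True,
                        help='xml file that to be searched and replaced')
    parser.add_argument('--outfile', required=True, help='output file name')
    args = parser.parse_args()
    return args.csvfile, args.xmlfile, args.outfile


class XMLConvert:
    def __init__(self, csv_path, xml_path, out_path):
        self._csv = csv_path
        self._xml = xml_path
        self._out = out_path

        self._kv_dict = self.prepare_kv_dict()

    def prepare_kv_dict(self):
        # Each CSV row is "email,userID", so the key is column 1 and the value column 0.
        with open(self._csv, newline='', encoding='utf-8') as f:
            reader = csv.reader(f)
            result = dict()
            for row in reader:
                if len(row) >= 2:  # skip blank or malformed rows
                    result[row[1].strip()] = row[0].strip()
        return result

    def convert(self):
        # Stream the XML line by line so the whole file is never held in memory.
        with open(self._xml, 'r', encoding='utf-8') as f:
            for line in f:
                yield self.convert_line(line)

    def convert_line(self, line):
        # self._kv_dict = {'lasfir1': 'First.Name@abc.com'}
        # Apply every ID found on the line, not just the first one.
        for k, v in self._kv_dict.items():
            if k in line:
                line = line.replace(k, v)  # literal replacement, no regex escaping needed
        return line

    def start(self):
        with open(self._out, 'w', encoding='utf-8') as f:
            for line in self.convert():
                f.write(line)


if __name__ == '__main__':
    csv_file, xml_file, out_file = parse_args()
    converter = XMLConvert(csv_file, xml_file, out_file)
    converter.start()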

I am trying to add threading and modify it accordingly to speed it up. If anyone has a better approach, please let me know.
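
Before reaching for threads, one option that may help is to replace the per-key loop with a single compiled regular expression covering all user IDs, so each line is scanned once. In CPython, threads probably would not speed up this kind of CPU-bound string work much anyway. The sketch below is an assumption-laden illustration, not the answer's code: `load_mapping` and `build_replacer` are hypothetical helper names, and the `\b` word boundaries are an added choice so that an ID such as `lasfir1` is not replaced inside a longer token such as `lasfir10`.

```python
import csv
import io
import re


def load_mapping(csv_lines):
    """Build {user_id: email} from rows shaped like 'email,user_id'."""
    return {
        row[1].strip(): row[0].strip()
        for row in csv.reader(csv_lines)
        if len(row) >= 2
    }


def build_replacer(mapping):
    """Return a function that swaps every known user ID in a text in one pass."""
    if not mapping:
        return lambda text: text
    # Longest IDs first, so a longer ID is tried before any shorter prefix of it.
    ids = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(i) for i in ids) + r")\b"
    )
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)


# Hypothetical in-memory data, taken from the question's samples.
csv_text = "FirstName.LastName@abc.com,lasfir1\nFarstName.ListName@abc.com,lisfar1\n"
replace = build_replacer(load_mapping(io.StringIO(csv_text)))
print(replace("<string>assignee =lasfir1 </string>"))
print(replace("<string>lisfar1</string>"))
```

Because the substitution is a single pass, text that has already been replaced is not rescanned, so an email address that happens to contain another user ID would not be rewritten a second time.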