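I have a dump from an application server that contains XML with multiple strings. I am interested in the userID, which is embedded in the XML tags in the format (lasfir1), as shown in the XML sample below: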
<row>
<string></string>
<integer>2177</integer>
<string>assignee =lasfir1 </string>
<string>Firstname Lastname</string>
<integer>10</integer>
<string xsi:nil="true"/>
<integer>450</integer>
</row>
<row>
<string>#ffd600</string>
<integer>2199</integer>
<integer>23</integer>
<integer>474</integer>
<string>assignee</string>
<string>lasfir1</string>
</row>
<row>
<integer>1536</integer>
<string>lasfir1</string>
<integer>235</integer>
<string>USER</string>
</row>
<row>
<string>#ffd610</string>
<integer>2200</integer>
<integer>25</integer>
<integer>464</integer>
<string>assignee</string>
<string>lisfar1</string>
</row>
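The requirement is to convert only the string "lasfir1" to its equivalent email ID, which is available in another CSV (text) file that holds the email ID and user ID pairs: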
FirstName.LastName@abc.com,lasfir1
FarstName.ListName@abc.com,lisfar1
LastName.FirstName@abc.com,firlas1
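The XML will not always be the same, but the string itself is what needs to be searched for, not a pattern before or after the string.
Is there a simple way to read the key -> value pairs (in the CSV file), check whether the key (user ID) is in the XML file, and then replace it with the "value" (email ID)?
This is needed for a set of 300+ userID and email ID combinations, not all of which will necessarily be present in the XML.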
Answer 0 (score: 1)
Check out this Perl one-liner solution:
$ cat gagneet.csv
FirstName.LastName@abc.com,lasfir1
FarstName.ListName@abc.com,lisfar1
LastName.FirstName@abc.com,firlas1
$ cat gagneet.xml
<row>
<string></string>
<integer>2177</integer>
<string>assignee =lasfir1 </string>
<string>Firstname Lastname</string>
<integer>10</integer>
<string xsi:nil="true"/>
<integer>450</integer>
</row>
. . . .
. . . .
$ perl -ne 'BEGIN { %kv=map{chomp;(split(",",$_))[1,0] } qx(cat gagneet.csv) ; $content=qx(cat gagneet.xml);while($content=~/(<row>)(.*?)(<\/row>)/smg) { $xml=$2;foreach $y (keys %kv) { $xml=~s/${y}/$kv{$y}/gm; } print "$1$xml$3\n"; } exit } '
<row>
<string></string>
<integer>2177</integer>
<string>assignee =FirstName.LastName@abc.com </string>
<string>Firstname Lastname</string>
<integer>10</integer>
<string xsi:nil="true"/>
<integer>450</integer>
</row>
<row>
<string>#ffd600</string>
<integer>2199</integer>
<integer>23</integer>
<integer>474</integer>
<string>assignee</string>
<string>FirstName.LastName@abc.com</string>
</row>
<row>
<integer>1536</integer>
<string>FirstName.LastName@abc.com</string>
<integer>235</integer>
<string>USER</string>
</row>
<row>
<string>#ffd610</string>
<integer>2200</integer>
<integer>25</integer>
<integer>464</integer>
<string>assignee</string>
<string>FarstName.ListName@abc.com</string>
</row>
If you only want to replace IDs that sit directly between <string> tags, then:
$ perl -ne 'BEGIN { %kv=map{chomp;(split(",",$_))[1,0] } qx(cat gagneet.csv) ; $content=qx(cat gagneet.xml);while($content=~/(<row>)(.*?)(<\/row>)/smg) { $xml=$2;foreach $y (keys %kv) { $xml=~s/<string>${y}<\/string>/<string>$kv{$y}<\/string>/gm; } print "$1$xml$3\n"; } exit } '
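For readers who do not use Perl, the tag-restricted substitution can be sketched in Python as well. This is only an illustration under a few assumptions: the helper names load_mapping and replace_in_string_tags are made up for this example, the CSV rows are in "email,userID" order as in the sample, and an ID is replaced only when it is the entire content of a <string> element.

```python
import csv
import io
import re


def load_mapping(csv_text):
    """Return {user_id: email} from rows in 'email,user_id' order."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[1].strip(): row[0].strip() for row in reader if len(row) >= 2}


def replace_in_string_tags(xml_text, mapping):
    """Swap a user ID for its email only when it fills a whole <string> element."""
    def swap(match):
        content = match.group(1)
        # Unknown content (including empty strings) is written back unchanged.
        return "<string>{}</string>".format(mapping.get(content, content))

    return re.sub(r"<string>([^<]*)</string>", swap, xml_text)


csv_text = (
    "FirstName.LastName@abc.com,lasfir1\n"
    "FarstName.ListName@abc.com,lisfar1\n"
)
xml_text = (
    "<row>\n"
    "<string>assignee =lasfir1 </string>\n"
    "<string>lasfir1</string>\n"
    "<string>lisfar1</string>\n"
    "</row>\n"
)
result = replace_in_string_tags(xml_text, load_mapping(csv_text))
print(result)
```

As with the second Perl command, the "assignee =lasfir1 " line is left alone here because the ID is not the whole element content.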
Answer 1 (score: 0)
I wrote a Python 3 script that takes the CSV and XML files as input and writes out an XML file with the changes applied. The command is:
python xml_converter.py --csvfile file.csv --xmlfile file.xml --outfile output_file.xml
It is not as optimized as I would like: it runs on a single thread, and it assumes the files are UTF-8 encoded.
usage: Replace username to user email of a given xml file
[-h] --csvfile CSVFILE --xmlfile XMLFILE --outfile OUTFILE
optional arguments:
-h, --help show this help message and exit
--csvfile CSVFILE csv file that provide user name and email pair
--xmlfile XMLFILE xml file that to be searched and replaced
--outfile OUTFILE output file name
The basic script is:
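import argparse
import csv


def parse_args():
    # parse_args was not shown in the original post; this version is
    # reconstructed from the usage text above.
    parser = argparse.ArgumentParser(
        'Replace username to user email of a given xml file')
    parser.add_argument('--csvfile', required=True,
                        help='csv file that provide user name and email pair')
    parser.add_argument('--xmlfile', required=True,
                        help='xml file that to be searched and replaced')
    parser.add_argument('--outfile', required=True, help='output file name')
    args = parser.parse_args()
    return args.csvfile, args.xmlfile, args.outfile


class XMLConvert:
    def __init__(self, csv_path, xml_path, out_path):
        self._csv = csv_path
        self._xml = xml_path
        self._out = out_path
        self._kv_dict = self.prepare_kv_dict()

    def prepare_kv_dict(self):
        # CSV rows are "email,user_id", so the key is column 1 and the
        # value is column 0 (row[2] does not exist in a two-column file).
        with open(self._csv, newline='', encoding='utf-8') as f:
            result = dict()
            for row in csv.reader(f):
                if len(row) >= 2:
                    result[row[1].strip()] = row[0].strip()
            return result

    def convert(self):
        # Stream the XML line by line so large dumps are never fully in memory.
        with open(self._xml, 'r', encoding='utf-8') as f:
            for line in f:
                yield self.convert_line(line)

    def convert_line(self, line):
        # Apply every mapping; returning after the first match would leave a
        # line that contains two different user IDs only partly converted.
        for user_id, email in self._kv_dict.items():
            if user_id in line:
                line = line.replace(user_id, email)
        return line

    def start(self):
        with open(self._out, 'w', encoding='utf-8') as f:
            for line in self.convert():
                f.write(line)


if __name__ == '__main__':
    csv_file, xml_file, out_file = parse_args()
    converter = XMLConvert(csv_file, xml_file, out_file)
    converter.start()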
I am trying to add threading and adapt the script accordingly so that it runs faster. If anyone has a better approach, please let me know.
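One possible alternative to threading, offered as a sketch rather than a tested drop-in: the script above checks every user ID against every line, so with 300+ IDs each line is scanned hundreds of times. Compiling all IDs into a single regular expression lets each line be scanned once. The helper name build_replacer and the boundary rule (an ID is not replaced when it touches a word character, '.', '@' or '-') are assumptions made for this illustration and are not part of the original script. Threads in standard CPython also usually do little for CPU-bound text processing like this, because of the global interpreter lock.

```python
import re


def build_replacer(mapping):
    """Return a function that replaces every known user ID in one pass."""
    if not mapping:
        # Nothing to replace; avoid compiling an empty alternation.
        return lambda text: text
    # Longest IDs first so a longer ID wins over an ID that is its prefix.
    ids = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w.@-])(?:" + "|".join(re.escape(i) for i in ids) + r")(?![\w.@-])"
    )
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)


mapping = {
    "lasfir1": "FirstName.LastName@abc.com",
    "lisfar1": "FarstName.ListName@abc.com",
}
replace_ids = build_replacer(mapping)

line = (
    "<string>assignee =lasfir1 </string>"
    "<string>lisfar1</string>"
    "<string>xlasfir12</string>"
)
converted = replace_ids(line)
print(converted)
```

Because each match is replaced exactly once, text that was just inserted (an email address) is never rescanned for further IDs, which the chained per-key loop cannot guarantee.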