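I have a CSV file in which every line starts with a text string that is immediately followed by a long XML string. Here is one line; I have spread the XML over several lines for readability.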
0b51b828-9416-4933-80ad-dd44ae2377b5<Company xmlns="abcd">
<Employee>
<Id>999999</Id>
<Name>Hulk</Name>
<Email>hulk@smash.com</Email>
</Employee>
<ApplicationName/>
<Identifier/>
<Headquarter>
<City>XYZ</City>
<House>123</House>
</Headquarter>
</Company>
I need to extract the leading text, up to the first occurrence of "<" where the XML begins, and modify each line as follows:
<Record> -- adding parent xml enclosure
<Parent_id>0b51b828-9416-4933-80ad-dd44ae2377b5</Parent_id> -- for reference
<Company xmlns="abcd">
<Employee>
<P_id>0b51b828-9416-4933-80ad-dd44ae2377b5</P_id> -- replicating p_id under each xml tag groups
<Id>999999</Id>
<Name>Hulk</Name>
<Email>hulk@smash.com</Email>
</Employee>
<ApplicationName/>
<Identifier/>
<Headquarter>
<P_id>0b51b828-9416-4933-80ad-dd44ae2377b5</P_id> -- same here
<City>XYZ</City>
<House>123</House>
</Headquarter>
</Company>
</Record>
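I assume this will take several passes, but I am open to any ideas. The available tools are shell, MapReduce, or any other efficient way of doing this for every line of the file.
Thanks!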
Answer 0 (score: 0)
Given:
$ cat file
0b51b828-9416-4933-80ad-dd44ae2377b5<Company xmlns="http://example.com/abcd"><Employee><Id>999999</Id><Name>Hulk</Name><Email>hulk@smash.com</Email></Employee><ApplicationName/><Identifier/><headquarter><city>XYZ</city><house>123</house></headquarter></Company>
Then:
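# Split the first line of the file at the first '<':
# the leading id goes into $string, everything after it into $xml.
IFS='<' read -r string xml < file
xml="<$xml" # add the leading bracket that the read command removed.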
{
echo "<Record>"
xmlstarlet edit --omit-decl \
--subnode /_:Company/_:Employee --type elem --name P_id --value "$string" \
--subnode /_:Company/_:headquarter --type elem --name P_id --value "$string" \
--subnode / --type elem --name Parent_id --value "$string" \
<<<"$xml"
echo "</Record>"
}
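The commands above read only the first line of file. A minimal sketch of applying the same edits to every line follows; the function names split_record and process_file and the variables REC_ID and REC_XML are illustrative and not part of the original answer, and the XPath expressions assume the lowercase headquarter element of the sample file (the question's data uses Headquarter, and XML names are case-sensitive).

```shell
# Split one input line at the first '<'.
# Sets REC_ID (the text before the XML) and REC_XML (the XML itself).
split_record() {
  REC_ID=${1%%"<"*}        # drop everything from the first '<' onwards
  REC_XML=${1#"$REC_ID"}   # drop the id again, leaving the XML part
}

# Wrap every line of the file named in $1 in its own <Record> element.
process_file() {
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *'<'*) ;;        # the line has an XML part
      *) continue ;;   # nothing to edit on this line
    esac
    split_record "$line"
    echo "<Record>"
    # Same xmlstarlet edits as in the answer, fed one record at a time.
    printf '%s\n' "$REC_XML" | xmlstarlet edit --omit-decl \
      --subnode /_:Company/_:Employee --type elem --name P_id --value "$REC_ID" \
      --subnode /_:Company/_:headquarter --type elem --name P_id --value "$REC_ID" \
      --subnode / --type elem --name Parent_id --value "$REC_ID"
    echo "</Record>"
  done < "$1"
}
```

Calling process_file file > records.xml would then write one Record per input line. This starts one xmlstarlet process per line, so it is a simple approach and not necessarily a fast one for very large files.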
Output:
<Record>
<Company xmlns="http://example.com/abcd">
<Employee>
<Id>999999</Id>
<Name>Hulk</Name>
<Email>hulk@smash.com</Email>
<P_id>0b51b828-9416-4933-80ad-dd44ae2377b5</P_id>
</Employee>
<ApplicationName/>
<Identifier/>
<headquarter>
<city>XYZ</city>
<house>123</house>
<P_id>0b51b828-9416-4933-80ad-dd44ae2377b5</P_id>
</headquarter>
</Company>
<Parent_id>0b51b828-9416-4933-80ad-dd44ae2377b5</Parent_id>
</Record>
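Note that in this output P_id and Parent_id come last in their parents, whereas the question shows them first. A possible variation, offered as a sketch and not as the original answer's method, uses xmlstarlet's --insert option, which places the new node before the node selected by the XPath expression, and prints Parent_id from the shell. The function name add_ids_first is hypothetical. A group with no child elements would get no P_id, because *[1] then selects nothing, and the id is printed without XML escaping, which is fine for a UUID but not for arbitrary text.

```shell
# Usage: add_ids_first ID   (the XML for one record is read from stdin)
add_ids_first() {
  echo "<Record>"
  # Emit Parent_id ourselves so that it precedes the Company element.
  printf '<Parent_id>%s</Parent_id>\n' "$1"
  # Insert P_id before the first child element of each group.
  xmlstarlet edit --omit-decl \
    --insert '/_:Company/_:Employee/*[1]' --type elem --name P_id --value "$1" \
    --insert '/_:Company/_:headquarter/*[1]' --type elem --name P_id --value "$1"
  echo "</Record>"
}
```

With the variables from the answer, this would be called as add_ids_first "$string" <<<"$xml".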