我已编辑过问题。 我有xml文件(FILE1),看起来像:
<Sector sectorNumber="1">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
<Sector sectorNumber="2">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
<Sector sectorNumber="3">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
我有另一个文件(FILE2),其中包含此xml文件的输入数据:
Cell11="42921"
Cell12="42925"
Cell13="42928"
Cell21="42922"
Cell22="42926"
Cell23="42929"
Cell31="42923"
Cell32="42927"
Cell33="42920"
我想要做的是,按顺序分配FILE2中的所有cellIdentity=""
值。所以看起来应该是这样的:
<Sector sectorNumber="1">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="42921" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42925" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42928" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
<Sector sectorNumber="2">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="42922" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42926" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42929" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
<Sector sectorNumber="3">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="42923" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42927" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42920" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
我使用了代码:
awk 'NR==FNR{FS="=";a[NR]=$2;next}/cell/{c++;FS=OFS;$4="cellIdentity="a[c];}1' FILE2 FILE1
但我明白了:
<Sector sectorNumber="1">
<Cell cellNumber "1" cellCreated "YES" cellIdentity cellIdentity= "35000" numberOfTxBranches "1" hsCodeResourceId "0" />
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42925" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42928" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
</Sector>
<Sector sectorNumber="2">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="42922" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42926" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42929" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
</Sector>
<Sector sectorNumber="3">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="42923" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42927" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42920" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0" />
</Sector>
所以你可以看到我在第一行有了Cellidentitny的问题
<Cell cellNumber "1" cellCreated "YES" cellIdentity cellIdentity= "35000" numberOfTxBranches "1" hsCodeResourceId "0" />
它没有设置第一行,所有其他行都没问题,我也不知道为什么。
答案 0 :(得分:1)
试试这个:
awk 'FNR==NR
{FS="=";a[NR]=$2;next}
/cell/{c++;FS=OFS;
$4="cellIdentity="a[c];}1' file2 file1
下面测试:
> cat file1
<Sector sectorNumber="1">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
<Sector sectorNumber="2">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
<Sector sectorNumber="3">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
> cat file2
Cell11="42921"
Cell12="42925"
Cell13="42928"
Cell21="42922"
Cell22="42926"
Cell23="42929"
Cell31="42923"
Cell32="42927"
Cell33="42920"
> awk 'FNR==NR{FS="=";a[NR]=$2;next}/cell/{c++;FS=OFS;$4="cellIdentity="a[c];}1' file2 file1
<Sector sectorNumber="1">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="42921" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42925" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42928" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
<Sector sectorNumber="2">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="42922" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42926" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42929" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
<Sector sectorNumber="3">
<Cell cellNumber="1" cellCreated="YES" cellIdentity="42923" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="2" cellCreated="YES" cellIdentity="42927" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
<Cell cellNumber="3" cellCreated="YES" cellIdentity="42920" cellRange="35000" numberOfTxBranches="1" hsCodeResourceId="0"/>
</Sector>
>
答案 1 :(得分:1)
我建议使用支持XML而不是awk和bash脚本的编程语言来执行此操作。
例如,Python。代码可能会稍微长一点,但作为交换,它不会轻易破坏您的XML文件。
因此,如果您想按照它们在文本文件中出现的顺序逐行分配ID:
import re
from xml.dom import minidom
from itertools import izip
sector_doc = minidom.parse('sectors.xml')
cells = sector_doc.getElementsByTagName('Cell')
with open('cells.txt', 'r') as cell_file:
lines = cell_file.readlines()
for line, cell in izip(lines, cells):
m = re.search('Cell\d+="([^"]+)"', line)
if m: cell.setAttribute('cellIdentity', m.group(1))
with open('sectors_out.xml', 'wb') as out_file:
sector_doc.writexml(out_file)