如果存在ona匹配项,我需要合并2个文件。非静态匹配是随机的,但总是在一个特定标签之后
文件1
<can update="x" site="merge-xml-01" site_id="foo.com" xmltv_id="foo@com">foo@com</can>
<can update="x" site="merge-xml-02" site_id="bar.com" xmltv_id="bar@com">bar@com</can>
<can update="x" site="merge-xml-03" site_id="xxx.com" xmltv_id="xxx@com">xxx@com</can>
文件2
<can offset="u" same_as="foo.com" id="foo 01">foo 01</can>
<can offset="u" same_as="foo.com" id="foo 02">foo 02</can>
<can offset="u" same_as="bar.com" id="bar 01">bar 01</can>
<can offset="u" same_as="xxx.com" id="xxx 01">xxx 01</can>
<can offset="u" same_as="xxx.com" id="xxx 02">xxx 02</can>
<can offset="u" same_as="xxx.com" id="xxx 03">xxx 03</can>
我需要将文件号设置为3
<can update="x" site="merge-xml-01" site_id="foo.com" xmltv_id="foo@com">foo@com</can>
<can offset="u" same_as="foo.com" id="foo 01">foo 01</can>
<can offset="u" same_as="foo.com" id="foo 02">foo 02</can>
<can update="x" site="merge-xml-02" site_id="bar.com" xmltv_id="bar@com">bar@com</can>
<can offset="u" same_as="bar.com" id="bar 01">bar 01</can>
<can update="x" site="merge-xml-03" site_id="xxx.com" xmltv_id="xxx@com">xxx@com</can>
<can offset="u" same_as="xxx.com" id="xxx 01">xxx 01</can>
<can offset="u" same_as="xxx.com" id="xxx 02">xxx 02</can>
<can offset="u" same_as="xxx.com" id="xxx 03">xxx 03</can>
我希望很清楚,如果文件1“ site_id =“中的标签与文件2中的标签“ same_as =”匹配,我需要合并数据。
老实说,我不知道该怎么做才能得到这个结果,我检查了很多帖子,但是所有合并数据都在同一行,我找不到新行中的合并数据。
我喜欢是否可以使用sed或awk,但欢迎提出任何建议。
谢谢你的建议。
答案 0 :(得分:1)
假设file2按键排序
$ awk -F' |=' 'NR==FNR {for(i=1;i<NF;i++) if($i=="site_id") {a[$(i+1)]=$0; break}; next}
{k=""; for(i=1;i<NF;i++) if($i=="same_as") {k=$(i+1); break}
if(!p[k]++) print a[k]}1' file1 file2
<can update="x" site="merge-xml-01" site_id="foo.com" xmltv_id="foo@com">foo@com</can>
<can offset="u" same_as="foo.com" id="foo 01">foo 01</can>
<can offset="u" same_as="foo.com" id="foo 02">foo 02</can>
<can update="x" site="merge-xml-02" site_id="bar.com" xmltv_id="bar@com">bar@com</can>
<can offset="u" same_as="bar.com" id="bar 01">bar 01</can>
<can update="x" site="merge-xml-03" site_id="xxx.com" xmltv_id="xxx@com">xxx@com</can>
<can offset="u" same_as="xxx.com" id="xxx 01">xxx 01</can>
<can offset="u" same_as="xxx.com" id="xxx 02">xxx 02</can>
<can offset="u" same_as="xxx.com" id="xxx 03">xxx 03</can>
ps。这应该比其他大文件解决方案快得多。
答案 1 :(得分:0)
逐行读取文件,找到URL并在第二个文件中搜索。
while read -r line; do
echo "$line" >> file3
url=$(sed 's/.*site_id="\([^"]\+\)".*/\1/' <<< $line)
grep $url file2 >> file3
done < file1
$ cat file3
<can update="x" site="merge-xml-01" site_id="foo.com" xmltv_id="foo@com">foo@com</can>
<can offset="u" same_as="foo.com" id="foo 01">foo 01</can>
<can offset="u" same_as="foo.com" id="foo 02">foo 02</can>
<can update="x" site="merge-xml-02" site_id="bar.com" xmltv_id="bar@com">bar@com</can>
<can offset="u" same_as="bar.com" id="bar 01">bar 01</can>
<can update="x" site="merge-xml-03" site_id="xxx.com" xmltv_id="xxx@com">xxx@com</can>
<can offset="u" same_as="xxx.com" id="xxx 01">xxx 01</can>
<can offset="u" same_as="xxx.com" id="xxx 02">xxx 02</can>
<can offset="u" same_as="xxx.com" id="xxx 03">xxx 03</can>
答案 2 :(得分:0)
IF 您肯定知道这些格式是一致的,并且始终在一行上...
$: cat c $ file 1 is a, file 2 is b
#! /bin/env bash
while read -r line
do pat="${line##* site_id=\"}"
pat="${pat%%\"*}"
echo "$line"
grep " same_as=[\"]$pat[\"] " b
done < a
$: c
<can update="x" site="merge-xml-01" site_id="foo.com" xmltv_id="foo@com">foo@com</can>
<can offset="u" same_as="foo.com" id="foo 01">foo 01</can>
<can offset="u" same_as="foo.com" id="foo 02">foo 02</can>
<can update="x" site="merge-xml-02" site_id="bar.com" xmltv_id="bar@com">bar@com</can>
<can offset="u" same_as="bar.com" id="bar 01">bar 01</can>
<can update="x" site="merge-xml-03" site_id="xxx.com" xmltv_id="xxx@com">xxx@com</can>
<can offset="u" same_as="xxx.com" id="xxx 01">xxx 01</can>
<can offset="u" same_as="xxx.com" id="xxx 02">xxx 02</can>
<can offset="u" same_as="xxx.com" id="xxx 03">xxx 03</can>
答案 3 :(得分:0)
这可能对您有用(GNU sed):
sed 's#.*same_as=\("[^"]*"\).*#/site_id=\1/a&#' file2 | sed -f - file1
将file2转换为sed脚本,该脚本在将same_as
的值与file1的site_id
匹配时追加每一行。然后,将生成的脚本通过管道传递到针对file1运行的sed的第二次调用。每次读入file1中的一行时,file2中的行都会依次附加到该行。
要从file1中删除与file2中不匹配的行,请使用:
sed -e 's#.*same_as=\("[^"]*"\).*#/site_id=\1/{a&\nx;s/^/x/;x}#' file2 |
sed -f - -e 'x;/x/{z;x;b};d' file1
这将在保留空间中添加一个标志,该标志在添加file2的行时和未设置时设置,以从file1删除当前记录。