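Suppose I have a predictable text document containing some IDs, called X:, and known combinations of attributes, e.g. a category Y: with a known number of instances (e.g. each X: in the series is always followed by exactly one Y:):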
X:37
# more data pertaining to item 37
Y:BLUE
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
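I want to retrieve the list of item IDs for all BLUE items. I don't care about duplicate IDs, only about which ID values occur in the document. Then I want to sort the list and compare it with the list of blue IDs from another structured text doc with exactly the same structure ("Which blue items do both documents have?", "Which blue items are in doc 1 but not in doc 2?").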
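I know I can easily grep for all Y:BLUE lines, but what additional commands do I need to find the preceding X: for each such match, and to pass the sorted result list to diff? I haven't used the command line much since AmiShell... sorry :-( Is there an online cookbook for use cases like this?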
Answer (score: 0)
Let's assume you have the following 2 input files:
$ more doc*
::::::::::::::
doc1
::::::::::::::
doc 1
X:1
# more data pertaining to item 37
# more data pertaining to item 37
# more data pertaining to item 37
Y:BLUE
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
X:2
# more data pertaining to item 37
# more data pertaining to item 37
Y:BLUE
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
X:3
# more data pertaining to item 37
# more data pertaining to item 37
Y:RED
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
X:4
# more data pertaining to item 37
# more data pertaining to item 37
Y:BLUE
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
::::::::::::::
doc2
::::::::::::::
doc 2
X:4
# more data pertaining to item 37
# more data pertaining to item 37
# more data pertaining to item 37
Y:BLUE
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
X:3
# more data pertaining to item 37
# more data pertaining to item 37
Y:BLUE
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
X:2
# more data pertaining to item 37
# more data pertaining to item 37
Y:RED
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
X:1
# more data pertaining to item 37
# more data pertaining to item 37
Y:BLUE
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
# more serialized data items including exactly 1 occurrence of "Y:" per preceding "X:"
You can run the following awk command on each document to get the IDs:
$ awk -F':' '/X:[0-9]+$/{tmp=$2}/Y:BLUE$/{a[NR]=tmp}END{n=asort(a); for(i=1;i<=n;i++){print a[i]}}' doc1
1
2
4
$ awk -F':' '/X:[0-9]+$/{tmp=$2}/Y:BLUE$/{a[NR]=tmp}END{n=asort(a); for(i=1;i<=n;i++){print a[i]}}' doc2
1
3
4
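The same extraction can also be written so that it runs on any POSIX awk (no asort()) and takes the color as a parameter instead of hard-coding BLUE. This is a minimal sketch, not the answer's method: the helper name ids_with_color is hypothetical, the two sample files are trimmed-down stand-ins for doc1/doc2, and sort -u is used because duplicate IDs are not wanted.

```shell
#!/bin/sh
# Minimal sketch: list the IDs of items with a given color using only POSIX
# awk and sort. ids_with_color is a hypothetical helper name, and the two
# sample files are trimmed-down stand-ins for doc1/doc2 from the answer.

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' 'X:1' '# more data' 'Y:BLUE' 'X:2' 'Y:BLUE' \
    'X:3' 'Y:RED' 'X:4' 'Y:BLUE' > "$dir/doc1"
printf '%s\n' 'X:4' 'Y:BLUE' 'X:3' 'Y:BLUE' 'X:2' 'Y:RED' \
    'X:1' '# more data' 'Y:BLUE' > "$dir/doc2"

# Usage: ids_with_color FILE COLOR
ids_with_color() {
    awk -F':' -v color="$2" '
        $1 == "X" { id = $2 }               # remember the most recent ID
        $0 == ("Y:" color) { print id }     # exact match on the Y: line
    ' "$1" | sort -u                        # sort and drop duplicate IDs
}

ids_with_color "$dir/doc1" BLUE   # 1, 2, 4 (one per line)
ids_with_color "$dir/doc2" BLUE   # 1, 3, 4
ids_with_color "$dir/doc1" RED    # 3
```

Comparing the whole line with $0 == ("Y:" color) matches the color literally, so a color value containing regex metacharacters would not be misinterpreted as a pattern.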
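Explanation:
-F':' sets : as the field separator.
/X:[0-9]+$/{tmp=$2} saves the ID in the variable tmp (assuming IDs consist only of digits and nothing else follows them on the line); if that does not hold for your data, adjust the filter regex /X:[0-9]+$/ accordingly.
/Y:BLUE$/{a[NR]=tmp} when we reach a line matching Y:BLUE (assuming the line ends right after BLUE), stores the value saved in tmp in the array a.
The END block sorts the collected IDs with asort() and prints them. asort() is a GNU awk (gawk) extension; with other awk implementations you can print the IDs directly and let sort do the ordering:
awk -F':' '/X:[0-9]+$/{tmp=$2}/Y:BLUE$/{print tmp}' doc1 | sort -n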
You can then combine them as follows to find the differences between the blue IDs of the two documents:
$ diff <(awk -F':' '/X:[0-9]+$/{tmp=$2}/Y:BLUE$/{a[NR]=tmp}END{n=asort(a); for(i=1;i<=n;i++){print a[i]}}' doc1) <(awk -F':' '/X:[0-9]+$/{tmp=$2}/Y:BLUE$/{a[NR]=tmp}END{n=asort(a); for(i=1;i<=n;i++){print a[i]}}' doc2)
2c2
< 2
---
> 3
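diff shows both directions at once; the question's "which blue items are in doc 1 but not in doc 2?" can be answered directly with comm -23 (and the reverse with comm -13). A sketch under the assumption that the sample files below stand in for doc1/doc2; it writes the sorted ID lists to temporary files instead of using bash process substitution, so it also runs under plain sh.

```shell
#!/bin/sh
# Sketch: one-sided differences of the blue ID lists with comm.
# The sample files are trimmed-down stand-ins for doc1/doc2.

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' 'X:1' 'Y:BLUE' 'X:2' 'Y:BLUE' 'X:3' 'Y:RED' 'X:4' 'Y:BLUE' \
    > "$dir/doc1"
printf '%s\n' 'X:4' 'Y:BLUE' 'X:3' 'Y:BLUE' 'X:2' 'Y:RED' 'X:1' 'Y:BLUE' \
    > "$dir/doc2"

# Extract the blue IDs of each document, sorted lexically as comm expects.
for doc in doc1 doc2; do
    awk -F':' '/^X:/ {id = $2} /^Y:BLUE$/ {print id}' "$dir/$doc" \
        | sort -u > "$dir/$doc.ids"
done

# -2 -3 suppresses lines unique to file 2 and lines common to both.
only_in_doc1=$(comm -23 "$dir/doc1.ids" "$dir/doc2.ids")
# -1 -3 suppresses lines unique to file 1 and lines common to both.
only_in_doc2=$(comm -13 "$dir/doc1.ids" "$dir/doc2.ids")

echo "only in doc1: $only_in_doc1"   # only in doc1: 2
echo "only in doc2: $only_in_doc2"   # only in doc2: 3
```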
Or find the blue IDs they have in common. comm expects its inputs in lexical (not numeric) order, so here the IDs are piped through plain sort:
$ comm -1 -2 <(awk -F':' '/X:[0-9]+$/{tmp=$2}/Y:BLUE$/{print tmp}' doc1 | sort) <(awk -F':' '/X:[0-9]+$/{tmp=$2}/Y:BLUE$/{print tmp}' doc2 | sort)
1
4
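With IDs of different lengths, numeric order (9 before 10) is not the lexical order comm assumes, so feeding it numerically sorted lists can give wrong results. A sketch with made-up ID lists, assuming the same LC_ALL=C collation is used for both sort and comm so the two always agree:

```shell
#!/bin/sh
# Sketch: keep sort and comm on the same collation (LC_ALL=C).
# The ID lists below are made-up examples.

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' 9 10 100 > "$dir/a.ids"
printf '%s\n' 10 100 7 > "$dir/b.ids"

LC_ALL=C sort -u "$dir/a.ids" > "$dir/a.sorted"   # 10, 100, 9
LC_ALL=C sort -u "$dir/b.ids" > "$dir/b.sorted"   # 10, 100, 7

# IDs present in both lists.
common=$(LC_ALL=C comm -12 "$dir/a.sorted" "$dir/b.sorted")
printf '%s\n' "$common"   # 10, then 100
```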