Suppose we have a single line of text stored in a file:
// In the actual file this will be one line
{unrelated_text1,ID:13, unrelated_text2,TIMESTAMP:1476280500,unrelated_text3},
{other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600},
{ID:30,more_unrelated_text1,TIMESTAMP:1476280700},
{ID:40,final_unrelated_text}
For this particular input, I want to extract 3 entries:
// The details, such as whether to put { character in front or not do not matter.
// Any form of output which extracts only these 3 entries and groups them in a
// visually nice way will do the job.
{ID:13, TIMESTAMP:1476280500}
{ID:25, TIMESTAMP:1476280600}
{ID:30, TIMESTAMP:1476280700}
// I do not want the last entry, because it does not contain timestamp field.
So far, the closest command I have found is
grep -Po {ID:[0-9]+(.+?)} input_file
which gives the output
{unrelated_text1,ID:13,unrelated_text2,TIMESTAMP:1476280500,unrelated_text3}
{other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600}
{ID:30,more_unrelated_text1,TIMESTAMP:1476280700}
{ID:40,final_unrelated_text}
The next improvement I am looking for is how to remove the unrelated text from each entry and drop the last entry.
Question: what is the simplest way to do this in Linux?
Answer 0 (score: 1)
Using GNU awk for a multi-character RS, RT, and word boundaries:
$ awk -v RS='\\<(ID|TIMESTAMP):[0-9]+' 'NR%2{id=RT;next} RT{printf "{%s, %s}\n", id, RT}' file
{ID:13, TIMESTAMP:1476280500}
{ID:25, TIMESTAMP:1476280600}
{ID:30, TIMESTAMP:1476280700}
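The above will work whether the input is on one line or spread over several, and regardless of what other text the file contains. All it relies on is the ID appearing before each associated TIMESTAMP, and that is not hard to change if necessary.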
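For comparison, here is a minimal sketch of a different technique that does not need GNU awk: split the line into its {...} groups with grep -o, then keep only the groups in which sed finds both fields. This is not the awk approach above. It assumes each entry is wrapped in braces with no nested braces, and that ID precedes TIMESTAMP inside an entry. The temporary file and the variable names input_file and result are hypothetical, used only to make the example self-contained. Because each group is handled separately, an entry without a TIMESTAMP can appear anywhere in the line, whereas the NR%2 pairing above expects ID and TIMESTAMP matches to alternate strictly.

```shell
# Hypothetical sample file reproducing the question's one-line input.
input_file=$(mktemp)
printf '%s\n' '{unrelated_text1,ID:13, unrelated_text2,TIMESTAMP:1476280500,unrelated_text3},{other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600},{ID:30,more_unrelated_text1,TIMESTAMP:1476280700},{ID:40,final_unrelated_text}' > "$input_file"

# Step 1: grep -o prints every brace-delimited group on its own line.
# Step 2: sed -n with the p flag prints a line only if the substitution
#         succeeded, i.e. only if the group holds both an ID and a TIMESTAMP.
result=$(grep -o '{[^{}]*}' "$input_file" |
  sed -nE 's/.*(ID:[0-9]+).*(TIMESTAMP:[0-9]+).*/{\1, \2}/p')

printf '%s\n' "$result"
rm -f "$input_file"
```

On the sample input this prints the three ID/TIMESTAMP pairs and skips the ID:40 entry, since the substitution does not match a group without a TIMESTAMP.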