Question

Suppose we have one line of text stored in a file:

// In the actual file this will be one line
{unrelated_text1,ID:13, unrelated_text2,TIMESTAMP:1476280500,unrelated_text3},
{other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600},
{ID:30,more_unrelated_text1,TIMESTAMP:1476280700},
{ID:40,final_unrelated_text}

For this specific input, I want to extract 3 entries:

// The details, such as whether to put { character in front or not do not matter.
// Any form of output which extracts only these 3 entries and groups them in a 
// visually nice way will do the job.
{ID:13, TIMESTAMP:1476280500}
{ID:25, TIMESTAMP:1476280600}
{ID:30, TIMESTAMP:1476280700}
// I do not want the last entry, because it does not contain timestamp field.

So far, the closest command I have found is

grep -Po {ID:[0-9]+(.+?)} input_file

which gives the output

{unrelated_text1,ID:13,unrelated_text2,TIMESTAMP:1476280500,unrelated_text3}
{other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600}
{ID:30,more_unrelated_text1,TIMESTAMP:1476280700}
{ID:40,final_unrelated_text}

The next improvement I am looking for is how to remove unrelated_text from each entry and drop the last entry.

Question: what is the simplest way to do this in Linux?

Answer 1

Using GNU awk for multi-character RS, RT, and word boundaries:

$ awk -v RS='\\<(ID|TIMESTAMP):[0-9]+' 'NR%2{id=RT;next} RT{printf "{%s, %s}\n", id, RT}' file
{ID:13, TIMESTAMP:1476280500}
{ID:25, TIMESTAMP:1476280600}
{ID:30, TIMESTAMP:1476280700}

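The above works whether the input is on one line or spread over several, and regardless of what other text is in the file. All it relies on is the ID appearing before each associated TIMESTAMP, and that would not be hard to change if necessary.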
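For systems without GNU awk, the same result can be approximated with grep and sed. This is only a sketch under stated assumptions, not the answerer's method: it assumes every entry is wrapped in braces with no nested braces, that ID comes before TIMESTAMP inside an entry, and that grep supports -o and sed supports -E (true of the GNU and BSD versions). The function name extract_pairs and the temporary sample file are hypothetical. Unlike the awk one-liner, it uses no word boundaries, so a field such as GRID:5 would be picked up as ID:5.

```shell
# Hypothetical helper: print {ID, TIMESTAMP} pairs from the file named in $1.
# Assumes brace-delimited entries, no nested braces, ID before TIMESTAMP.
extract_pairs() {
  # Step 1: grep -o puts each {...} entry on its own line.
  # Step 2: sed prints only entries that contain an ID followed by a
  #         TIMESTAMP, rewritten as "{ID:n, TIMESTAMP:n}"; entries
  #         without a TIMESTAMP are silently dropped.
  grep -o '{[^}]*}' "$1" |
    sed -n -E 's/.*(ID:[0-9]+).*(TIMESTAMP:[0-9]+).*/{\1, \2}/p'
}

# Sample input mirroring the question: everything on a single line.
sample=$(mktemp)
printf '%s\n' '{unrelated_text1,ID:13, unrelated_text2,TIMESTAMP:1476280500,unrelated_text3},{other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600},{ID:30,more_unrelated_text1,TIMESTAMP:1476280700},{ID:40,final_unrelated_text}' > "$sample"

extract_pairs "$sample"
```

Because each entry is handled separately, an entry with an ID but no TIMESTAMP is skipped wherever it appears in the line, not only at the end.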

What is a short way in Linux to extract a pattern string and a following pattern string?