Question

I have a file of about 500 MB in which every line holds data like the following.

#vim results.txt
{"count": 8, "time_first": 1450801456, "record": "A", "domain": "api.ai.", "ip": "54.240.166.223", "time_last": 1458561052}
{"count": 9, "time_first": 1450801456, "record": "A", "domain": "cnn.com.", "ip": "54.240.166.223", "time_last": 1458561052}
 .........

It has 25 million lines in total.

Now I want to reduce results.txt to:

8,1450801456,A,api.ai,54.240.166.223,1458561052
9,1450801456,A,cnn.com,54.240.166.223,1458561052
....

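That is, I want to remove the unneeded strings such as count, time_first, record, domain, ip and time_last.

At the moment I delete each string in vim. For example, I run %s/{"count": //g.

Even for a single string, the substitution takes a long time.

I am a beginner with Bash / shell. How can I do this with sed / awk? Any suggestions?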

Answer 1

Using sed:

sed -E 's/[{ ]*"[^"]*": *|["}]//g' file
#    ^    ^    ^         ^^---- remaining double quotes and the closing bracket
#    |    |    |         '----- OR
#    |    |    '--------------- key enclosed between double quotes
#    |    '-------------------- leading opening curly bracket and spaces
#    '------------------------- use ERE (Extended Regular Expression) syntax
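
As a minimal sketch of how this could be run end to end, the snippet below builds a two-line sample and writes the result to a new file instead of printing it. The file names sample.txt and sample.csv are hypothetical. The second expression is an addition, not part of the command above: it assumes the trailing dot of the domain is the first ".," on each line (true for the sample data) and removes it, which matches the output asked for in the question.

```shell
# Two sample lines in the same shape as results.txt (file names are hypothetical).
cat > sample.txt <<'EOF'
{"count": 8, "time_first": 1450801456, "record": "A", "domain": "api.ai.", "ip": "54.240.166.223", "time_last": 1458561052}
{"count": 9, "time_first": 1450801456, "record": "A", "domain": "cnn.com.", "ip": "54.240.166.223", "time_last": 1458561052}
EOF

# Expression 1: the substitution from the answer (strip keys, quotes and braces).
# Expression 2 (an assumption-based addition): remove the first ".," on the line,
# which is the dot that ends the domain field in this data.
sed -E -e 's/[{ ]*"[^"]*": *|["}]//g' -e 's/\.,/,/' sample.txt > sample.csv

cat sample.csv
```

Because sed streams the file line by line, the 500 MB input does not have to be loaded into an editor buffer first. Redirecting to a new file also keeps the original results.txt intact until the output has been checked.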

Another way: use xidel, which includes a JSON parser:

xidel -q file -e '$json/*' | sed 'N;N;N;N;N;y/\n/,/'
#     ^           ^     ^         ^         ^---- translate newlines to commas
#     |           |     |         '-------------- append the next five lines
#     |           |     '------------------------ all values
#     |           '------------------------------ for each json string
#     '------------------------------------------ quiet mode

A shorter way from @BeniBela, which does not need sed to join the fields:

xidel -q file -e '$json/join(*,",")'

Answer 2

Something to consider:

$ awk -F'[{}":, ]+' -v OFS=, '{for (i=3;i<NF;i+=2) printf "%s%s", $i, (i<(NF-1)?OFS:ORS)}' file
8,1450801456,A,api.ai.,54.240.166.223,1458561052
9,1450801456,A,cnn.com.,54.240.166.223,1458561052
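
To see why the loop starts at field 3 and steps by 2, it can help to print every field that this separator produces. The sketch below does that for one sample line; the variable names line and fields are only for illustration. Any run of {, }, ", :, comma or space counts as one separator, so the keys land in even-numbered fields, the values in odd-numbered ones, and the first and last fields are empty.

```shell
# One sample line from the question (variable names are illustrative only).
line='{"count": 8, "time_first": 1450801456, "record": "A", "domain": "api.ai.", "ip": "54.240.166.223", "time_last": 1458561052}'

# Same field separator as the answer; print each field with its index.
fields=$(printf '%s\n' "$line" |
  awk -F'[{}":, ]+' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
```

Note that this splitting relies on the values never containing spaces, commas, colons or quotes, which holds for the data shown in the question.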

Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.

Bash: file size handling issue in vim mode
