I have a very large file whose entries look like this:
{
"_id": {
"$oid": "572a5b93ae5174d3c4177da3"
},
"email": "removed@gmail.com",
"gender": "F",
"zip": "32934",
"state": "FL",
"city": "EAU GALLIE",
"address1": "removed",
"last_name": "removed",
"first_name": "removed",
"updatedAt": {
"$date": "2016-05-04T20:29:02.061Z"
},
"__v": 0,
"createdAt": {
"$date": "2016-05-04T20:28:54.948Z"
}
}
{
"_id": {
"$oid": "57a49bed913aebc7257145b9"
},
"email": "removed@gmail.com",
"dob": "11/06/1996",
"gender": "F",
"zip": "SN14 8BZ",
"address1": "removed",
"last_name": "removed",
"first_name": "removed",
"updatedAt": {
"$date": "2016-08-16T23:53:30.161Z"
},
"__v": 0,
"createdAt": {
"$date": "2016-08-05T14:00:13.130Z"
}
}
{
"_id": {
"$oid": "57a49bed913aebc7257145d3"
},
"email": "removed@netzero.net",
"zip": "NULL",
"state": "NULL",
"city": "NULL",
"address1": "NULL",
"last_name": "removed",
"first_name": "removed",
"updatedAt": {
"$date": "2016-08-05T14:00:13.467Z"
},
"__v": 0,
"createdAt": {
"$date": "2016-08-05T14:00:13.467Z"
}
}
{
"_id": {
"$oid": "57ab71379f7474b50eef976d"
},
"updatedAt": {
"$date": "2016-08-16T23:40:55.851Z"
},
"createdAt": {
"$date": "2016-08-10T18:23:51.177Z"
},
"email": "removed@hotmail.co.uk",
"ip": "0.0.0.0",
"first_name": "removed",
"last_name": "removed",
"address1": "removed",
"city": "",
"state": "",
"zip": "removed",
"gender": "F",
"__v": 0,
"dob": "03/01/1973"
}
{
"_id": {
"$oid": "57ab7137913aebc725194a20"
},
"email": "removed@gmail.com",
"job": "DeliveryDriver",
"zip": "24401",
"state": "VA",
"city": "FISHERSVILLE",
"updatedAt": {
"$date": "2016-09-16T12:45:50.984Z"
},
"__v": 0,
"createdAt": {
"$date": "2016-08-10T18:23:50.813Z"
},
"gender": "M",
"last_name": "removed",
"first_name": "removed"
}
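The entries are in no particular order, and I have obviously removed the names, addresses, IPs, and emails for privacy. The fields are all over the place, though, and there are more than 20M of these entries.
How can I parse this properly? I only want to extract the email, IP, phone number, name (first and last), and address (zip, address1, address2, city).
Some entries have only an email and IP, some have email, IP, and name, some have email, name, and address, and some have all of these fields. (They all carry some junk data as well, such as the OID, the created and updated dates, gender, etc.)
What is the best way to parse this? I have been trying for a while, and I know it has been done before. Thanks!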
Answer 0 (score: 0)
Use jq. It is cross-platform.
Example; adapt the command to your needs:
$ jq '(.email, .first_name, .last_name)' file.json
Output:
"removed@gmail.com"
"removed"
"removed"
"removed@gmail.com"
"removed"
"removed"
"removed@netzero.net"
"removed"
"removed"
"removed@hotmail.co.uk"
"removed"
"removed"
"removed@gmail.com"
"removed"
"removed"
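If jq is not an option, the same extraction can be sketched in Python using only the standard library. This is a rough sketch, not a tuned solution: it assumes the file is a sequence of concatenated JSON objects (as in the sample), that the phone and second address line are stored under the keys "phone" and "address2" (neither appears in the sample, so both names are guesses), and that the literal string "NULL" should be treated as missing. The names iter_objects, extract, and main are made up for illustration. The file is read in chunks, so the 20M entries are never held in memory at once.

```python
import json

# Fields the question asks for. "phone" and "address2" are assumed key names,
# since they do not appear in the sample records.
WANTED = ("email", "ip", "phone", "first_name", "last_name",
          "zip", "address1", "address2", "city")


def iter_objects(stream, chunk_size=65536):
    """Yield each JSON object from a text stream of concatenated objects."""
    decoder = json.JSONDecoder()
    buf = ""
    while True:
        chunk = stream.read(chunk_size)
        buf += chunk
        pos = 0
        while True:
            # raw_decode does not skip leading whitespace, so do it here.
            while pos < len(buf) and buf[pos].isspace():
                pos += 1
            if pos >= len(buf):
                break
            try:
                obj, pos = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                break  # object is incomplete so far: read another chunk
            yield obj
        buf = buf[pos:]  # keep only the unparsed tail
        if not chunk:  # end of file
            if buf:
                raise ValueError("trailing data is not a complete JSON object")
            return


def extract(record):
    """Keep only the wanted keys that are present and not empty or "NULL"."""
    return {key: record[key] for key in WANTED
            if record.get(key) not in (None, "", "NULL")}


def main(path):
    """Print one cleaned-up JSON object per line for the file at path."""
    with open(path, encoding="utf-8") as handle:
        for record in iter_objects(handle):
            print(json.dumps(extract(record)))
```

Calling main("file.json") prints one reduced record per line, which can be redirected to a file or loaded into a database. Entries that lack a field simply omit that key, which matches the uneven records described in the question.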