Filtering by value in jq when building a regex pattern

Time: 2014-08-23 14:54:43

Tags: regex json jq

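While preparing an input file to feed to my actual code, I need to keep only the records in which the value of one key contains a given string, matched case-insensitively. In other words, I want to build a regex that means "contains camfrog, tubemate, or soundcloud".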

Sample JSON input:

{"appid":"537c6d4a9c4846b8bc44ebdf78ab8e2d","app_name":"TubeMate YouTube Downloader","publisher_id":"1690d6387fcc441091a2f2d73f89709d"}
{"appid":"f8022204aaa7478a88fca1a417ddb125","app_name":"Camfrog Android Smartphone","publisher_id":"085d0268a9674ce885a2f185ec895246"}
{"appid":"agltb3B1Yi1pbmNyDAsSA0FwcBih9tMUDA","app_name":"TuneIn Radio - iPad","publisher_id":"agltb3B1Yi1pbmNyEAsSB0FjY291bnQYsv-PFAw"}
{"appid":"537c6d4a9c4846b8bc44ebdf78ab8e2d","app_name":"TubeMate YouTube Downloader","publisher_id":"1690d6387fcc441091a2f2d73f89709d"}
{"appid":"f8022204aaa7478a88fca1a417ddb125","app_name":"Camfrog Android Smartphone","publisher_id":"085d0268a9674ce885a2f185ec895246"}
{"appid":"92255b8b662148e59973b8eca128adde","app_name":"SubwaySimulator3D","publisher_id":"0d78f4d244ec4309b4aa06cdfb871341"}
{"appid":"agltb3B1Yi1pbmNyDAsSA0FwcBjq_6EUDA","app_name":"TuneIn Radio","publisher_id":"agltb3B1Yi1pbmNyEAsSB0FjY291bnQYsv-PFAw"}
{"appid":"f7cc119ca9e1426c8d162d2d37c8558f","app_name":"Android Skout New","publisher_id":"agltb3B1Yi1pbmNyEAsSB0FjY291bnQY7cCnEgw"}
{"appid":"agltb3B1Yi1pbmNyDAsSA0FwcBim6MAVDA","app_name":"Draw Something Android","publisher_id":"agltb3B1Yi1pbmNyEAsSB0FjY291bnQYgYC-FQw"}

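From this JSON input I need to filter the apps whose name is "like" Camfrog (it could be CAMFROG, camfrog, and so on, so the regex must be case-insensitive). I need to do this for a list of app names such as "Camfrog", "Tubemate", "soundcloud", etc. I looked at the jq manual at http://stedolan.github.io/jq/manual/ but could not build the expression.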

Here is what I tried:

 </home/ekta/Prototype1/sample.dat jq -c '{app_name:.app_name} |
 match(["Camfrog", "ig"])'  
 map(select(.app.name like "%Camfrog%" ))

But I get "match is not defined" and a compile error. How can I do this in jq?
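One possible way to express this filter, sketched under the assumption that jq 1.5 or later is installed (that is where the regex builtins such as `test` and `match` are available; a "match is not defined" error is consistent with an older release). `select(...)` passes the whole object through when the condition holds, so every key is kept. The helper name `filter_apps` and the shortened records are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes jq 1.5+; filter_apps is a hypothetical helper name.

# Keep every object whose app_name matches the regex, ignoring case ("i" flag).
# select(...) emits the whole input object unchanged, so all keys survive.
filter_apps() {
  jq -c --arg re "$1" 'select(.app_name | test($re; "i"))'
}

# Shortened, made-up records in the same shape as the sample input.
printf '%s\n' \
  '{"appid":"a1","app_name":"CAMFROG Android Smartphone","publisher_id":"p1"}' \
  '{"appid":"a2","app_name":"TuneIn Radio","publisher_id":"p2"}' \
  '{"appid":"a3","app_name":"TubeMate YouTube Downloader","publisher_id":"p3"}' |
  filter_apps 'camfrog|tubemate|soundcloud'
```

With the real file, the same function would be fed as `<sample.dat filter_apps 'camfrog|tubemate|soundcloud'`. The general recipe is: pipe the field you care about into a boolean test, and wrap that in `select` rather than constructing a new object.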

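Fallback: I could load this into a pandas DataFrame and apply the regex there, but since my file contains a lot of data I don't need, I would like to filter it quickly in jq.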

Sample output after filtering the apps (I need all the keys and values, exactly as in the original input):

{"appid":"537c6d4a9c4846b8bc44ebdf78ab8e2d","app_name":"TubeMate YouTube Downloader","publisher_id":"1690d6387fcc441091a2f2d73f89709d"}
{"appid":"f8022204aaa7478a88fca1a417ddb125","app_name":"Camfrog Android Smartphone","publisher_id":"085d0268a9674ce885a2f185ec895246"}
{"appid":"537c6d4a9c4846b8bc44ebdf78ab8e2d","app_name":"TubeMate YouTube Downloader","publisher_id":"1690d6387fcc441091a2f2d73f89709d"}
{"appid":"f8022204aaa7478a88fca1a417ddb125","app_name":"Camfrog Android Smartphone","publisher_id":"085d0268a9674ce885a2f185ec895246"}

PPS: I would appreciate it if you could "teach me to fish" rather than just building the regex that should match.

Follow-up question:

Also, when I try to test the examples from the jq manual, such as:

echo '[{"foo":1,"bar":2},{"foo":1,"bar":3},{"foo":4,"bar":5}]' | jq 'unique(.foo)'

I get this error: "too many arguments for unique (expected 0 but got 1)", reported for unique(.foo), followed by "1 compile error".

whereas the jq manual gives the example as follows:

jq 'unique(.foo)'
Input [{"foo": 1, "bar": 2}, {"foo": 1, "bar": 3}, {"foo": 4, "bar": 5}]
Output    [{"foo": 1, "bar": 2}, {"foo": 4, "bar": 5}]

What else should I try for the input here?

The way I actually build the dictionary is </home/ekta/SamplePrototype.dat jq -c '{appid:.app.id,app_name:.app.name,publisher_id:.app.publisher_id}', but I wanted to test things the way the jq manual does. Can you point out what I am doing wrong here?
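The error message suggests that the installed jq defines `unique` with no arguments, while the manual page being read describes a different release, so comparing `jq --version` with the manual's version is a reasonable first check. As a hedged sketch: in jq 1.5 and later the by-key variant is spelled `unique_by(f)`, and the same result can be written with `group_by` alone, which should also be available on older releases.

```shell
#!/usr/bin/env bash
# Input taken from the manual example quoted above.
input='[{"foo":1,"bar":2},{"foo":1,"bar":3},{"foo":4,"bar":5}]'

# Assumes jq 1.5+: keep one element per distinct .foo value.
printf '%s\n' "$input" | jq -c 'unique_by(.foo)'

# Same idea using only group_by: take the first element of each group.
printf '%s\n' "$input" | jq -c '[group_by(.foo)[] | .[0]]'
```

Note that the input is wrapped in single quotes so the shell passes the JSON through untouched, and that there is a space between `jq` and its filter.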

1 Answer:

Answer 0 (score: 3)

Here is what worked for me, using our old friend grep (and egrep):

$ <sample.dat jq -c '{appid:.appid,app_name:.app_name}' | egrep -i "camfrog|draw something"
{"appid":"f8022204aaa7478a88fca1a417ddb125","app_name":"Camfrog Android Smartphone"}
{"appid":"f8022204aaa7478a88fca1a417ddb125","app_name":"Camfrog Android Smartphone"}
{"appid":"agltb3B1Yi1pbmNyDAsSA0FwcBim6MAVDA","app_name":"Draw Something Android"}
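
The jq step in that pipeline keeps only two keys, so publisher_id is dropped from the output. If every key is needed and each record sits on a single line, a variant is to run grep on the raw file and tie the pattern to the app_name value. This is only a sketch: the temporary file stands in for sample.dat, the records and ids are made up, and the pattern assumes compact JSON (`"app_name":"` with no spaces) and no escaped quotes inside the name.

```shell
#!/usr/bin/env bash
# Sketch: a temporary file stands in for sample.dat (records shortened, ids made up).
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"appid":"a1","app_name":"Camfrog Android Smartphone","publisher_id":"p1"}
{"appid":"a2","app_name":"TuneIn Radio","publisher_id":"camfrog-fan"}
{"appid":"a3","app_name":"TubeMate YouTube Downloader","publisher_id":"p3"}
EOF

# -i ignores case; -E enables alternation with |.
# Starting at "app_name":" and refusing to cross the next quote keeps the match
# inside the app_name value, so a2 is skipped despite its publisher_id.
pattern='"app_name":"[^"]*(camfrog|tubemate|soundcloud)'
filtered=$(grep -iE "$pattern" "$sample")
printf '%s\n' "$filtered"
rm -f "$sample"
```

Because grep works on text rather than parsed JSON, this is best treated as a quick pre-filter; a jq `select` is the safer choice when names can contain escaped quotes or the file is pretty-printed.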