Need a parsing filter for newline-delimited JSON

Date: 2016-11-17 17:46:48

Tags: jquery json parsing google-bigquery jq

I'm running into a problem while trying to automate loading data from an API into BigQuery.

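The problem is that the data has to be in newline-delimited JSON format to go into my BigQuery database, but the data I pull is not in that format, so I need to parse it first.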

Here is a link to pastebin so you can get an idea of what the data looks like, but here it is as well:

{"type":"user.list","users":[{"type":"user","id":"581c13632f25960e6e3dc89a","user_id":"ieo2e6dtsqhiyhtr","anonymous":false,"email":"test@gmail.com","name":"Joe Martinez","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"Houston","continent_code":"NA","country_name":"United States","latitude":29.7633,"longitude":-95.3633,"postal_code":"77002","region_name":"Texas","timezone":"America/Chicago","country_code":"USA"},"last_request_at":1478235114,"last_seen_ip":"66.87.120.30","created_at":1478234979,"remote_created_at":1478234944,"signed_up_at":1478234944,"updated_at":1478235145,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Linux; Android 6.0.1; SM-G920P Build/MMB29K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.68 Mobile Safari/537.36","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"1","memberType":"claimant"}},{"type":"user","id":"581c22a19a1dc02c460541df","user_id":"1o3helrdv58cxm7jf","anonymous":false,"email":"test@mail.com","name":"Joe Coleman","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"San Jose","continent_code":"NA","country_name":"United States","latitude":37.3394,"longitude":-121.895,"postal_code":"95141","region_name":"California","timezone":"America/Los_Angeles","country_code":"USA"},"last_request_at":1478239113,"last_seen_ip":"216.151.183.47","created_at":1478238881,"remote_created_at":1478238744,"signed_up_at":1478238744,"updated_at":1478239113,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"2","memberType":"claimant"}}],"scroll_param":"24ba0fac-b8f9-46b2-944a-9bb523dcd1b1"}

The two problems are the first line:

{"type":"user.list","users":

and the last part at the bottom:

,"scroll_param":"24bd0rac-b2f9-46b2-944a-9zz543dcd1b1"}

If you eliminate those two, you are left with only the data you actually need, and I know which filter parses that into newline-delimited format.

You can see for yourself by playing around with this tool: copy and paste everything from the first open bracket to the close bracket on the last line, set it to "Compact Output", and apply the filter:

.[]

The result will come out in a nice and neat newline-delimited format like you see here. Here it is as well, in case the link doesn't work:

{"type":"user","id":"581c13632f25960e6e3dc89a","user_id":"ieo2e6dtsqhiyhtr","anonymous":false,"email":"test@gmail.com","name":"Joe Martinez","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"Houston","continent_code":"NA","country_name":"United States","latitude":29.7633,"longitude":-95.3633,"postal_code":"77002","region_name":"Texas","timezone":"America/Chicago","country_code":"USA"},"last_request_at":1478235114,"last_seen_ip":"66.87.120.30","created_at":1478234979,"remote_created_at":1478234944,"signed_up_at":1478234944,"updated_at":1478235145,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Linux; Android 6.0.1; SM-G920P Build/MMB29K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.68 Mobile Safari/537.36","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"1","memberType":"claimant"}}
{"type":"user","id":"581c22a19a1dc02c460541df","user_id":"1o3helrdv58cxm7jf","anonymous":false,"email":"test@mail.com","name":"Joe Coleman","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"San Jose","continent_code":"NA","country_name":"United States","latitude":37.3394,"longitude":-121.895,"postal_code":"95141","region_name":"California","timezone":"America/Los_Angeles","country_code":"USA"},"last_request_at":1478239113,"last_seen_ip":"216.151.183.47","created_at":1478238881,"remote_created_at":1478238744,"signed_up_at":1478238744,"updated_at":1478239113,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"2","memberType":"claimant"}}

So what I need is a filter that I can apply the same way I use .[], one that strips out all the text before the first open bracket (as described above) and all the text after the close bracket at the end.

But this is where the final problem comes in. Although I need to get rid of that last piece of text, I still need the letters and numbers in it, which are called the scroll parameter. To capture all the data I need from the API, I have to keep calling it with the new scroll parameter that each command-line call generates, until all of the data is in.

The initial call looks like this:

$ curl -s https://api.program.io/users/scroll -u 'dG9rOmU5NGFjYTkwXzliNDFfNGIyMF9iYzA0XzU0NDg3MjE5ZWJkZDoxOjA=': -H 'Accept:application/json'

But to get all of the information, I need the scroll parameter to make separate follow-up calls.

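So while I need to strip the text containing the parameter out of the blob to get it into newline-delimited format, I still need to extract whatever that parameter is, so that I can loop it back into another script that keeps running until the results come back empty.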
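A minimal Python sketch of that loop, assuming a hypothetical `fetch_page` callable that performs one API call and returns the parsed JSON page: `fetch_page(None)` stands for the initial call and `fetch_page(param)` for a follow-up call with that scroll parameter. How the parameter is actually sent to the API isn't shown above, so that part is deliberately left inside `fetch_page`. The sketch yields the users from each page, stops when a page comes back with no users, and writes each user as one compact JSON line, which is the shape BigQuery expects for newline-delimited JSON.

```python
import io
import json
from typing import Callable, Iterable, Iterator, Optional, TextIO


def scroll_all_users(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every user across all scroll pages.

    fetch_page is hypothetical: fetch_page(None) stands for the initial call,
    fetch_page(param) for a follow-up call that passes that scroll parameter.
    """
    scroll_param: Optional[str] = None
    while True:
        page = fetch_page(scroll_param)
        users = page.get("users") or []
        if not users:
            # An empty page is taken to mean all of the data is in.
            return
        yield from users
        scroll_param = page.get("scroll_param")
        if scroll_param is None:
            # Nothing to continue with; stop instead of re-fetching page one.
            return


def write_ndjson(records: Iterable[dict], out: TextIO) -> int:
    """Write each record as one compact JSON line (like `jq -c`); return the count."""
    count = 0
    for record in records:
        out.write(json.dumps(record, separators=(",", ":"), ensure_ascii=False) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Fake pages standing in for real API responses.
    pages = {
        None: {"type": "user.list", "users": [{"id": "a"}, {"id": "b"}], "scroll_param": "p1"},
        "p1": {"type": "user.list", "users": [{"id": "c"}], "scroll_param": "p2"},
        "p2": {"type": "user.list", "users": [], "scroll_param": "p3"},
    }
    buf = io.StringIO()
    print(write_ndjson(scroll_all_users(pages.__getitem__), buf))
    print(buf.getvalue(), end="")
```

Keeping the paging and the NDJSON writing in separate functions means the same writer works whether the pages come from curl output, a file, or a real HTTP client.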

I'd love to hear any suggestions for solving this!

2 Answers:

Answer 0 (score: 0)

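Like others who have commented, I won't pretend to understand the details of your specific problem, but if the general question is how to emit newline-delimited JSON with jq (that is, make sure each JSON text is followed by a newline and no other raw newlines are added), the answer is simple: run jq with the -c option and without the -r option.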
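To illustrate why `-c` without `-r` gives valid NDJSON, here is a small Python analogy (not jq itself; the function names are made up). Compact serialization puts each value on a single line and escapes any newline inside a string as `\n`. Raw output, which is roughly what `-r` does to string results, prints the string's characters directly, so an embedded newline splits it across lines and the output is no longer one JSON text per line.

```python
import json


def to_ndjson_line(value) -> str:
    """Compact form, like `jq -c`: no spaces or line breaks between tokens.

    json.dumps escapes a newline inside a string as backslash-n, so every
    value, however nested, ends up on exactly one line.
    """
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False) + "\n"


def raw_line(value) -> str:
    """Roughly what `jq -r` does: string results are printed unquoted and
    unescaped; anything else is still printed as (compact) JSON."""
    if isinstance(value, str):
        return value + "\n"
    return to_ndjson_line(value)


if __name__ == "__main__":
    for v in [{"a": [1, 2]}, "line1\nline2"]:
        print(repr(to_ndjson_line(v)), repr(raw_line(v)))
```

For the dictionary both forms agree; only the string with an embedded newline shows the difference.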

Answer 1 (score: 0)

From a cursory look at your data, the filter

.users[]

will give you just the user data you want to load, and the filter

.scroll_param

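will return just the scroll parameter. If you put the data into a file, you can invoke jq once per filter, but if you have to stream the data, you can simply use the , operator to return both. For example: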

  .scroll_param
, .users[]

If you use that filter with the -c option, jq will produce output like
"24ba0fac-b8f9-46b2-944a-9bb523dcd1b1"
{"type":"user","id":"581c13632f25960e6e3dc89a","user_id":"ieo2e6dtsqhiyhtr",...
{"type":"user","id":"581c22a19a1dc02c460541df","user_id":"1o3helrdv58cxm7jf",...
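For comparison, here is a rough Python equivalent of running `jq -c '.scroll_param, .users[]'` on one page. It is only an illustration and the function name is made up; it mirrors how the comma operator emits all outputs of the left filter first and then those of the right one, so the JSON-encoded scroll parameter comes first, followed by one compact line per user.

```python
import json
from typing import Iterator


def scroll_then_users(page: dict) -> Iterator[str]:
    """Mimic `jq -c '.scroll_param, .users[]'` for a single parsed page."""
    # Left side of the comma: .scroll_param (a missing key becomes null, as in jq).
    yield json.dumps(page.get("scroll_param"), ensure_ascii=False)
    # Right side: .users[] emits each array element as its own output.
    # Like jq, this fails if "users" is missing instead of silently yielding nothing.
    for user in page["users"]:
        yield json.dumps(user, separators=(",", ":"), ensure_ascii=False)


if __name__ == "__main__":
    page = {
        "type": "user.list",
        "users": [{"type": "user", "id": "1"}, {"type": "user", "id": "2"}],
        "scroll_param": "abc",
    }
    for line in scroll_then_users(page):
        print(line)
```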

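The script that reads jq's output could then capture the first line to use in the next curl call and put the rest of the data into the file you load.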
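A minimal sketch of that reading script in Python, assuming it receives jq's output line by line (for example from standard input) and that the user lines should go to a file you then load into BigQuery. `split_jq_output` is a hypothetical helper: it decodes the first line as the scroll parameter, copies every later line through unchanged, and reports how many users it wrote, so a count of 0 can serve as the signal to stop calling the API.

```python
import json
from typing import Iterable, Optional, TextIO, Tuple


def split_jq_output(lines: Iterable[str], out: TextIO) -> Tuple[Optional[str], int]:
    """Split the output of `jq -c '.scroll_param, .users[]'`.

    The first line is the JSON-encoded scroll parameter; each later line is
    one user record, copied unchanged into `out`.
    Returns (scroll_param, number_of_users_written).
    """
    it = iter(lines)
    first = next(it, None)
    if first is None:
        return None, 0
    # json.loads turns '"24ba..."' into 24ba... and 'null' into None;
    # surrounding whitespace such as the trailing newline is allowed.
    scroll_param = json.loads(first)
    count = 0
    for line in it:
        line = line.rstrip("\n")
        if line:  # skip blank lines so the output stays valid NDJSON
            out.write(line + "\n")
            count += 1
    return scroll_param, count
```

In a driver script this might look like `param, n = split_jq_output(sys.stdin, out_file)`, repeated with the new `param` while `n > 0`; `out_file` is whatever file handle you later load.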

Hope this helps.