Question

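I know the basics of piping stdin to downstream processes in the shell, and as long as each line is handled individually, or the whole input is treated as a single unit, I can get my pipelines to work.

But when I want to read 4 lines of stdin, do some processing, then read 6 more and do the same again, my limited understanding of pipelines becomes a problem.

For example, in the pipeline below, each curl call produces an unknown number of output lines that together make up one JSON object: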

cat geocodes.txt \
  | xargs  -I% -n 1 curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng='%'&sensor=true' \
  | python -c "import json,sys;obj=json.load(sys.stdin);print obj['results'][0]['address_components'][3]['short_name'];"

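How can I make each python invocation consume exactly one JSON object? Note that my experience with Python is negligible. I actually have more experience with Node.js (would I be better off using Node.js to process the JSON output from curl?)

geocodes.txt looks something like this: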

51.5035705555556,-3.15153263888889
51.5035400277778,-3.15153477777778
51.5035285833333,-3.15150258333333
51.5033861111111,-3.15140833333333
51.5034980555556,-3.15146016666667
51.5035285833333,-3.15155505555556
51.5035362222222,-3.15156338888889
51.5035362222222,-3.15156338888889

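Edit: I have a nasty feeling the answer is that you need to read line by line and check whether you have a complete object before parsing. Is there a function that will do the hard work for me?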

Answer 1

I believe this approach does what you want. First, save the Python script in a file, my_script.py. Then run the following:

cat geocodes.txt \
  | xargs  -I% sh -c "curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng='%'&sensor=true' | python my_script.py"

where my_script.py contains:

import json
import sys
# Parse the single JSON object that curl wrote to stdin.
obj = json.load(sys.stdin)
print(obj['results'][0]['address_components'][3]['short_name'])

Output:

Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff

It looks a bit hacky, I admit.

Original answer

I'm no bash wizard, so my instinct is to do everything in Python. The following script illustrates that approach in Python 3:

import urllib.request as request
import urllib.parse as parse
import json

serviceurl = "http://maps.googleapis.com/maps/api/geocode/json?"

with open("geocodes.txt") as f:
    for line in f:
        latlng = line.strip()  # drop the trailing newline before URL-encoding
        if not latlng:
            continue  # skip blank lines
        url = serviceurl + parse.urlencode({'latlng': latlng, 'sensor': 'true'})
        with request.urlopen(url) as response:
            bytes_data = response.read()
        obj = json.loads(bytes_data.decode('utf-8'))
        print(obj['results'][0]['address_components'][3]['short_name'])

Output:

Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff
Cardiff

Answer 2

Take a look at:

http://trentm.com/json/#FEATURE-Grouping

Grouping can be helpful for "one JSON object per line" formats or for things such as:

$ cat *.json | json -g ...

Install:

sudo npm install -g json

I haven't tried it myself, so I can't verify that it works, but it may be the missing link for doing what you want (grouping JSON).
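
As a rough illustration of what grouping adjacent JSON objects involves, here is a minimal Python sketch using the standard library's json.JSONDecoder.raw_decode, which parses one JSON value starting at a given index and returns the value together with the index where it stopped. The function name iter_json_objects and the sample string are made up for illustration, and the sketch assumes the whole stream fits in memory.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Parse one value and move pos to the first character after it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical sample: two objects back to back, the second spanning lines.
sample = '{"n": 1}\n{\n  "n": 2\n}\n'
for obj in iter_json_objects(sample):
    print(obj["n"])

# At the end of the curl pipeline the input would be sys.stdin.read()
# instead of the sample string.
```

With this, a single python process at the end of the pipeline could handle every curl response in turn, rather than needing one invocation per object.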

Answer 3

You don't need Python or Node.js. jq is designed for filtering JSON in the UNIX style, and it applies its filter to each JSON value on its input in turn, so the concatenated curl output needs no splitting. Install it:

sudo apt-get install jq

Then:

cat geocodes.txt  \
  | xargs  -I% curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng='%'&sensor=true'  \
  | jq --unbuffered '.results[0].formatted_address'

Or, if you want to do this for all your JPG files:

find -iname "**jpg" \
  | xargs -n 1 -d'\n' exiftool -q -n -p '$GPSLatitude,$GPSLongitude' \
  | xargs  -I% curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng='%'&sensor=true' \
  | jq --unbuffered  '.results[0].formatted_address'

How to pipe multi-line JSON objects into separate python invocations
