Bitwise operations on a JSON array

Date: 2017-08-10 14:47:43

Tags: arrays json bash shell jq

I have a JSON file, input.json, whose data looks like this:

{"userid":"04f","clients":[1,2]}
{"userid":"07f","clients":[1,6,7]}
{"userid":"082","clients":[2,6,1]}
{"userid":"0c1","clients":[3,9,8]}
{"userid":"13f","clients":[4]}

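The clients array can contain numbers from 1 to 10. It may have several elements, but no duplicates. I want to perform a bitwise operation on this file.

I expect output like this (a bitwise OR over the elements of each clients array):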

{"userid":"04f","clients":3}  #$((1|2))=3
{"userid":"07f","clients":7}  #$((1|6|7))=7
{"userid":"082","clients":7}  #$((1|6|2))=7
{"userid":"0c1","clients":11} #$((3|9|8))=11
{"userid":"13f","clients":4}  #$((4))=4

My file has about 250 million lines. I am looking for a bash solution. What is the fastest and best way to achieve this?
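To make the expected values concrete, here is a small illustrative Python sketch of the arithmetic behind them. The helper name show_or and the 4-bit display width are assumptions chosen for the example (values up to 10 fit in 4 bits); nothing here is part of the original question.

```python
from functools import reduce
from operator import or_


def show_or(values, width=4):
    """Return the bitwise OR of values plus a binary view of each operand."""
    result = reduce(or_, values, 0)
    # One row per operand, zero-padded so the bit positions line up.
    rows = [f"{v:0{width}b}  ({v})" for v in values]
    rows.append(f"{result:0{width}b}  ({result})  <- OR of the rows above")
    return result, rows


result, rows = show_or([3, 9, 8])
print("\n".join(rows))
```

A bit is set in the result whenever it is set in at least one operand, which is why 3|9|8 gives 11 rather than the arithmetic sum 20.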

4 answers:

Answer 0 (score: 1)

Unfortunately, jq does not support bitwise operations yet. I suggest writing a small Python program:

from collections import OrderedDict
from functools import reduce
import json

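with open('input.json', 'r') as fd:
    for line in fd:
        # Preserve key order so "userid" still comes before "clients"
        data = json.loads(line, object_pairs_hook=OrderedDict)
        # OR all elements together; the initial 0 covers an empty array
        data['clients'] = reduce(lambda x, y: x | y, data['clients'], 0)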
        print(json.dumps(data))

Output:

{"userid": "04f", "clients": 3}
{"userid": "07f", "clients": 7}
{"userid": "082", "clients": 7}
{"userid": "0c1", "clients": 11}
{"userid": "13f", "clients": 4}

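Answer 1 (score: 1)

The following is based on two generic filters (convert/1 and to_i/1) provided at https://rosettacode.org/wiki/Non-decimal_radices/Convert#jq. Their definitions are included below for completeness and ease of reference.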

# input: an array of decimal numbers
def bitwise_or:
   map(convert(2) | explode | reverse | map(.-48))
   | transpose | map(max)
   | reverse
   | join("")
   | to_i(2) ;

.clients |= bitwise_or
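For readers less familiar with jq, the same idea can be sketched in Python: write each number as binary digits, line the digits up by position, take the maximum of each column, and read the result back as a base-2 number. This is only an illustrative sketch of the technique, not a translation checked against jq; the function name bitwise_or_via_digits is an assumption made for the example.

```python
from itertools import zip_longest


def bitwise_or_via_digits(numbers):
    """Bitwise OR computed from binary digit strings, without the | operator."""
    # Binary digits of each number, least significant first
    # (the jq filter does this with convert(2) | explode | reverse | map(.-48)).
    columns = [[int(d) for d in reversed(format(n, "b"))] for n in numbers]
    # Group digits by bit position, padding shorter numbers with 0.
    by_position = zip_longest(*columns, fillvalue=0)
    # A result bit is 1 if any input has a 1 there, i.e. the column maximum.
    bits = [max(col) for col in by_position]
    # Back to most-significant-first, then parse as base 2 (to_i(2) in jq).
    digits = "".join(str(b) for b in reversed(bits))
    return int(digits or "0", 2)


print(bitwise_or_via_digits([3, 9, 8]))
```

Taking the column maximum works because each digit is 0 or 1, so max behaves exactly like OR on a single bit.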

convert and to_i

# Convert the input integer to a string in the specified base (2 to 36 inclusive)
def convert(base):
  def stream:
    recurse(if . > 0 then ./base|floor else empty end) | . % base ;
  if . == 0 then "0"
  else  [stream] | reverse | .[1:]
  | if   base <  10 then map(tostring) | join("")
    elif base <= 36 then map(if . < 10 then 48 + . else . + 87 end) | implode
    else error("base too large")
    end
  end;

# input string is converted from "base" to an integer, within limits
# of the underlying arithmetic operations, and without error-checking:
def to_i(base):
  explode
  | reverse
  | map(if . > 96  then . - 87 else . - 48 end)  # "a" ~ 97 => 10 ~ 87
  | reduce .[] as $c
      # state: [power, ans]
      ([1,0]; (.[0] * base) as $b | [$b, .[1] + (.[0] * $c)])
  | .[1];

Answer 2 (score: 0)

One way (since you said bash) is to use awk:

tr -d "[]}" <input.json | awk -F ":" '{split($3,a,",") ;o=0;for (i in a)  {o = or(o,a[i])};print $1":"$2":"o"}"  }' 

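awk (specifically gawk, where this function is an extension) provides a bitwise OR function, used as or(arg1,arg2,..argn).

tr -d "[]}" removes the extra characters before the operation runs.

split() stores the comma-separated values in an array.

This gives: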

{"userid":"04f","clients":3}
{"userid":"07f","clients":7}
{"userid":"082","clients":7}
{"userid":"0c1","clients":11}
{"userid":"13f","clients":4}

Note: this may not work for some other JSON layouts.
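As a hedged illustration of that caveat, the sketch below mimics the tr/awk text processing in Python and compares it with real JSON parsing. The function names are hypothetical, and the Python version only approximates the shell pipeline: it raises an error where awk would quietly treat the non-numeric field as 0.

```python
import json
from functools import reduce
from operator import or_


def or_clients_textual(line):
    """Mimic the pipeline: delete '[', ']' and '}', then split on ':' and ','."""
    stripped = line.translate(str.maketrans("", "", "[]}"))
    fields = stripped.split(":")
    # Like awk's $3: assumes the client numbers are the third ':'-separated field.
    numbers = [int(n) for n in fields[2].split(",")]
    return reduce(or_, numbers, 0)


def or_clients_parsed(line):
    """Parse the line as JSON, so key order and spacing do not matter."""
    return reduce(or_, json.loads(line)["clients"], 0)


sample = '{"userid":"07f","clients":[1,6,7]}'
swapped = '{"clients":[1,6,7],"userid":"07f"}'
print(or_clients_textual(sample), or_clients_parsed(sample))
print(or_clients_parsed(swapped))
```

On the swapped line the textual version fails, because the third field is now the userid rather than the number list. The text-based approach is therefore only safe when every line has exactly the layout shown in the question.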

Answer 3 (score: 0)

Here is a jq solution. The constant 128 in twopowers can be changed to whatever value makes sense for the data (or it could even be replaced with a simple function that returns a constant stream).

def twopowers:                      # return sequence of powers of 2
    128                             # largest power (change as desired)
  | log2 as $maxp                   # e.g. 7
  | $maxp - range($maxp+1)          # 7, 6, 5, 4, 3, 2, 1, 0
  | pow(2; .)                       # 128, 64, 32, 16, 8, 4, 2, 1
;
def base2powers:                    # e.g 81 -> [0,64,0,16,0,0,0,1]
  [
    foreach twopowers as $p (
        { v: . }
      ;   .diff = .v - $p
        | .v = if .diff >= 0 then .diff else .v end
        | .bit = if .diff >= 0 then 1 else 0 end
      ; .bit * $p
    )
  ]
;
def combine:                        # given an array of base2powers arrays
  reduce .[] as $a (                # compute the element-wise max array
      []                            # and return its sum
    ; [ . as $b | $a | range(length) | [ $a[.], $b[.] ] | max ]
  )
  | add
;
.clients = (.clients | map(base2powers) | combine)

Thinking about this some more, we can eliminate the constant in twopowers by using the maximum value in the .clients array to compute the powers used for each input. Here is a version that does this.

def twopowers_v2:                   # return sequence of powers of 2 less than given value
    .                               # e.g. 129
  | log2                            # 7.011227255423254
  | floor as $maxp                  # 7
  | $maxp - range($maxp+1)          # 7, 6, 5, 4, 3, 2, 1, 0
  | pow(2; .)                       # 128, 64, 32, 16, 8, 4, 2, 1
;
def base2powers_v2($powers):        # e.g 81 -> [64,0,16,0,0,0,1]
  [
    foreach $powers[] as $p (
        { v: . }
      ;   .diff = .v - $p
        | .v = if .diff >= 0 then .diff else .v end
        | .pow = if .diff >= 0 then $p else 0 end
      ; .pow
    )
  ]
;
.clients = (
    .clients
  | [max|twopowers_v2] as $powers
  | map(base2powers_v2($powers))
  | combine
)

Nishant Kumar observed that with a .clients of [0], the final result is null. This is because 0 | twopowers_v2 returns no values. To compensate, we can add an explicit check:

def twopowers_v3:                   # return sequence of powers of 2 less than given value
  if . > 0 then                     # e.g. 129
      log2                          # 7.011227255423254
    | floor as $maxp                # 7
    | $maxp - range($maxp+1)        # 7, 6, 5, 4, 3, 2, 1, 0
    | pow(2; .)                     # 128, 64, 32, 16, 8, 4, 2, 1
  else                              #
    0                               # but if input is 0, return 0
  end                               #
;
.clients = (
    .clients
  | [max|twopowers_v3] as $powers
  | map(base2powers_v2($powers))
  | combine
)

Looking at peak's second solution, I notice two things:

  • combine is the same as elementwise(max) | add
  • elementwise(max) is the same as transpose | map(max)

It follows that combine can be dropped: in a version without combine, the final step of the pipeline becomes transpose | map(max) | add.

Also, using a "little-endian" representation for the bit arrays is simpler than this approach.
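The decomposition used in this answer can also be sketched in Python, which may help when checking the jq filters by hand. This is an illustrative sketch under stated assumptions, not a line-by-line port: the function names mirror the jq ones for readability, two_powers uses int.bit_length instead of log2 and floor, and an empty clients list is treated as 0 (a choice made here, not something the answer specifies).

```python
def two_powers(limit):
    """Powers of two not exceeding limit, largest first; [0] if limit is 0."""
    if limit <= 0:
        return [0]  # mirrors the explicit zero check in twopowers_v3
    return [1 << p for p in range(limit.bit_length() - 1, -1, -1)]


def base2powers(value, powers):
    """Greedy decomposition, e.g. 81 -> [64, 0, 16, 0, 0, 0, 1]."""
    out = []
    for p in powers:
        if value - p >= 0:
            value -= p      # this power is part of the number
            out.append(p)
        else:
            out.append(0)   # this power is not used
    return out


def combine(rows):
    """Element-wise maximum of equal-length rows, then the sum."""
    return sum(max(col) for col in zip(*rows))


def bitwise_or(clients):
    powers = two_powers(max(clients, default=0))
    return combine([base2powers(c, powers) for c in clients])


print(bitwise_or([3, 9, 8]))
```

Each column holds either 0 or one fixed power of two, so the column maximum is that power exactly when some input contains it, and summing the maxima yields the OR.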