Question

We are trying to convert a JSON file to a TSV file, and we are having trouble eliminating duplicate Ids using unique.

JSON file

[
   {"Id": "101",
    "Name": "Yugi"},
   {"Id": "101",
    "Name": "Yugi"},
   {"Id": "102",
    "Name": "David"}
]

cat getEvent_all.json | jq -cr '.[] | [.Id] | unique_by(.[].Id)'

jq: error (at <stdin>:0): Cannot iterate over string ("101")

Answer 1

One sensible approach is to use unique_by on the whole array, e.g.:

unique_by(.Id)[]
| [.Id, .Name]
| @tsv

Alternatively, you could form the pairs first:

map([.Id, .Name])
| unique_by(.[0])[]
| @tsv
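
For context on the error in the question: `.[] | [.Id]` turns each object into a one-element array such as `["101"]`, and `unique_by(.[].Id)` then applies `.[].Id` to each element of that array, that is, to the string `"101"`, which cannot be iterated. Applying unique_by to the top-level array avoids this. The following is a minimal sketch of a complete invocation; the temporary file and the variable names `input` and `result` are illustrative assumptions, and the sample data is written without a trailing comma so that it is valid JSON.

```shell
# Write the sample data from the question to a temporary file
# (hypothetical stand-in for getEvent_all.json; no trailing comma).
input=$(mktemp)
cat > "$input" <<'EOF'
[
  {"Id": "101", "Name": "Yugi"},
  {"Id": "101", "Name": "Yugi"},
  {"Id": "102", "Name": "David"}
]
EOF

# unique_by(.Id) runs on the whole array; -r prints the @tsv rows
# as raw text instead of JSON-quoted strings.
result=$(jq -r 'unique_by(.Id)[] | [.Id, .Name] | @tsv' "$input")
printf '%s\n' "$result"

rm -f "$input"
```

With this input, the output should be two tab-separated rows, one for Id 101 and one for Id 102. Note that unique_by sorts its result by the key, so the rows come out in Id order rather than input order.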

`uniques_by/2`

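For very large arrays, though, or if you want to preserve the original order, you should consider a sort-free alternative to unique_by. Here is a suitable, generic, stream-oriented alternative: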

def uniques_by(stream; f):
  foreach stream as $x ({};
     ($x|f) as $s
     | ($s|type) as $t
     | (if $t == "string" then $s
        else ($s|tostring) end) as $y
     | if .[$t][$y] then .emit = false
       else .emit = true | (.item = $x) | (.[$t][$y] = true)
       end;
     if .emit then .item else empty end );
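
As a usage sketch, the definition above can be placed in front of the main filter and fed the array elements as a stream with `.[]`. The input below is a reordered variant of the question's data (an assumption made here to show that input order is kept), and the shell variable names `json` and `result` are illustrative.

```shell
# Reordered sample data: Id 102 comes first, Id 101 appears twice.
json='[{"Id":"102","Name":"David"},{"Id":"101","Name":"Yugi"},{"Id":"101","Name":"Yugi"}]'

result=$(printf '%s\n' "$json" | jq -r '
  # Same definition as above: emit each item of the stream whose
  # key (computed by f) has not been seen before, without sorting.
  def uniques_by(stream; f):
    foreach stream as $x ({};
       ($x|f) as $s
       | ($s|type) as $t
       | (if $t == "string" then $s
          else ($s|tostring) end) as $y
       | if .[$t][$y] then .emit = false
         else .emit = true | (.item = $x) | (.[$t][$y] = true)
         end;
       if .emit then .item else empty end );

  # Stream the array elements, deduplicate on .Id, format as TSV.
  uniques_by(.[]; .Id) | [.Id, .Name] | @tsv
')
printf '%s\n' "$result"
```

Here the row for Id 102 should be printed before the row for Id 101, because the first occurrence of each key is emitted in input order. With unique_by, the same input would instead come out sorted by Id.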

