Question

I am trying to find a set of files:

> find . -type f -iregex .*geojson$
> ./dir1/london.geojson
  ./manchester.geojson

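Then, for each file found (30 to 40 of them, spread across many nested folders), I want to wrap the original content in my own JSON structure, adding the file name and an extracted ID. Like so: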

> cat manchester.geojson
  {"properties": { "id": 11.0, "borough": "Didsbury" }, "geometry": {"removed": 0} }
  {"properties": { "id": 22.0, "borough": "Chorlton" }, "geometry": {"removed": 0} }

I would like the following result:

{"_id": 11.0, "filename": "manchester.geojson", "document": {"properties": { "id": 11.0, "borough": "Didsbury" }, "geometry": {"removed": 0} }}
{"_id": 22.0, "filename": "manchester.geojson", "document": {"properties": { "id": 22.0, "borough": "Chorlton" }, "geometry": {"removed": 0} }}

The closest I have got is piping to xargs and awk, like so:

> find . -type f -iregex .*geojson$ | xargs -d '\n' awk -F'[{:,]' '{print "{ \"_id\":"$7", \"file\": \""FILENAME"\", \"doc\": " $0 " }"}'

  }"_id": 11.0, "file": "./manchester.geojson", "doc": { "type": "Feature", "properties": { "id": 11.0, "borough": "Didsbury" }, "geometry": {"removed": 0} }}
  }"_id": 22.0, "file": "./manchester.geojson", "doc": { "type": "Feature", "properties": { "id": 22.0, "borough": "Chorlton" }, "geometry": {"removed": 0} }}

I can't work out what is going wrong with the opening curly brace.

I can find all the variables I want, as the following example shows:

> find . -type f -iregex .*geojson$ | xargs -d '\n' awk -F'[{:,]' '{print  $7 " " FILENAME " " $0}'

  11.0 ./manchester.geojson { "type": "Feature", "properties": { "id": 11.0, "borough": "Didsbury" }, "geometry": {"removed": 0} }}
  22.0 ./manchester.geojson { "type": "Feature", "properties": { "id": 22.0, "borough": "Chorlton" }, "geometry": {"removed": 0} }}

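Then there is the final problem: sending each file's output to a new file with the same name but a new extension. I can send the combined output of all the files to one big file with a simple > redirect, but that is not what I need. Any ideas would be greatly appreciated.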

Answer 1

Use a JSON parser to process JSON data. jq is a good one.

    jqbody='{_id: .properties.id, filename: input_filename, document: .}'
    find . -type f -name \*geojson -print0 | while IFS= read -rd "" filename; do
        jq -c "$jqbody" "$filename" ## > ./tmpfile && mv ./tmpfile "$filename"
    done

Remove the ## comment marker if everything looks OK.

I don't see an equivalent "in-place edit" option for jq, so I need to use a shell while loop to get hold of the file name, instead of xargs.
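If you would rather keep the originals and write each result to a sibling file with a new extension (the last part of the question), the same jq filter can be combined with shell parameter expansion. This is only a sketch: the function name `wrap_geojson_tree` and the `.wrapped.json` extension are made up for illustration, and it swaps the `while read` loop for `find -exec sh -c` so that it also runs under a plain POSIX shell.

```shell
# Hypothetical helper: wrap every *.geojson under the given directory
# (default: current directory) and write the result next to the original.
# The .wrapped.json extension is an assumed name, pick whatever suits you.
wrap_geojson_tree() {
    find "${1:-.}" -type f -name '*.geojson' -exec sh -c '
        for f do
            # ${f%.geojson} strips the old extension from the path
            jq -c "{_id: .properties.id, filename: input_filename, document: .}" \
                "$f" > "${f%.geojson}.wrapped.json"
        done' sh {} +
}
```

Calling `wrap_geojson_tree .` would then leave `./manchester.wrapped.json` beside `./manchester.geojson`. Because the output name no longer ends in `.geojson`, running it a second time does not pick up its own output.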

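In the output, I see that the ID numbers have been "integer-ized". To avoid that, your original JSON should quote the id values so that they are handled verbatim as strings.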

Answer 2

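Thanks to @EdMorton and @glenjackman for pointing me in the right direction. As it turned out, I had almost got there in the question itself. Once the dodgy line endings were cleaned up, the following one-liner does the job: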

> find . -type f -name \*geojson | xargs -d '\n' awk -i inplace -F'[:,]' '{print "{ \"_id\":" $5 ", \"file\": \"" FILENAME "\", \"doc\": "$0"}"}'

The missing piece was -i inplace, which modifies the files in place, an option I had not considered at first.
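For anyone who wants new files rather than in-place edits, here is a rough sketch in plain awk, without the gawk-only `-i inplace`. It rests on two assumptions: that the "dodgy line endings" were Windows-style CRLF (a trailing carriage return sends the cursor back to column 0, which would explain the closing `}` overprinting the opening `{` in the question), and that `$5` holds the id, as in the one-liner above, which depends on each line starting with `{ "type": "Feature", "properties": { "id": ...`. The function name `wrap_with_awk` and the `.wrapped.json` extension are invented for the example.

```shell
# Hypothetical helper: for every *.geojson under the given directory,
# strip a trailing carriage return from each line and write the wrapped
# lines to a sibling file with an assumed .wrapped.json extension.
wrap_with_awk() {
    find "${1:-.}" -type f -name '*.geojson' -exec awk -F'[:,]' '
        FNR == 1 {
            # First line of a new input file: switch output files
            if (out != "") close(out)
            out = FILENAME
            sub(/\.geojson$/, ".wrapped.json", out)
        }
        {
            sub(/\r$/, "")   # drop the CR; awk re-splits the fields afterwards
            printf "{ \"_id\":%s, \"file\": \"%s\", \"doc\": %s}\n", $5, FILENAME, $0 > out
        }' {} +
}
```

Run as `wrap_with_awk .`; the originals are left untouched. As Answer 1 points out, splitting JSON on `:` and `,` is fragile, so this only holds while every line has the same layout.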
