I have multiple files, about 400 GB in total, that I want to convert to JSON so I can load them into Elasticsearch for analysis.
Each file is about 200 MB.
The raw files look like this:
IUGJHHGF@BERLIN:lhfrjy
0t7yfudf@WARSAW:qweokm246
0t7yfudf@CRACOW:Er747474
0t7yfudf@cracow:kui666666
000t7yf@Vienna:1йй2ц2й2цй2цц3у
The characters are not only English. key1 is always separated by @, and city is separated by ; or :
I parse it with this code:
#!/usr/bin/env python
# coding: utf8
import json

with open('2') as f:
    for line in f:
        s1 = line.find("@")
        rest = line[s1 + 1:]
        if rest.find(";") != -1:
            if rest.find(":") != -1:
                print("FOUND BOTH : ;")
                s2 = -0
            else:
                s2 = s1 + 1 + rest.find(";")
        elif rest.find(":") != -1:
            s2 = s1 + 1 + rest.find(":")
        else:
            print("FOUND NO : ;")
            s2 = -0
        key1 = line[:s1]
        city = line[s1 + 1:s2]
        description = line[s2 + 1:len(line) - 1]
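The nested find() logic above can be expressed as a single regular expression. A minimal sketch, assuming every valid line is `key1@city` followed by either `:` or `;` and then the description (`parse_line` and `LINE_RE` are illustrative names, not from the question):

```python
import re

# key1 = everything before the first "@", city = everything up to the
# first ":" or ";", description = the rest of the line.
LINE_RE = re.compile(r"^(?P<key1>[^@]+)@(?P<city>[^:;]+)[:;](?P<description>.*)$")

def parse_line(line):
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None  # line does not match the expected format
    return m.groupdict()
```

For example, `parse_line("IUGJHHGF@BERLIN:lhfrjy")` yields a dict with key1, city, and description already separated, and malformed lines come back as None instead of a silent `s2 = -0`.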
Not all files are like that. Others look like:
RRS12345 Cracow Sunflowers
RRD12345 Berin Data
After parsing, I want output like:
{
"location_data":[
{
"key1":"RRS12345",
"city":"Cracow",
"description":"Sunflowers"
},
{
"key1":"RRD123dsd45",
"city":"Berlin",
"description":"Data"
},
{
"key1":"RRD123dsds45",
"city":"Berlin",
"description":"1йй2ц2й2цй2цц3у"
}
]
}
How can I quickly convert this to the desired JSON format, given the non-English characters?
Answer 0: (score: 3)
import json

def process_text_to_json():
    location_data = []
    with open("file.txt") as f:
        for line in f:
            line = line.split()
            location_data.append({"key1": line[0], "city": line[1], "description": line[2]})
    location_data = {"location_data": location_data}
    return json.dumps(location_data)
Sample output:
{"location_data": [{"city": "Cracow", "key1": "RRS12345", "description": "Sunflowers"}, {"city": "Berin", "key1": "RRD12345", "description": "Data"}, {"city": "Cracow2", "key1": "RRS12346", "description": "Sunflowers"}, {"city": "Berin2", "key1": "RRD12346", "description": "Data"}, {"city": "Cracow3", "key1": "RRS12346", "description": "Sunflowers"}, {"city": "Berin3", "key1": "RRD12346", "description": "Data"}]}
Answer 1: (score: 0)
Iterate over each line and build your dictionary.
Example:
d = {"location_data": []}
with open(filename, "r") as infile:
    for line in infile:
        val = line.split()
        d["location_data"].append({"key1": val[0], "city": val[1], "description": val[2]})
print(d)
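Since the question mentions non-English characters, it may help to know how `json.dumps` handles them. A small sketch (the sample dict below is made up for illustration):

```python
import json

d = {"location_data": [
    {"key1": "RRS12345", "city": "Cracow", "description": "Sunflowers"},
    {"key1": "000t7yf", "city": "Vienna", "description": "1йй2ц2й2цй2цц3у"},
]}

# Default behaviour escapes non-ASCII to \uXXXX; ensure_ascii=False
# keeps the Cyrillic text readable in the output file.
print(json.dumps(d, ensure_ascii=False))
```

Elasticsearch accepts either form, but `ensure_ascii=False` keeps the files smaller and human-readable.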