Question

This is my first day with Python. I have a CSV file like the one below.

Link to the file: https://1drv.ms/u/s!AlQo_tHSk1tGjlZYua8xoHSRQ4m6

File name: toy.csv

id  text
1   hello world
2   hello foo world
3   hello my world

I need to write code that produces the following format:

Required format:

'{"documents":[{"id":"1","text":"hello world"},{"id":"2","text":"hello foo world"},{"id":"three","text":"hello my world"},]}'
num_detect_langs = 1;

One way is to hard-code it directly:

input_texts = '{"documents":[{"id":"1","text":"hello world"},{"id":"2","text":"hello foo world"},{"id":"three","text":"hello my world"},]}'

Here the type of input_texts is "str".
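Side note, in case the string is later parsed as JSON (an assumption; the question does not say what consumes it): the hard-coded value is a plain str, and the trailing comma in `},]}` makes it invalid JSON. A quick check with the standard `json` module:

```python
import json

# The hard-coded string from the question; note the trailing comma before "]"
input_texts = '{"documents":[{"id":"1","text":"hello world"},{"id":"2","text":"hello foo world"},{"id":"three","text":"hello my world"},]}'
print(type(input_texts))  # <class 'str'>

# Strict JSON does not allow a trailing comma, so json.loads rejects it
try:
    json.loads(input_texts)
    is_valid = True
except json.JSONDecodeError:
    is_valid = False
print(is_valid)  # False

# Removing the trailing comma gives a string that parses cleanly
fixed = input_texts.replace('},]}', '}]}')
print(json.loads(fixed)["documents"][1]["text"])  # hello foo world
```

Generating the string with `json.dumps` (as the answers do) avoids this problem, because it never emits trailing commas.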

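But in practice this isn't feasible, because my input file can contain 1000 records. I know I need some kind of "for" loop to produce the required format, but I don't know how to implement it.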

Can anyone help?

Answer 1

This isn't exactly what you want, but it gets you very close:

import io
import json

# this is only to fake your input file...
file = io.StringIO('''id  text
1   hello world
2   hello foo world
3   hello my world
''')

# you would have to open your file:
# with open('filename', 'r') as file:
#     ...

lst = []
header = next(file)  # read and discard the header (id  text)
for line in file:
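    splt = line.rstrip('\n').split(None, 1)  # drop the newline (line[:-1] would cut a real character off a last line without one), then split off the id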
    lst.append({'id': splt[0], 'text': splt[1]})

print(json.dumps(lst))

# [{"id": "1", "text": "hello world"}, 
#  {"id": "2", "text": "hello foo world"},
#  {"id": "3", "text": "hello my world"}]

I'm sure you can figure out the rest.
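One way to do "the rest", i.e. wrap the list in the `{"documents": [...]}` object from the question and end up with a str. This is a sketch that assumes the columns are separated by runs of whitespace, as in the sample; the `io.StringIO` stand-in for the file is the same trick as above:

```python
import io
import json

# Stand-in for open('toy.csv'); assumes whitespace-separated columns as in the sample
file = io.StringIO('''id  text
1   hello world
2   hello foo world
3   hello my world
''')

next(file)  # skip the header line ("id  text")
docs = []
for line in file:
    line = line.rstrip('\n')
    if not line.strip():
        continue  # ignore blank lines
    doc_id, text = line.split(None, 1)  # id, then the rest of the line as text
    docs.append({'id': doc_id, 'text': text})

input_texts = json.dumps({'documents': docs})  # a str, ready to use
print(input_texts)
```

If you want the compact form shown in the question (no spaces after `,` and `:`), pass `separators=(',', ':')` to `json.dumps`.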

This only uses built-ins. But since you mentioned a "dataframe", I guess you want to use pandas...

Answer 2

To convert the df DataFrame you mentioned in your question into the required format, you can do the following:

# df is assumed to be a pandas DataFrame with "id" and "text" columns
d = {}
d["documents"] = df.to_dict(orient='records')  # one dict per row
print(d)

Output:

{'documents': [{'text': 'hello world', 'id': 1}, {'text': 'hello foo world', 'id': 2}, {'text': 'hello my world', 'id': 3}]}
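`d` above is a dict, whereas the question asks for a str. A minimal stdlib-only sketch of the last step, with the records hard-coded as a hypothetical stand-in for `df.to_dict(orient='records')` so it runs without pandas or the file:

```python
import json

# Hypothetical stand-in for df.to_dict(orient='records'); as in the output
# above, the ids come back as integers
records = [
    {'id': 1, 'text': 'hello world'},
    {'id': 2, 'text': 'hello foo world'},
    {'id': 3, 'text': 'hello my world'},
]

# The required format uses string ids, so convert them before serialising
for rec in records:
    rec['id'] = str(rec['id'])

d = {'documents': records}
input_texts = json.dumps(d)  # serialise the dict into a JSON str
print(type(d).__name__, type(input_texts).__name__)  # dict str
print(input_texts)
```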

Answer 3

Assuming an input file named data.txt:

id  text
1   hello world
2   hello foo world
3   hello my world

You can create the required JSON string like this:

import json

with open('data.txt','r') as f:
    lines = f.read().splitlines()

first_line = lines[0]

id_header, text_header = first_line.split()
text_index = first_line.index(text_header)

documents = []

for line in lines[1:]:
    index = line.split()[0]
    text = line[text_index:]

    documents.append({
        id_header: index,
        text_header: text,
    })

result = {"documents": documents}

json_string = json.dumps(result)
print(json_string)
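The `text_index` step relies on the text column starting at the same character offset as the `text` header on every row (a fixed-width layout). A small sketch that isolates that step so it can be checked on in-memory lines; `split_fixed_width` is a hypothetical helper name, and the alignment is an assumption taken from the sample:

```python
def split_fixed_width(lines):
    """Split rows at the offset where the second header word starts.

    Assumes every data row is aligned so the text column begins at the
    same character offset as its header (true for the sample data).
    """
    id_header, text_header = lines[0].split()
    text_index = lines[0].index(text_header)  # 4 for "id  text"
    return [
        {id_header: line[:text_index].strip(), text_header: line[text_index:]}
        for line in lines[1:]
        if line.strip()  # skip blank lines
    ]


rows = ["id  text", "1   hello world", "2   hello foo world", "3   hello my world"]
print(split_fixed_width(rows))
```

If the real file is not fixed-width (for example tab-separated, or ids like 1000 pushing the text further right), slicing at a fixed offset will cut in the wrong place; splitting each line with `line.split(None, 1)`, as in Answer 1, does not depend on alignment.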

Answer 4

Assuming your data is in some file, e.g. "data.csv", in your working directory. I'm also assuming it is comma-separated (you only posted a not very helpful picture). Anyway:

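import csv
import json

with open('data.csv', newline='') as f:  # newline='' is what the csv module expects
    reader = csv.DictReader(f)  # the first row supplies the keys ("id", "text")
    input_text = {'documents': list(reader)}
input_text = json.dumps(input_text)  # now a str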
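To see what `csv.DictReader` produces without needing the file, here is the same idea on an in-memory sample. The comma-separated contents are an assumption: the sample in the question looks whitespace-aligned, and DictReader's default delimiter is a comma.

```python
import csv
import io
import json

# Assumed comma-separated version of the sample data
data = io.StringIO('id,text\n1,hello world\n2,hello foo world\n3,hello my world\n')

reader = csv.DictReader(data)  # first row becomes the keys of each row dict
input_text = json.dumps({'documents': list(reader)})
print(input_text)
```

Every value DictReader returns is a str, so the ids are already "1", "2", "3" as the required format shows. If the real file is whitespace-aligned rather than comma-separated, the line-splitting approach from Answer 1 is a better fit.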

Python: converting data from CSV to "str" type data
