Question

I'm reading a file in which every line contains a dictionary, but the file as a whole is not formatted as a dictionary or a list.

{"key1":"value11", "key2":"value12"}
{"key1":"value21", "key2":"value22"}

What I want to do is read the file and change the values of certain keys. Something like this:

with open(...) as reader:
    data = reader.read().split("\n")
for dic in data:
    entry = json.loads(dic)
    entry["key"] = "another value"

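I know that to save this to a file I would have to open the file again, but the data isn't even changing in memory, and I guess that has to do with the way for i in data works. I would rather not make a copy of each file, mainly because I have a lot of lines.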

Is there another way that I'm not seeing?

Answer 1

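Since you mention that the file may be large, a reasonable approach is to read the input file line by line, change each line, and write the modified line to an output file.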

import json

# the output file must be opened for writing ('w'); the default mode is read-only
with open('/path/to/output.file', 'w') as outfile:
    with open('/path/to/input.file') as inputfile:
        for line in inputfile:
            entry = json.loads(line)
            entry['key'] = 'another value'
            outfile.write(json.dumps(entry) + '\n')
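
If a separate output file is not wanted in the end, one possible variation on the approach above is to write to a temporary file and then swap it over the original. The sketch below is only an illustration and not part of the original answer; the function name update_key_in_jsonl and its parameters are hypothetical.

```python
import json
import os
import tempfile


def update_key_in_jsonl(path, key, new_value):
    """Rewrite `path` line by line, setting `key` to `new_value` in every record."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final swap
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as outfile, \
                open(path, encoding="utf-8") as inputfile:
            for line in inputfile:
                if not line.strip():
                    continue  # skip blank lines, e.g. after a trailing newline
                entry = json.loads(line)
                entry[key] = new_value
                outfile.write(json.dumps(entry) + "\n")
        # Both files are closed here; replace the original with the rewritten copy.
        os.replace(tmp_path, path)
    except Exception:
        # Clean up the temporary file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Only one line is held in memory at a time, and the original file is left untouched until the rewritten copy is complete.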

UPD: regarding your actual question:

with open(...) as reader:
    data = reader.read().split("\n")

for i in range(len(data)):
    # parse the line, change it, then store the new string back at position i
    entry = json.loads(data[i])
    entry["key"] = "another value"
    data[i] = json.dumps(entry)

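This approach changes data in memory. In your snippet, each iteration of the for loop creates a temporary variable entry, which you then modify, but the result is simply discarded because you overwrite entry on the next iteration. The dict returned by json.loads is a new object, so changing it never touches the string stored in the list. Also, with the for elem in collection approach you generally should not modify the collection. In your case the collection is a list of strings, and strings are immutable in Python. So just switch to the for i in range() approach and modify the list by overwriting the whole element at position i.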
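
The difference between the two loops can be seen in a small self-contained sketch. The sample list data stands in for the lines read from the file, and enumerate is used as an idiomatic alternative to range(len(data)); both are illustrative choices rather than part of the original answer.

```python
import json

data = ['{"key1": "value11", "key2": "value12"}',
        '{"key1": "value21", "key2": "value22"}']
snapshot = list(data)  # copy of the original strings, for comparison

# Loop 1: json.loads builds a new dict each time, so changing it
# never touches the strings stored in `data`.
for dic in data:
    entry = json.loads(dic)
    entry["key1"] = "another value"

unchanged_after_first_loop = (data == snapshot)

# Loop 2: assigning by index replaces the list element itself.
for i, line in enumerate(data):
    entry = json.loads(line)
    entry["key1"] = "another value"
    data[i] = json.dumps(entry)

print(unchanged_after_first_loop)  # True
print(data[0])  # {"key1": "another value", "key2": "value12"}
```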

Answer 2

For efficiency, I suggest converting your data into a single dictionary of dict objects. You can use the line number as the index.

First, read the data into a dictionary:

from io import StringIO
import json, pickle

x = StringIO('''{"key1":"value11", "key2":"value12"}
{"key1":"value21", "key2":"value22"}''')

d = {}

# replace x with open('file.txt', 'r')
with x as fin:
    for idx, line in enumerate(fin):
        d[idx] = json.loads(line.strip())

print(d)

# {0: {'key1': 'value11', 'key2': 'value12'},
#  1: {'key1': 'value21', 'key2': 'value22'}}

Then write it to a pickle file; note that pickle.HIGHEST_PROTOCOL selects an efficient binary format:

filename = r'C:\temp\out.pkl'
# never work with the old format again!
with open(filename, 'wb') as fout:
    pickle.dump(d, fout, pickle.HIGHEST_PROTOCOL)

Then read the file back:

with open(filename, 'rb') as fin:
    d_in = pickle.load(fin)

print(d_in)

# {0: {'key1': 'value11', 'key2': 'value12'},
#  1: {'key1': 'value21', 'key2': 'value22'}}

Changing a sub-dictionary is now as simple as d[1]['key1'] = 'newval'.

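Note that pickling/serialization is version-specific. However, with this one-time restructuring of the data, you should see a large performance improvement.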
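
Should the edited data need to go back into the original one-record-per-line text layout, a sketch along the following lines may help. It assumes a dictionary d shaped like the one built above (line index mapped to a record); the sample values and the in-memory StringIO used in place of a real file are placeholders.

```python
import json
from io import StringIO

d = {0: {"key1": "value11", "key2": "value12"},
     1: {"key1": "value21", "key2": "value22"}}

# Edit a record in place through its line index.
d[1]["key1"] = "newval"

out = StringIO()  # replace with open('file.txt', 'w') for a real file
for idx in sorted(d):
    # One JSON document per line, in the original line order.
    out.write(json.dumps(d[idx]) + "\n")

text = out.getvalue()
print(text, end="")
```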

Python: modifying values in a list of dictionaries
