Question

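What is the easiest way to save and load data in Python, preferably in a human-readable output format?

The data I am saving/loading consists of two vectors of floats. Ideally, these vectors would be named in the file (e.g. X and Y).

My current save() and load() functions use file.readline(), file.write() and string-to-float conversion. There must be something better.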

Answer 1

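The simplest way to get human-readable output is to use a serialization format such as JSON. Python ships with a json library that you can use to serialize data to and from a string. As with pickle, you can use it with an IO object to write the data to a file.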

import json

data = { "x": 12153535.232321, "y": 35234531.232322 }

# The with block closes the file even if json.dump raises
with open('/usr/data/application/json-dump.json', 'w') as file:
    json.dump(data, file)

If you want a plain string instead of dumping to a file, you can use json.dumps() instead:

import json
print(json.dumps({ "x": 12153535.232321, "y": 35234531.232322 }))

Reading it back from a file is just as easy:

import json

with open('/usr/data/application/json-dump.json', 'r') as file:
    print(json.load(file))

The json library is full-featured, so I'd recommend checking out the documentation to see what else you can do with it.
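For the two named vectors from the question, a minimal sketch might look like the following. The file name vectors.json and the sample values are assumptions made for illustration; indent=2 is optional and only makes the file easier to read.

```python
import json
import os
import tempfile

# Hypothetical sample data: two named vectors of floats
data = {"X": [1.0, 1.5, 2.0], "Y": [1.0, 2.25, 4.0]}

# Write into the temporary directory so the sketch runs anywhere
path = os.path.join(tempfile.gettempdir(), "vectors.json")

with open(path, "w") as f:
    json.dump(data, f, indent=2)  # indent makes the file human-readable

with open(path) as f:
    loaded = json.load(f)

X, Y = loaded["X"], loaded["Y"]
print(X, Y)
```

JSON lists come back as plain Python lists, so no string-to-float conversion is needed.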

Answer 2

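There are several options, and I don't know exactly what you'd like. If the two vectors have the same length, you can use numpy.savetxt() to save your vectors, say x and y, as columns: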

 import numpy

 # saving:
 with open("data", "w") as f:
     f.write("# x y\n")        # column names
     numpy.savetxt(f, numpy.array([x, y]).T)
 # loading:
 x, y = numpy.loadtxt("data", unpack=True)

If you are dealing with larger vectors of floats, you should probably be using NumPy anyway.

Answer 3

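If it should be human-readable, I'd also go with JSON. Unless you need to exchange it with enterprise-type people, who like XML better. :-)
If it should be human-editable and isn't too complex, I'd probably go with some sort of INI-like format, for example via configparser.
If it is complex and doesn't need to be exchanged, I'd just pickle the data, unless it's very complex, in which case I'd use ZODB.
If it's a lot of data and needs to be exchanged, I'd use SQL.

That, I think, pretty much covers it.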
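As a rough sketch of the INI-like option, the standard-library configparser module could store the two vectors as comma-separated strings. The section name "vectors" and the sample values are assumptions for illustration, and the conversion back to floats has to be done by hand.

```python
import configparser
import io

# Hypothetical data: two short vectors of floats
x = [1.0, 1.5, 2.0]
y = [1.0, 2.25, 4.0]

config = configparser.ConfigParser()
config["vectors"] = {
    "x": ", ".join(repr(v) for v in x),
    "y": ", ".join(repr(v) for v in y),
}

# Write to an in-memory buffer; a file object from open() works the same way
buffer = io.StringIO()
config.write(buffer)
text = buffer.getvalue()
print(text)

# Read it back and convert the strings to floats again
parsed = configparser.ConfigParser()
parsed.read_string(text)
x_loaded = [float(v) for v in parsed["vectors"]["x"].split(",")]
y_loaded = [float(v) for v in parsed["vectors"]["y"].split(",")]
```

The result is easy to edit by hand, but every value is stored as a string, which is why JSON is usually less work for numeric data.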

Answer 4

A simple serialization format that is easy for both humans and computers to read is JSON.

You can use the json Python module.

Answer 5

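Since we're talking about a human editing the file, I assume we're talking about relatively little data.

How about the following skeleton implementation? It simply saves the data as key=value pairs and works with lists, tuples and many other things.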

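    import ast

    def save(fname, **kwargs):
      # Write one key=value line per keyword argument
      with open(fname, "wt") as f:
        for k, v in kwargs.items():
          print("%s=%s" % (k, repr(v)), file=f)

    def load(fname):
      ret = {}
      with open(fname, "rt") as f:
        for line in f:
          k, v = line.strip().split("=", 1)
          # literal_eval parses Python literals only, so a hand-edited
          # file cannot run arbitrary code the way eval() would
          ret[k] = ast.literal_eval(v)
      return ret

    x = [1, 2, 3]
    y = [2.0, 1e15, -10.3]
    save("data.txt", x=x, y=y)
    d = load("data.txt")
    print(d["x"])
    print(d["y"])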

Answer 6

As I commented on the accepted answer, with numpy this can be done with a simple one-liner:

Assuming you have numpy imported as np (which is common practice),

np.savetxt('xy.txt', np.array([x, y]).T, fmt="%.3f", header="x   y")

will save the data in the (optional) format, and

x, y = np.loadtxt('xy.txt', unpack=True)

will load it.

The file xy.txt will look like this:

# x   y
1.000 1.000
1.500 2.250
2.000 4.000
2.500 6.250
3.000 9.000

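Note that the format string fmt=... is optional, but if the goal is human readability it can be quite useful. If used, it is specified with the usual printf-style codes (in my example: a floating-point number with 3 decimals).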
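To get a feel for these printf-style codes without NumPy, the same kind of code can be applied by hand with Python's % operator. This is only an illustration of the format codes, using sample values that match the first rows of the file above; it is not how np.savetxt is implemented.

```python
# Sample values matching the first rows of the file shown above
xs = [1.0, 1.5, 2.0]
ys = [1.0, 2.25, 4.0]

fmt = "%.3f"  # fixed-point notation with 3 decimals

# Format each (x, y) pair as one space-separated line
lines = [" ".join(fmt % v for v in row) for row in zip(xs, ys)]
print("\n".join(lines))

# Another common code, for comparison: scientific notation with 2 decimals
sci = "%.2e" % 1234.5
print(sci)
```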

Answer 7

Here is an example of an encoder that you might want to write for a Body class:

# add this to your code
import json
import numpy as np

# FILE_NAME is assumed to be defined elsewhere as the path of the JSON file

class BodyEncoder(json.JSONEncoder):
    def default(self, obj):
        # Convert types that json cannot serialize on its own
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if hasattr(obj, '__jsonencode__'):
            return obj.__jsonencode__()
        if isinstance(obj, set):
            return list(obj)
        return obj.__dict__

    # Here you construct your way to load your data for each instance
    # you need to customize this function
    @staticmethod
    def deserialize(data):
        bodies = [Body(d["name"], d["mass"], np.array(d["p"]), np.array(d["v"])) for d in data["bodies"]]
        axis_range = data["axis_range"]
        timescale = data["timescale"]
        return bodies, axis_range, timescale

    # Here you construct your way to dump your data for each instance
    # you need to customize this function
    @staticmethod
    def serialize(data):
        with open(FILE_NAME, 'w') as file:
            json.dump(data, file, cls=BodyEncoder, indent=4)
        print("Dumping Parameters of the Latest Run")
        print(json.dumps(data, cls=BodyEncoder, indent=4))

Here is an example of the class I want to serialize:

class Body(object):
    # you do not need to change your class structure
    def __init__(self, name, mass, p, v=(0.0, 0.0, 0.0)):
        # init variables like normal
        self.name = name
        self.mass = mass
        self.p = p
        self.v = v
        self.f = np.array([0.0, 0.0, 0.0])

    def attraction(self, other):
        # not important functions that I wrote...
        pass

Here is how to serialize:

# you need to customize this function
def serialize_everything():
    bodies, axis_range, timescale = generate_data_to_serialize()

    data = {"bodies": bodies, "axis_range": axis_range, "timescale": timescale}
    BodyEncoder.serialize(data)

Here is how to load the data back (despite its name, dump_everything reads the file):

def dump_everything():
    # Read the saved JSON file and rebuild the objects from it
    with open(FILE_NAME, "r") as file:
        data = json.load(file)
    return BodyEncoder.deserialize(data)
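The same encoder pattern can be tried without NumPy. The sketch below is a stripped-down, standard-library-only variant; the Point class, PointEncoder and decode_point are hypothetical names standing in for Body and its helpers, not part of the answer above.

```python
import json

class Point:
    """Hypothetical stand-in for a class like Body."""
    def __init__(self, name, coords, tags=()):
        self.name = name
        self.coords = list(coords)
        self.tags = set(tags)

class PointEncoder(json.JSONEncoder):
    def default(self, obj):
        # Called only for objects json cannot serialize on its own
        if isinstance(obj, set):
            return sorted(obj)
        if isinstance(obj, Point):
            return obj.__dict__
        return super().default(obj)  # raises TypeError for unknown types

def decode_point(d):
    # Rebuild a Point from the plain dict produced by the encoder
    return Point(d["name"], d["coords"], d["tags"])

original = {"points": [Point("a", [0.0, 1.0], {"red", "big"})]}
text = json.dumps(original, cls=PointEncoder, indent=4)
points = [decode_point(d) for d in json.loads(text)["points"]]
print(text)
```

Falling back to super().default(obj) instead of obj.__dict__ makes unsupported types fail loudly rather than being written out in an unexpected shape.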
