pandas to_dict modifies numbers

Date: 2016-04-18 13:26:17

Tags: python csv pandas

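I've been playing with a function that ingests CSV data, using the pandas to_dict function as one of the final steps toward converting the data to JSON. The problem is that it modifies the numbers (e.g. 1.6 becomes 1.6000000000000001). I'm not worried about the loss of accuracy, but because users will see the numbers change, it looks... amateurish.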

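I know this has come up before here, but that was 2 years ago and it never really got a good answer. I also have an additional complication: the data frames I want to convert to dictionaries can contain any combination of data types. So the problems with the previous solutions are:

  1. Converting all numbers to objects only works if you don't need to use the numbers. I want the option to compute sums and averages, which reintroduces the extra-decimal problem.
  2. Forcing numbers to be rounded to x decimal places will either reduce accuracy or add unnecessary trailing zeros, depending on the data the user provides.

So, at a high level, my question is:

Is there a better way to ensure that the numbers are not modified but stay in a numeric data type? Is it a matter of changing how I import the CSV data in the first place? Surely there's a simple solution I'm overlooking?

Here is a simple script that reproduces the error: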

    import sys

    import pandas as pd

    # StringIO lives in different modules on Python 2 and Python 3
    if sys.version_info[0] < 3:
        from StringIO import StringIO
    else:
        from io import StringIO

    CSV_Data = "Index,Column_1,Column_2,Column_3,Column_4,Column_5,Column_6,Column_7,Column_8\nindex_1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8\nindex_2,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8\nindex_3,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8\nindex_4,4.1,4.2,4.3,4.4,4.5,4.6,4.7,4.8"

    input_data = StringIO(CSV_Data)
    # DataFrame.from_csv (used in the original post) was removed in pandas 1.0;
    # read_csv is its replacement
    df = pd.read_csv(input_data, header=0, sep=',', index_col=0, encoding='utf-8')
    print(df.to_dict(orient='records'))
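On the import-side part of the question, one option that may be worth trying is the `float_precision` parameter of `pd.read_csv`. The sketch below is not from the original post: it assumes a pandas version whose `read_csv` accepts `float_precision="round_trip"`, and it uses a shortened, illustrative sample of the CSV above. The names `csv_data`, `df_rt` and `records_rt` are made up for the example.

```python
from io import StringIO

import pandas as pd

# Shortened sample of the CSV from the question (illustrative only)
csv_data = "Index,Column_5,Column_6\nindex_1,1.5,1.6\nindex_2,2.5,2.6"

# "round_trip" selects the slower round-trip float converter, so the text
# "1.6" should parse to the same float that Python's float("1.6") gives
df_rt = pd.read_csv(StringIO(csv_data), index_col=0, float_precision="round_trip")

# The columns are still float64, so sums and means remain available
records_rt = df_rt.to_dict(orient="records")
print(records_rt)
```

The values stay numeric, so this does not rule out floating-point noise in computed results such as sums or averages; it only addresses how the input text is parsed.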
    

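2 Answers:

Answer 0 (score: 2)

You can use pd.io.json.dumps to handle nested dicts that contain pandas objects.

Let's create a summary dictionary containing the dataframe records and a custom metric.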

In [137]: summary = {'df': df.to_dict(orient = 'records'), 'df_metric': df.sum() / df.min()}

In [138]: summary['df_metric']
Out[138]:
Column_1    9.454545
Column_2    9.000000
Column_3    8.615385
Column_4    8.285714
Column_5    8.000000
Column_6    7.750000
Column_7    7.529412
Column_8    7.333333
dtype: float64

In [139]: pd.io.json.dumps(summary)
Out[139]: '{"df":[{"Column_7":1.7,"Column_6":1.6,"Column_5":1.5,"Column_4":1.4,"Column_3":1.3,"Column_2":1.2,"Column_1":1.1,"Column_8":1.8},{"Column_7":2.7,"Column_6":2.6,"Column_5":2.5,"Column_4":2.4,"Column_3":2.3,"Column_2":2.2,"Column_1":2.1,"Column_8":2.8},{"Column_7":3.7,"Column_6":3.6,"Column_5":3.5,"Column_4":3.4,"Column_3":3.3,"Column_2":3.2,"Column_1":3.1,"Column_8":3.8},{"Column_7":4.7,"Column_6":4.6,"Column_5":4.5,"Column_4":4.4,"Column_3":4.3,"Column_2":4.2,"Column_1":4.1,"Column_8":4.8}],"df_metric":{"Column_1":9.4545454545,"Column_2":9.0,"Column_3":8.6153846154,"Column_4":8.2857142857,"Column_5":8.0,"Column_6":7.75,"Column_7":7.5294117647,"Column_8":7.3333333333}}'

Use double_precision to change the maximum number of digits of precision for doubles. Note the df_metric values.

In [140]: pd.io.json.dumps(summary, double_precision=2)
Out[140]: '{"df":[{"Column_7":1.7,"Column_6":1.6,"Column_5":1.5,"Column_4":1.4,"Column_3":1.3,"Column_2":1.2,"Column_1":1.1,"Column_8":1.8},{"Column_7":2.7,"Column_6":2.6,"Column_5":2.5,"Column_4":2.4,"Column_3":2.3,"Column_2":2.2,"Column_1":2.1,"Column_8":2.8},{"Column_7":3.7,"Column_6":3.6,"Column_5":3.5,"Column_4":3.4,"Column_3":3.3,"Column_2":3.2,"Column_1":3.1,"Column_8":3.8},{"Column_7":4.7,"Column_6":4.6,"Column_5":4.5,"Column_4":4.4,"Column_3":4.3,"Column_2":4.2,"Column_1":4.1,"Column_8":4.8}],"df_metric":{"Column_1":9.45,"Column_2":9.0,"Column_3":8.62,"Column_4":8.29,"Column_5":8.0,"Column_6":7.75,"Column_7":7.53,"Column_8":7.33}}'

You can use orient='records/index/..' to control how the dataframe -> to_json conversion is laid out.

In [144]: pd.io.json.dumps(summary, orient='records')
Out[144]: '{"df":[{"Column_7":1.7,"Column_6":1.6,"Column_5":1.5,"Column_4":1.4,"Column_3":1.3,"Column_2":1.2,"Column_1":1.1,"Column_8":1.8},{"Column_7":2.7,"Column_6":2.6,"Column_5":2.5,"Column_4":2.4,"Column_3":2.3,"Column_2":2.2,"Column_1":2.1,"Column_8":2.8},{"Column_7":3.7,"Column_6":3.6,"Column_5":3.5,"Column_4":3.4,"Column_3":3.3,"Column_2":3.2,"Column_1":3.1,"Column_8":3.8},{"Column_7":4.7,"Column_6":4.6,"Column_5":4.5,"Column_4":4.4,"Column_3":4.3,"Column_2":4.2,"Column_1":4.1,"Column_8":4.8}],"df_metric":[9.4545454545,9.0,8.6153846154,8.2857142857,8.0,7.75,7.5294117647,7.3333333333]}'

Essentially, pd.io.json.dumps converts arbitrary objects to JSON recursively, using ultrajson internally.
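If `pd.io.json.dumps` is not available in your installed pandas version, a similar result can probably be reached with the public `to_json` method plus the standard `json` module. This is a minimal sketch under that assumption, not the answer's own code; the small `csv_data` sample and the variable names are hypothetical.

```python
import json
from io import StringIO

import pandas as pd

# Small illustrative sample, not the full data from the question
csv_data = "Index,Column_1,Column_2\nindex_1,1.1,1.6\nindex_2,2.1,2.6"
df = pd.read_csv(StringIO(csv_data), index_col=0)

# to_json writes floats with a bounded number of digits (double_precision),
# and json.loads turns that text back into plain Python floats
records = json.loads(df.to_json(orient="records"))

# A derived metric, serialized with at most 2 decimal places
metric = json.loads((df.sum() / df.min()).to_json(double_precision=2))

summary = {"df": records, "df_metric": metric}
print(json.dumps(summary))
```

Because `summary` holds only plain Python lists, dicts and floats at this point, the standard `json.dumps` can serialize it without any pandas-specific encoder.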

Answer 1 (score: 0)

I needed df.to_dict('list') with correctly formatted floats, but df.to_json() doesn't currently support orient='list'. So I do the following:

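import json

# Go through to_json so the floats are serialized cleanly, then rebuild per-column lists
list_oriented_dict = {
    column: list(data.values())
    for column, data in json.loads(df.to_json()).items()
}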

It's not the best approach, but it works for me. Maybe someone has a more elegant solution?
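One possible variation, offered as a sketch rather than a definitive improvement, is to serialize with `orient="split"` and transpose the row data back into columns. It assumes the same JSON round trip as the answer above; the sample data and names are hypothetical. Since it never keys values by index label, it should also work when index labels repeat, which the default `orient="columns"` rejects.

```python
import json
from io import StringIO

import pandas as pd

# Small illustrative sample, not the full data from the question
csv_data = "Index,Column_1,Column_2\nindex_1,1.1,1.6\nindex_2,2.1,2.6"
df = pd.read_csv(StringIO(csv_data), index_col=0)

# orient="split" yields {"columns": [...], "index": [...], "data": [[row], ...]}
split = json.loads(df.to_json(orient="split"))

# zip(*rows) transposes the row-major data into one tuple per column
list_oriented = {
    column: list(values)
    for column, values in zip(split["columns"], zip(*split["data"]))
}
print(list_oriented)
```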