Question

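Can someone explain the following behavior of these three numbers in pandas? I am trying to load the following values and represent them correctly.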

954081199.495100000000000 => 954081199.4951
9449546861.291050000000000 => 9449546861.29105
9752031802.626950000000000 => 9752031802.62695

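As trailing 0s are removed, the value pandas reads also changes. It looks as if the number of trailing digits affects the precision of the parsed value. Since the actual number of non-zero digits after the decimal point is not known in advance, simply truncating everything after the n-th digit is not an option.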
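As a sanity check (not part of the original post): Python's built-in float() is correctly rounded, so the number of trailing zeros cannot change the resulting double. The sketch below needs only the standard library (math.ulp requires Python 3.9 or later). It also shows that one unit in the last place (ulp) is about 1.9e-06 at this magnitude, so the value 9752031802.626951 that pandas printed is simply the neighbouring double, one ulp above the correctly rounded one:

```python
import math

# Values from the question, each written with a different number of trailing zeros.
spellings = {
    "954081199.4951": ["954081199.495100000000000", "954081199.49510", "954081199.4951"],
    "9449546861.29105": ["9449546861.291050000000000", "9449546861.2910500", "9449546861.29105"],
    "9752031802.62695": ["9752031802.626950000000000", "9752031802.6269500", "9752031802.62695"],
}

for short, variants in spellings.items():
    parsed = {float(s) for s in variants}
    # A correctly rounded parser maps every spelling to one and the same double.
    assert len(parsed) == 1, short
    value = parsed.pop()
    print(f"{short:>18} -> {value!r} (ulp = {math.ulp(value)!r})")

# The double that pandas printed as 9752031802.626951 for the zero-padded inputs.
correct = float("9752031802.62695")
pandas_value = float("9752031802.626951")
# Adjacent doubles at this magnitude are 2**-19 apart; these two differ by exactly one step.
print(pandas_value - correct == math.ulp(correct))
```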

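Is there something in pandas that controls this behavior? I have tried the 'c' engine, but the output is the same.

The data is being read from a text file.

Thanks.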

Loading sample_1.txt
Row   : Raw Value                      : Pandas Value
    0 : 954081199.495100000000000      : 954081199.4950998
    1 : 954081199.49510000000000       : 954081199.4950998
    2 : 954081199.4951000000000        : 954081199.4950998
    3 : 954081199.495100000000         : 954081199.4951
    4 : 954081199.49510000000          : 954081199.4951
    5 : 954081199.4951000000           : 954081199.4951
    6 : 954081199.495100000            : 954081199.4951
    7 : 954081199.49510000             : 954081199.4951
    8 : 954081199.4951000              : 954081199.4951
    9 : 954081199.495100               : 954081199.4951
   10 : 954081199.49510                : 954081199.4951
   11 : 954081199.4951                 : 954081199.4951
   12 : 9449546861.291050000000000     : 9449546861.291044
   13 : 9449546861.29105000000000      : 9449546861.291044
   14 : 9449546861.2910500000000       : 9449546861.291046
   15 : 9449546861.291050000000        : 9449546861.291046
   16 : 9449546861.29105000000         : 9449546861.291048
   17 : 9449546861.2910500000          : 9449546861.291048
   18 : 9449546861.291050000           : 9449546861.291048
   19 : 9449546861.29105000            : 9449546861.291048
   20 : 9449546861.2910500             : 9449546861.29105
   21 : 9449546861.291050              : 9449546861.29105
   22 : 9449546861.29105               : 9449546861.29105
   23 : 9752031802.626950000000000     : 9752031802.626955
   24 : 9752031802.62695000000000      : 9752031802.626955
   25 : 9752031802.6269500000000       : 9752031802.626951
   26 : 9752031802.626950000000        : 9752031802.626951
   27 : 9752031802.62695000000         : 9752031802.626951
   28 : 9752031802.6269500000          : 9752031802.626951
   29 : 9752031802.626950000           : 9752031802.626951
   30 : 9752031802.62695000            : 9752031802.626951
   31 : 9752031802.6269500             : 9752031802.62695
   32 : 9752031802.626950              : 9752031802.62695
   33 : 9752031802.62695               : 9752031802.62695
Done

Code that produces the above output:

#!/usr/bin/env python3

import pandas

def main():
    file_name = 'sample_1.txt'
    print('Loading ' + file_name)
    # Parse the pipe-delimited file; the value of interest is in column 2.
    content_df = pandas.read_csv(file_name, delimiter='|', header=None, engine='python',
                                 skipinitialspace=True, skiprows=0, skipfooter=0)
    num_rows = len(content_df)

    # Re-read the raw text so the original strings can be shown next to the parsed floats.
    with open(file_name, 'r') as f:
        lines_list = f.read().splitlines()

    print('Row   : Raw Value' + ' ' * 22 + ': Pandas Value')
    for rowcount in range(num_rows):
        value_list = lines_list[rowcount].split('|')
        print('{0:5d} : {1} : {2}'.format(rowcount, value_list[2].ljust(30, ' '),
                                          content_df.iloc[rowcount, 2]))

    print('Done')


if __name__ == '__main__':
    main()

Answer 1

This can be configured with the 'c' engine by setting the float_precision option to 'high': float_precision='high'.
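A minimal, self-contained way to try this without the original sample_1.txt is to parse an in-memory string. The layout below (a row index, a placeholder column "x", then the number in column 2, separated by '|') is a hypothetical stand-in for the asker's file. Note that in pandas 1.2 and later the high-precision converter is already the default for the C engine; the older converter is still available as float_precision='legacy'.

```python
import io

import pandas as pd

# Hypothetical stand-in for sample_1.txt: pipe-delimited rows, the number is in column 2.
raw_values = [
    "954081199.495100000000000",
    "954081199.4951",
    "9449546861.291050000000000",
    "9449546861.29105",
    "9752031802.626950000000000",
    "9752031802.62695",
]
text = "\n".join(f"{i}|x|{v}" for i, v in enumerate(raw_values))

# The C engine with the high-precision float converter, as in the answer.
df = pd.read_csv(io.StringIO(text), delimiter="|", header=None, engine="c",
                 skipinitialspace=True, float_precision="high")

for raw, parsed in zip(raw_values, df[2]):
    # Compare each parsed value with Python's correctly rounded float().
    print(f"{raw:<30} : {parsed} (matches float(): {float(raw) == parsed})")
```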

Many thanks.

Reference: GitHub Issue

Loading sample_1.txt
Row   : Raw Value                      : Pandas Value
    0 : 954081199.495100000000000      : 954081199.4951
    1 : 954081199.49510000000000       : 954081199.4951
    2 : 954081199.4951000000000        : 954081199.4951
    3 : 954081199.495100000000         : 954081199.4951
    4 : 954081199.49510000000          : 954081199.4951
    5 : 954081199.4951000000           : 954081199.4951
    6 : 954081199.495100000            : 954081199.4951
    7 : 954081199.49510000             : 954081199.4951
    8 : 954081199.4951000              : 954081199.4951
    9 : 954081199.495100               : 954081199.4951
   10 : 954081199.49510                : 954081199.4951
   11 : 954081199.4951                 : 954081199.4951
   12 : 9449546861.291050000000000     : 9449546861.29105
   13 : 9449546861.29105000000000      : 9449546861.29105
   14 : 9449546861.2910500000000       : 9449546861.29105
   15 : 9449546861.291050000000        : 9449546861.29105
   16 : 9449546861.29105000000         : 9449546861.29105
   17 : 9449546861.2910500000          : 9449546861.29105
   18 : 9449546861.291050000           : 9449546861.29105
   19 : 9449546861.29105000            : 9449546861.29105
   20 : 9449546861.2910500             : 9449546861.29105
   21 : 9449546861.291050              : 9449546861.29105
   22 : 9449546861.29105               : 9449546861.29105
   23 : 9752031802.626950000000000     : 9752031802.626951
   24 : 9752031802.62695000000000      : 9752031802.626951
   25 : 9752031802.6269500000000       : 9752031802.626951
   26 : 9752031802.626950000000        : 9752031802.626951
   27 : 9752031802.62695000000         : 9752031802.626951
   28 : 9752031802.6269500000          : 9752031802.626951
   29 : 9752031802.626950000           : 9752031802.626951
   30 : 9752031802.62695000            : 9752031802.626951
   31 : 9752031802.6269500             : 9752031802.626951
   32 : 9752031802.626950              : 9752031802.62695
   33 : 9752031802.62695               : 9752031802.62695
Done

Modified code:

#!/usr/bin/env python3

import pandas

def main():
    # 'display.precision' only changes how DataFrames are rendered; the scalars
    # printed below come from iloc and are not affected by it.
    pandas.set_option('display.precision', 10)
    file_name = 'sample_1.txt'
    print('Loading ' + file_name)
    # float_precision='high' selects the C engine's more accurate float converter.
    content_df = pandas.read_csv(file_name, delimiter='|', header=None, engine='c',
                                 skipinitialspace=True, skiprows=0, float_precision='high')
    num_rows = len(content_df)

    # Re-read the raw text so the original strings can be shown next to the parsed floats.
    with open(file_name, 'r') as f:
        lines_list = f.read().splitlines()

    print('Row   : Raw Value' + ' ' * 22 + ': Pandas Value')
    for rowcount in range(num_rows):
        value_list = lines_list[rowcount].split('|')
        print('{0:5d} : {1} : {2}'.format(rowcount, value_list[2].ljust(30, ' '),
                                          content_df.iloc[rowcount, 2]))

    print('Done')


if __name__ == '__main__':
    main()
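In the answer's output, rows 23 to 31 still come out as 9752031802.626951 rather than 9752031802.62695, so 'high' reduces the error without guaranteeing correct rounding for every input. read_csv also accepts float_precision='round_trip', documented as the round-trip converter. It is built on Python's own string-to-double routine, so it is slower but should agree exactly with float(). A minimal sketch, again using hypothetical pipe-delimited rows rather than the real sample_1.txt:

```python
import io

import pandas as pd

# Hypothetical rows: one number written with different numbers of trailing zeros.
raw_values = [
    "9752031802.626950000000000",
    "9752031802.62695000000",
    "9752031802.6269500",
    "9752031802.62695",
]
text = "\n".join(f"{i}|x|{v}" for i, v in enumerate(raw_values))

# 'round_trip' trades parsing speed for results that match Python's float().
df = pd.read_csv(io.StringIO(text), delimiter="|", header=None, engine="c",
                 float_precision="round_trip")
parsed = df[2].tolist()

for raw, value in zip(raw_values, parsed):
    print(f"{raw:<30} : {value!r}")
```

If every spelling parses to the same double, the number of trailing zeros no longer matters.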

The number of least-significant '0' digits affects the value in pandas
