Error "'numpy.float64' object is not iterable" when creating a CSV file in Python

Time: 2018-10-25 10:53:43

Tags: python pandas csv floating-point

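I have some noisy (astronomy) data in CSV format. Its shape is (815900, 2): roughly 815k points, each giving the mass of a disk at a particular time. The fluctuations are very obvious when you look closely. For example, here is a snippet of the data, where the first column is the time in seconds and the second column is the mass in kg: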

40023700,2.40896E+028
40145700,2.44487E+028
40267700,2.44487E+028
40389700,2.44478E+028
40511600,1.535E+028
40633500,2.19067E+028
40755400,2.44496E+028
40877200,2.44489E+028
40999000,2.44489E+028
41120800,2.34767E+028
41242600,2.40936E+028

So there appears to be a noisy data point at 1.53E+028, and possibly two more at 2.19E+028 and 2.35E+028.

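To fix this, I am trying to set up a Python script that reads the CSV data and applies a restriction to it, so that if, for example, the mass is < 2.35E+028, the entire row is removed. The script should then create a new CSV file containing only the "good" data points: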

40023700,2.40896E+028
40145700,2.44487E+028
40267700,2.44487E+028
40389700,2.44478E+028
40755400,2.44496E+028
40877200,2.44489E+028
40999000,2.44489E+028
41242600,2.40936E+028

Following the top answer by n8henrie to this old question, this is what I have so far:

import pandas as pd
import csv

# Here are the locations of my csv file of my original data and an EMPTY csv file that will contain my good, noiseless set of data

originaldata = '/Users/myname/anaconda2/originaldata.csv'
gooddata = '/Users/myname/anaconda2/gooddata.csv'

# I use pandas to read in the original data because then I can separate the columns of time as 'T' and mass as 'M'

originaldata = pd.read_csv('originaldata.csv',delimiter=',',header=None,names=['t','m'])

# Numerical values of the mass values

M = originaldata['m'].values

# Now to put a restriction in

for row in M:
    new_row = []
    for column in row:
        if column > 2.35E+028:
            new_row.append(column)

    csv.writer(open(newfile,'a')).writerow(new_row)

print('\n\n')
print('After:')
print(open(newfile).read())

However, when I run it, I get this error:

TypeError: 'numpy.float64' object is not iterable

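I know the first column (time) is dtype int64 and the second column (mass) is dtype float64, but as a beginner I am still not sure what this error means or where I am going wrong. Any help would be appreciated. Thanks very much in advance.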

2 Answers:

Answer 0 (score: 1)

You can select rows with a boolean condition (boolean indexing), so no explicit loop is needed. Example:

import pandas as pd
from io import StringIO

data = StringIO('''\
40023700,2.40896E+028
40145700,2.44487E+028
40267700,2.44487E+028
40389700,2.44478E+028
40511600,1.535E+028
40633500,2.19067E+028
40755400,2.44496E+028
40877200,2.44489E+028
40999000,2.44489E+028
41120800,2.34767E+028
41242600,2.40936E+028
''')

df = pd.read_csv(data,names=['t','m'])
good = df[df.m > 2.35e+28]
out = StringIO()
good.to_csv(out,index=False,header=False)
print(out.getvalue())

Output:

40023700,2.40896e+28
40145700,2.44487e+28
40267700,2.44487e+28
40389700,2.44478e+28
40755400,2.44496e+28
40877200,2.44489e+28
40999000,2.44489e+28
41242600,2.40936e+28
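
The example above works on in-memory StringIO buffers. A minimal sketch of the same filter applied to files on disk might look like the following. The helper name filter_csv, the THRESHOLD constant and the temporary file paths are assumptions made for illustration; they are not part of the original question or answer.

```python
import os
import tempfile

import pandas as pd

THRESHOLD = 2.35e28  # kg; cutoff value taken from the question


def filter_csv(src_path, dst_path, threshold=THRESHOLD):
    """Hypothetical helper: keep only rows whose mass exceeds `threshold`."""
    df = pd.read_csv(src_path, header=None, names=['t', 'm'])
    good = df[df['m'] > threshold]          # boolean indexing, as in the answer
    good.to_csv(dst_path, index=False, header=False)
    return good


# Demo in a throwaway directory; replace these with your real paths.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'originaldata.csv')
dst = os.path.join(tmpdir, 'gooddata.csv')

with open(src, 'w') as f:
    f.write('40389700,2.44478E+028\n'
            '40511600,1.535E+028\n'
            '40755400,2.44496E+028\n')

kept = filter_csv(src, dst)

with open(dst) as f:
    written_lines = f.read().splitlines()

print(written_lines)
```

Because to_csv overwrites the destination file in one call, there is no need to open the output in append mode for every row, as the loop in the question does.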

Answer 1 (score: 0)

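This returns a single column (a 1-D array): M = originaldata['m'].values

So when you write for row in M:, each row is just one value (a numpy.float64 scalar), not a row of the table, and you cannot iterate over it a second time. That inner for column in row: loop is what raises the TypeError.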
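
To make this concrete, here is a small sketch that reproduces the cause of the error and then shows one possible way to write the loop so that it works. The three-row sample and the variable names (sample, scalar_is_iterable, out) are assumptions for illustration; iterating with itertuples is just one option and is not taken from the original answers.

```python
import csv
import io

import pandas as pd

sample = io.StringIO(
    '40389700,2.44478E+028\n'
    '40511600,1.535E+028\n'
    '40755400,2.44496E+028\n'
)
df = pd.read_csv(sample, header=None, names=['t', 'm'])

# .values on a single column gives a 1-D array, so each element is a scalar.
M = df['m'].values
first = M[0]

try:
    iter(first)                 # this is what "for column in row" attempts
    scalar_is_iterable = True
except TypeError:
    scalar_is_iterable = False  # numpy.float64 cannot be iterated

# One possible fix: iterate over whole rows as (t, m) tuples instead.
out = io.StringIO()             # stand-in for a real output file
writer = csv.writer(out)        # create the writer once, outside the loop
for t, m in df.itertuples(index=False, name=None):
    if m > 2.35e28:
        writer.writerow([t, m])

result_lines = out.getvalue().splitlines()
print(scalar_is_iterable, len(result_lines))
```

For 815k rows a Python-level loop like this is likely to be noticeably slower than the boolean-indexing approach in the other answer, so it is mainly useful for understanding the error.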