Filtering a large CSV data file

Time: 2018-06-27 14:25:31

Tags: python csv filter

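I have a large data file, and I am only interested in the rows with a specific X value of 4.125, as shown below. Because 4.125 is the stop position of an ion, I am also interested in the corresponding start position, so I want to keep that information in an array.

How can I write a program that efficiently finds the rows whose X stop position is 4.125 and keeps the start positions of those ions?

The data is a 120982 × 9 array. In the sample shown below, I want to keep the information for ion #3849096.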

"Ion N","Mass","Charge","X","Y","Z","Azm","Elv","KE" 
3849094,0.00054858,-1,66.5216,-51,-3.8,-180,88.7,18160
3849094,0.00054858,-1,27.3925,30.3532,-4.07076,-177.1,41.5494,17697.2 
3849095,0.00054858,-1,66.5216,-51,-3.7,-180,88.7,18160
3849095,0.00054858,-1,26.6277,31.0039,-3.91402,-177.096,40.8293,17699.4
3849096,0.00054858,-1,66.5216,-51,-3.6,-180,88.7,18160
3849096,0.00054858,-1,4.125,44.9887,-2.47517,-176.363,25.715,17711.1
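
To make the intended selection concrete, here is a minimal sketch run on the six sample rows above. It assumes that the first row of each ion is its start position and the last row is its stop position; the function name `start_rows_for_stop_x` and the variable names are hypothetical, chosen only for this illustration.

```python
import io
import pandas as pd

# Sample rows copied from the question: a start row, then a stop row, per ion.
SAMPLE = '''"Ion N","Mass","Charge","X","Y","Z","Azm","Elv","KE"
3849094,0.00054858,-1,66.5216,-51,-3.8,-180,88.7,18160
3849094,0.00054858,-1,27.3925,30.3532,-4.07076,-177.1,41.5494,17697.2
3849095,0.00054858,-1,66.5216,-51,-3.7,-180,88.7,18160
3849095,0.00054858,-1,26.6277,31.0039,-3.91402,-177.096,40.8293,17699.4
3849096,0.00054858,-1,66.5216,-51,-3.6,-180,88.7,18160
3849096,0.00054858,-1,4.125,44.9887,-2.47517,-176.363,25.715,17711.1
'''


def start_rows_for_stop_x(df, stop_x, tol=1e-6):
    """Return the start row of every ion whose last row has X within tol of stop_x."""
    grouped = df.groupby("Ion N", sort=False)
    starts = grouped.head(1)  # first row per ion (assumed start position)
    stops = grouped.tail(1)   # last row per ion (assumed stop position)
    # Ion numbers whose stop position matches stop_x within the tolerance.
    hit_ions = stops.loc[(stops["X"] - stop_x).abs() < tol, "Ion N"]
    return starts[starts["Ion N"].isin(hit_ions)].reset_index(drop=True)


sample_df = pd.read_csv(io.StringIO(SAMPLE))
result = start_rows_for_stop_x(sample_df, 4.125)
print(result)
```

On this sample only ion 3849096 stops at X = 4.125, so `result` holds its single start row. Grouping by `Ion N` avoids relying on the rows strictly alternating between start and stop.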

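This is the code I have developed so far (the original attempt did not work; the row-dropping and file-writing lines are corrected here):

import pandas as pd

# Load the whole table; the header row supplies the column names.
df = pd.read_csv('Ambre_2.dat', sep=',', low_memory=False)

tol = 1e-6    # tolerance for the floating-point comparison
Fltr = 4.125  # X stop position of interest

# Rows whose X value matches Fltr within tol.
filterreddata = df[(df['X'] - Fltr).abs() < tol]
# All rows (start and stop) that belong to those ions.
filteredions = df[df['Ion N'].isin(filterreddata['Ion N'])]
# Keep every other row, i.e. the start row of each ion.
# This assumes each ion has exactly two consecutive rows: start, then stop.
# (The MATLAB-style filteredions[2:2:end, :] = [] is not valid Python.)
filteredions = filteredions.iloc[::2]
# Write with pandas; tabulate() was never imported.
filteredions.to_csv('ions.csv', index=False)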
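
If the file were too large to load at once, the goal stated above (find the ions that stop at X = 4.125 and keep their start rows) could be approached in chunks. The following is only a sketch under assumptions: `filter_start_rows` and its parameters are hypothetical names, the column names `Ion N` and `X` are taken from the sample header, and the first row of each ion is assumed to be its start position.

```python
import pandas as pd


def filter_start_rows(path, stop_x=4.125, tol=1e-6, chunksize=50_000):
    """Two-pass, chunked sketch: start rows of ions that have a row at X ~ stop_x."""
    # Pass 1: read only the two columns needed to collect matching ion numbers.
    hit_ions = set()
    for chunk in pd.read_csv(path, usecols=["Ion N", "X"], chunksize=chunksize):
        mask = (chunk["X"] - stop_x).abs() < tol
        hit_ions.update(chunk.loc[mask, "Ion N"])

    # Pass 2: keep every row of those ions, one chunk at a time.
    kept = [
        chunk[chunk["Ion N"].isin(hit_ions)]
        for chunk in pd.read_csv(path, chunksize=chunksize)
    ]
    rows = pd.concat(kept, ignore_index=True)

    # The first row of each ion is taken to be its start position.
    return rows.groupby("Ion N", sort=False).head(1).reset_index(drop=True)
```

A possible use would be `filter_start_rows('Ambre_2.dat').to_csv('ions.csv', index=False)`. Two passes are used so that an ion whose start and stop rows fall in different chunks is still handled; for a 120982-row file, loading everything at once is likely fine and the chunking is optional.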

The ions.csv file should look like this:

"Ion N","Mass","Charge","X","Y","Z","Azm","Elv","KE" 
348450,0.00054858,-1,50.2216,-41,0.9,0,88.1,9200
348451,0.00054858,-1,50.3216,-41,0.9,0,88.1,9200
348511,0.00054858,-1,50.2216,-41,1,0,88.1,9200
348512,0.00054858,-1,50.3216,-41,1,0,88.1,9200

0 answers:

No answers