Selecting specific columns in a text file by column name and extracting their contents

Time: 2014-09-02 21:21:53

Tags: python python-3.x text numpy

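I'm a beginner in Python and I'm finding it hard to work out the right solution to this problem. I've gone through all the similar posts on Stack Overflow but couldn't find a solution.

I have a ".ext" file. I need to skip the first two lines. The third line contains the table's column names.

I need to search for the column names omega(n,n) and Sigma(n,n), where n can be any number (e.g. sigma(1,1), omega(2,2)). Then I need to take the columns named "sigma(n,n)" and "omega(n,n)" and check their values in the row that starts with "-1000000000". If a value is < 0.001, output "true".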

My code is:

import numpy as np
array=[]
array1=[]
b = np.genfromtxt(r'C:/nm73/proj/one.ext', delimiter=' ', names=True,dtype=None)[3:,:]
for n in range(len(b)-1):
    array=b['Sigma(n,n)']
    array1=b['omega(n,n)']

I don't know how to check the elements.

The one.ext file looks like the following. I apologize if the file is not formatted properly; I'm new to Stack Overflow. Any help is greatly appreciated.

TABLE NO.     1: First Order Conditional Estimation with Interaction: Goal     Function=MINIMUM VALUE OF OBJECTIVE FUNCTION: Problem=1 Subproblem=0 Superproblem1=0     Iteration1=0 Superproblem2=0 Iteration2=0
 ITERATION    THETA1       THETA2       SIGMA(1,1)   SIGMA(2,1)   SIGMA(2,2)   OMEGA(1,1)   OMEGA(2,1)   OMEGA(2,2)   OBJ
            0  2.50000E-01  1.00000E+01  1.00000E-01  0.00000E+00  1.00000E-01      1.00000E-01  0.00000E+00  1.00000E-01    9436.65314342255
            5  2.34948E-01  3.67675E+00  9.04159E-02  0.00000E+00  2.74933E+00  1.98686E-01  0.00000E+00  1.75724E-01    8745.97204613658
           10  2.11090E-01  4.30565E+00  1.34312E-01  0.00000E+00  1.12619E+00  1.32484E-01  0.00000E+00  1.36824E-02    8595.43106384756
           15  2.10696E-01  4.35495E+00  1.23897E-01  0.00000E+00  1.29124E+00  1.28600E-01  0.00000E+00  1.24441E-02    8591.51400321872
           20  2.11129E-01  4.36325E+00  1.24283E-01  0.00000E+00  1.28733E+00  1.28815E-01  0.00000E+00  1.24211E-02    8591.50022332770
  -1000000000  2.11129E-01  4.36325E+00  1.24283E-01  0.00000E+00  1.28733E+00  1.28815E-01  0.00000E+00  1.24211E-02    8591.50022332770
  -1000000001  8.07565E-03  6.97861E-02  5.28558E-03  1.00000E+10  4.20370E-01  1.78706E-02  1.00000E+10  3.15324E-03   0.000000000000000E+000
  -1000000004  0.00000E+00  0.00000E+00  3.52538E-01  0.00000E+00  1.13460E+00  3.58908E-01  0.00000E+00  1.11450E-01   0.000000000000000E+000
  -1000000005  0.00000E+00  0.00000E+00  7.49648E-03  1.00000E+10  1.85250E-01  2.48957E-02  1.00000E+10  1.41465E-02   0.000000000000000E+000

2 answers:

Answer 0: (score: 1)

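If you do not specify delimiter, any run of consecutive whitespace is treated as a single delimiter. If you specify delimiter=' ', then literally every single space acts as a delimiter. That leads to a ValueError, because genfromtxt ends up with the wrong number of columns.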
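As a rough analogy (a sketch using plain str.split rather than genfromtxt itself), the same difference shows up when splitting one row of the file by hand. The variable names here are only illustrative:

```python
# One row fragment with the irregular spacing used in the file
# (two spaces before each field).
line = "  -1000000000  2.11129E-01  4.36325E+00  1.24283E-01"

# No separator given: a run of whitespace counts as one delimiter.
by_whitespace = line.split()

# A single space as separator: every space starts a new field,
# so most of the resulting fields are empty strings.
by_single_space = line.split(" ")

print(by_whitespace)
print(by_single_space)
```

The second list contains empty strings between the real values, and their number depends on how many spaces each row happens to use, which is why the column count stops being consistent.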

So if you use:

In [396]: b = np.genfromtxt(filename, names=True, dtype=None, skip_header=1)

then you end up with a structured array like this:

In [397]: b
Out[397]: 
array([(0, 0.25, 10.0, 0.1, 0.0, 0.1, 0.1, 0.0, 0.1, 9436.65314342255),
       (5, 0.234948, 3.67675, 0.0904159, 0.0, 2.74933, 0.198686, 0.0, 0.175724, 8745.97204613658),
       (10, 0.21109, 4.30565, 0.134312, 0.0, 1.12619, 0.132484, 0.0, 0.0136824, 8595.43106384756),
       (15, 0.210696, 4.35495, 0.123897, 0.0, 1.29124, 0.1286, 0.0, 0.0124441, 8591.51400321872),
       (20, 0.211129, 4.36325, 0.124283, 0.0, 1.28733, 0.128815, 0.0, 0.0124211, 8591.5002233277),
       (-1000000000, 0.211129, 4.36325, 0.124283, 0.0, 1.28733, 0.128815, 0.0, 0.0124211, 8591.5002233277),
       (-1000000001, 0.00807565, 0.0697861, 0.00528558, 10000000000.0, 0.42037, 0.0178706, 10000000000.0, 0.00315324, 0.0),
       (-1000000004, 0.0, 0.0, 0.352538, 0.0, 1.1346, 0.358908, 0.0, 0.11145, 0.0),
       (-1000000005, 0.0, 0.0, 0.00749648, 10000000000.0, 0.18525, 0.0248957, 10000000000.0, 0.0141465, 0.0)], 
      dtype=[('ITERATION', '<i4'), ('THETA1', '<f8'), ('THETA2', '<f8'), ('SIGMA11', '<f8'), ('SIGMA21', '<f8'), ('SIGMA22', '<f8'), ('OMEGA11', '<f8'), ('OMEGA21', '<f8'), ('OMEGA22', '<f8'), ('OBJ', '<f8')])

Note the dtype at the end. The column names contain no parentheses or commas, so instead of SIGMA(1,1) you get SIGMA11. You can access this column like this:

In [398]: b['SIGMA11']
Out[398]: 
array([ 0.1 , 0.0904159 , 0.134312 , 0.123897 , 0.124283 , 0.124283 , 0.00528558, 0.352538 , 0.00749648])
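For the check the question actually asks about, one possible sketch is below. It is not part of the original answer. It assumes the layout of the sample shown in the question (one title line, then the header line, then numeric rows), reads the column names from the header line itself so that the "(n,n)" indices stay intact, and loads the numbers as a plain 2-D array. The function name check_diagonals, the SAMPLE string, and the header_line parameter are hypothetical names introduced for illustration.

```python
import io
import re

import numpy as np

# Shortened copy of the sample file from the question (assumed layout:
# one title line, one header line, then whitespace-separated numbers).
SAMPLE = """\
TABLE NO.     1: First Order Conditional Estimation with Interaction
 ITERATION    THETA1       THETA2       SIGMA(1,1)   SIGMA(2,1)   SIGMA(2,2)   OMEGA(1,1)   OMEGA(2,1)   OMEGA(2,2)   OBJ
            0  2.50000E-01  1.00000E+01  1.00000E-01  0.00000E+00  1.00000E-01      1.00000E-01  0.00000E+00  1.00000E-01    9436.65314342255
  -1000000000  2.11129E-01  4.36325E+00  1.24283E-01  0.00000E+00  1.28733E+00  1.28815E-01  0.00000E+00  1.24211E-02    8591.50022332770
  -1000000001  8.07565E-03  6.97861E-02  5.28558E-03  1.00000E+10  4.20370E-01  1.78706E-02  1.00000E+10  3.15324E-03   0.000000000000000E+000
"""

# Matches SIGMA(i,j) or OMEGA(i,j), in any letter case.
NAME = re.compile(r"^(SIGMA|OMEGA)\((\d+),(\d+)\)$", re.IGNORECASE)


def check_diagonals(text, row_id=-1000000000, threshold=0.001, header_line=1):
    """Return {column name: value < threshold} for SIGMA(n,n) and OMEGA(n,n)."""
    names = text.splitlines()[header_line].split()
    # Skip everything up to and including the header; the rest is numeric.
    data = np.atleast_2d(
        np.genfromtxt(io.StringIO(text), skip_header=header_line + 1)
    )
    id_col = names.index("ITERATION")
    rows = data[data[:, id_col] == row_id]
    if len(rows) == 0:
        raise ValueError("no row with ITERATION == %d" % row_id)
    row = rows[0]
    result = {}
    for i, name in enumerate(names):
        m = NAME.match(name)
        # Keep only diagonal entries, i.e. both indices equal.
        if m and m.group(2) == m.group(3):
            result[name] = bool(row[i] < threshold)
    return result


print(check_diagonals(SAMPLE))
```

For a real file, the text could be obtained with open(path).read(). If the file really has two lines before the header, as the question describes, header_line=2 would be the value to pass.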

Answer 1: (score: 1)

Have you tried pandas? This example may show the basics of what you are looking for:

import pandas as pd

# raw string, so the backslashes in the Windows path are not read as escapes
f = r'C:\Documents and Settings\Joaquin\Escritorio\one.ext'

# read your table and set the first column as index
table = pd.read_csv(f, sep=' ', header=1, skipinitialspace=True)
table = table.set_index('ITERATION')

# get the two cells corresponding to the columns you want at row -1000000000
print(table.xs(-1000000000)[['SIGMA(1,1)', 'OMEGA(1,1)']])

This gives:

SIGMA(1,1)    0.124283
OMEGA(1,1)    0.128815
Name: -1000000000, dtype: float64
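To cover every sigma(n,n) and omega(n,n) column instead of two hard-coded names, the same idea could be extended roughly as follows. This is a sketch, not the answerer's code: it swaps in sep=r"\s+" with skiprows=1 for the reading step, and the SAMPLE string and the is_diagonal helper are hypothetical names used only for illustration.

```python
import io
import re

import pandas as pd

# Shortened copy of the sample file from the question (assumed layout:
# one title line, one header line, then whitespace-separated numbers).
SAMPLE = """\
TABLE NO.     1: First Order Conditional Estimation with Interaction
 ITERATION    THETA1       THETA2       SIGMA(1,1)   SIGMA(2,1)   SIGMA(2,2)   OMEGA(1,1)   OMEGA(2,1)   OMEGA(2,2)   OBJ
            0  2.50000E-01  1.00000E+01  1.00000E-01  0.00000E+00  1.00000E-01      1.00000E-01  0.00000E+00  1.00000E-01    9436.65314342255
  -1000000000  2.11129E-01  4.36325E+00  1.24283E-01  0.00000E+00  1.28733E+00  1.28815E-01  0.00000E+00  1.24211E-02    8591.50022332770
"""

NAME = re.compile(r"^(SIGMA|OMEGA)\((\d+),(\d+)\)$", re.IGNORECASE)


def is_diagonal(column):
    """True for names such as SIGMA(1,1) or OMEGA(2,2), where both indices match."""
    m = NAME.match(column)
    return bool(m) and m.group(2) == m.group(3)


# Skip the title line; treat any run of whitespace as one separator.
table = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", skiprows=1)
table = table.set_index("ITERATION")

diagonal_columns = [c for c in table.columns if is_diagonal(c)]

# Boolean Series: True where the value in row -1000000000 is below 0.001.
below = table.loc[-1000000000, diagonal_columns] < 0.001
print(below)
```

With a file on disk, the path would take the place of io.StringIO(SAMPLE).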