Matching line numbers to strings in a table

Time: 2013-03-18 03:24:18

Tags: python

I have a file with a list of columns describing particular parameters:

Size  Magnitude  Luminosity

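I only need specific data from this file (particular rows and columns). So far I have Python code in which I collect the necessary line numbers. I just need to know how to match them against the text file to get the right rows, along with the values in the (magnitude) and (luminosity) columns. Any suggestions on how to approach this?

Here is a sample of my code (the #comments describe what I have done and what I want to do):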

temp_ListMatch = (point[5]).strip() 
if temp_ListMatch:
    ListMatchaddress = (point[5]).strip()
    ListMatchaddress = re.sub(r'\s', '_', ListMatchaddress) 
    ListMatch_dirname = '/projects/XRB_Web/apmanuel/499/Lists/' + ListMatchaddress
    #print ListMatch_dirname+"\n" 

    try:
        file5 = open(ListMatch_dirname, 'r')
    except IOError:
        print 'Cannot open: '+ListMatch_dirname

    Optparline = []
    for line in file5:
        point5 = line.split()
        j = int(point5[1])
        Optparline.append(j)
        #Basically file5 contains the line numbers I need, 
        #and I have appended these numbers to the variable j. 
        temp_others = (point[4]).strip()
        if temp_others: 
            othersaddress = (point[4]).strip()
            othersaddress =re.sub(r'\s', '_', othersaddress) 
            othersbase_dirname = '/projects/XRB_Web/apmanuel/499/Lists/' + othersaddress
            try:
                file6 = open(othersbase_dirname, 'r')
            except IOError:
                print 'Cannot open: '+othersbase_dirname

            gmag = []
            z = []
            rh = []
            gz = []

            for line in file6:
                point6 = line.split()
                f = float(point6[2])
                g = float(point6[4])
                h = float(point6[6])
                i = float(point6[9])
        # So now I have opened file 6 where this list of data is, and have
        # identified the columns of elements that I need. 
        # I only need the particular rows (provided by line number) 
        # with these elements chosen. That is where I'm stuck!

1 Answer:

Answer 0: (score: 0)

Load the whole data file into a pandas DataFrame (assuming the data file has a header row from which the column names can be taken):

import pandas as pd
df = pd.read_csv('/path/to/file') 
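The code in the question splits each line with line.split(), which suggests the data file may be whitespace-delimited rather than comma-separated. A minimal sketch under that assumption follows; the sample text and the column names (size, magnitude, luminosity) are made up for illustration and are not taken from the real file.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real data file.
sample = io.StringIO(
    "size magnitude luminosity\n"
    "1.0  20.5      3.2\n"
    "2.0  21.0      4.1\n"
    "3.0  19.8      5.7\n"
)

# sep=r"\s+" treats any run of whitespace as a column separator.
df = pd.read_csv(sample, sep=r"\s+")
print(df.columns.tolist())
```

If the real file has no header row, passing header=None together with names=[...] would be one way to supply the column names by hand.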

Load the line-number file into a pandas Series (assuming one number per line):

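# header=None assumes the file has no header line;
# squeeze('columns') turns the one-column DataFrame into a Series
# (the squeeze=True argument of read_csv was removed in pandas 2.0)
row_numbers = pd.read_csv('/path/to/rows_file', header=None).squeeze('columns')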
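In the question, the line number is read from the second whitespace-separated field of each line (point5[1]), so the file may not hold just one number per line. A sketch for that layout, assuming no header line; the sample content is invented for illustration.

```python
import io
import pandas as pd

# Hypothetical line-number file: a label in the first field, the number in the second.
rows_text = io.StringIO("obj_a 0\nobj_b 2\nobj_c 5\n")

row_numbers = pd.read_csv(
    rows_text,
    sep=r"\s+",    # split on runs of whitespace
    header=None,   # assumption: the file has no header line
    usecols=[1],   # keep only the second field
).squeeze("columns")  # one-column DataFrame -> Series

print(row_numbers.tolist())
```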

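Return only the rows listed in the line-number file, and only the magnitude and luminosity columns (this assumes the first row is numbered 0):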

# .loc replaces .ix (removed in pandas 1.0); labels equal row numbers under the default index
relevant_rows = df.loc[row_numbers, ['magnitude', 'luminosity']]
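If the line numbers in the list file start at 1 rather than 0, they need shifting before the lookup. The sketch below wraps the selection in a small helper; select_rows and its one_based flag are hypothetical names introduced here, and the data is made up.

```python
import pandas as pd

def select_rows(df, row_numbers, columns, one_based=False):
    """Return the given columns for the rows at the given positions.

    select_rows and one_based are hypothetical names for this sketch.
    """
    positions = pd.Series(row_numbers).astype(int)
    if one_based:
        positions = positions - 1  # convert 1-based line numbers to 0-based positions
    # iloc selects by position, so it works regardless of the index labels.
    return df.iloc[positions.to_numpy()][list(columns)]

# Made-up data for illustration.
df = pd.DataFrame({
    "size": [1.0, 2.0, 3.0, 4.0],
    "magnitude": [20.5, 21.0, 19.8, 22.3],
    "luminosity": [3.2, 4.1, 5.7, 2.9],
})

relevant_rows = select_rows(df, [1, 3], ["magnitude", "luminosity"], one_based=True)
print(relevant_rows)
```

Using iloc rather than loc here is a design choice: it selects by position, so the result does not depend on the DataFrame keeping its default 0-based index.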