Question

I have a file whose columns each describe a particular parameter:

Magnitude    Luminosity

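I only need specific data from this file (particular rows and columns). So far I have Python code that collects the line numbers I need. What I don't know is how to match those line numbers against the text file so that I get the right rows, together with the values in the (magnitude) and (luminosity) columns. Any suggestions on how to approach this?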

Here is a sample of my code (the # comments describe what I have done and what I want to do):

temp_ListMatch = (point[5]).strip() 
if temp_ListMatch:
    ListMatchaddress = (point[5]).strip()
    ListMatchaddress = re.sub(r'\s', '_', ListMatchaddress) 
    ListMatch_dirname = '/projects/XRB_Web/apmanuel/499/Lists/' + ListMatchaddress
    #print ListMatch_dirname+"\n" 

    try:
        file5 = open(ListMatch_dirname, 'r')
    except IOError:
        print 'Cannot open: '+ListMatch_dirname

    Optparline = []
    for line in file5:
        point5 = line.split()
        j = int(point5[1])
        Optparline.append(j)
        #Basically file5 contains the line numbers I need, 
        #and I have appended these numbers to the list Optparline. 
        temp_others = (point[4]).strip()
        if temp_others: 
            othersaddress = (point[4]).strip()
            othersaddress =re.sub(r'\s', '_', othersaddress) 
            othersbase_dirname = '/projects/XRB_Web/apmanuel/499/Lists/' + othersaddress
            try:
                file6 = open(othersbase_dirname, 'r')
            except IOError:
                print 'Cannot open: '+othersbase_dirname

            gmag = []
            z = []
            rh = []
            gz = []

            for line in file6:
                point6 = line.split()
                f = float(point6[2])
                g = float(point6[4])
                h = float(point6[6])
                i = float(point6[9])
            # So now I have opened file 6 where this list of data is, and have
            # identified the columns of elements that I need. 
            # I only need the particular rows (provided by line number) 
            # with these elements chosen. That is where I'm stuck!

Answer 1

Load the whole data file into a pandas DataFrame (this assumes the data file has a header row from which the column names can be taken):

import pandas as pd
df = pd.read_csv('/path/to/file')
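
The code in the question splits each line on whitespace, which suggests the data file may not be comma-separated. If so, a minimal sketch of the load step could pass a whitespace separator to read_csv. The sample data and the column names below are made up for illustration and stand in for the real file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real data file: whitespace-separated
# columns with a header row. The values are invented.
sample = io.StringIO(
    "name magnitude luminosity\n"
    "s0 12.5 0.8\n"
    "s1 13.1 1.2\n"
    "s2 11.9 0.5\n"
)

# sep=r"\s+" splits on any run of spaces or tabs, much like str.split().
df = pd.read_csv(sample, sep=r"\s+")
print(df.columns.tolist())
```

If the real file has no header row, passing header=None and a names=[...] list of your own choosing is one way to label the columns.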

Load the file of row numbers into a pandas Series (this assumes one number per line):

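# header=None: the file holds one row number per line and no header row.
# squeeze('columns') turns the one-column DataFrame into a Series.
row_numbers = pd.read_csv('/path/to/rows_file', header=None).squeeze('columns')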

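Return only the rows listed in the row numbers file, restricted to the magnitude and luminosity columns (this assumes the first data row is numbered 0):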

relevant_rows = df.loc[row_numbers, ['magnitude', 'luminosity']]
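
To see the three steps working together, here is a small self-contained sketch. It uses in-memory text in place of the two files, and every name and value in the sample data is hypothetical. It selects by position with iloc, so it does not depend on how the DataFrame index is labelled:

```python
import io

import pandas as pd

# Hypothetical stand-ins for the data file and the row numbers file.
data = io.StringIO(
    "name magnitude luminosity colour\n"
    "s0 12.5 0.8 0.3\n"
    "s1 13.1 1.2 0.4\n"
    "s2 11.9 0.5 0.1\n"
    "s3 14.0 2.0 0.9\n"
)
rows = io.StringIO("0\n2\n3\n")

df = pd.read_csv(data, sep=r"\s+")

# One row number per line, no header; squeeze to get a Series of ints.
row_numbers = pd.read_csv(rows, header=None).squeeze("columns")

# Pick the wanted rows by position, then keep only the two columns.
relevant_rows = df.iloc[row_numbers.to_numpy()][["magnitude", "luminosity"]]
print(relevant_rows)
```

If the row numbers count from 1 rather than 0, subtracting 1 from row_numbers before the selection should line them up with the DataFrame rows.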
