python - Create a new sub-matrix by filtering the columns of a matrix / two-dimensional list

Posted: 2015-03-08 23:38:43

Tags: python matrix filtering

Given a matrix such as the following:

matrix = [
    ['month','val1','val2','valn'],
    ['jan','100','200','300'],
    ['feb','101','201','302'],
    ['march','102','202','303'],
    ['april','103','203','303'],
    ['march','104','204','304']
]

I want to create a new matrix from a list of column indexes or column names (the filter), so that either

filter_col_indx = [0,2]
filter_col_name = ['month','val2']

produces the same output:

matrix2 = [
    ['month','val2'],
    ['jan','200'],
    ['feb','201'],
    ['march','202'],
    ['april','203'],
    ['march','204']
]

What is the most efficient way to do this for large matrices? The list of columns (list_of_columns) can vary.

Thanks

3 answers:

Answer 0 (score: 3)

This can be done with operator.itemgetter:

import operator
matrix = [
    ['month','val1','val2','valn'],
    ['jan','100','200','300'],
    ['feb','101','201','302'],
    ['march','102','202','303'],
    ['april','103','203','303'],
    ['march','104','204','304']
]

filter_col_indx = [0,2]
getter = operator.itemgetter(*filter_col_indx)
matrix2 = [list(getter(row)) for row in matrix]
print(matrix2)

yields

[['month', 'val2'],
 ['jan', '200'],
 ['feb', '201'],
 ['march', '202'],
 ['april', '203'],
 ['march', '204']]

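operator.itemgetter(*filter_col_indx) returns a function that takes a sequence and returns its items at indexes 0 and 2 as a tuple. Applying that function to each row therefore selects the wanted values from matrix. Note that when given a single index, itemgetter returns the bare item rather than a 1-tuple, so list(getter(row)) would then split a string such as 'jan' into individual characters.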
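The question also asks for filtering by column name. A minimal sketch, assuming the first row of the matrix is the header: translate names to indexes with list.index, then reuse itemgetter. The helper name select_columns is hypothetical, and the single-index branch only exists to work around itemgetter returning a bare item when given one index.

```python
import operator


def select_columns(matrix, columns):
    """Return a new matrix holding only the requested columns.

    `columns` may contain integer indexes or header names (looked up in matrix[0]).
    Hypothetical helper, not part of any library.
    """
    header = matrix[0]
    # Translate header names into positional indexes, keeping the requested order.
    # list.index raises ValueError for a name that is not in the header.
    indexes = [header.index(c) if isinstance(c, str) else c for c in columns]
    if len(indexes) == 1:
        # itemgetter with one index returns the item itself, not a tuple,
        # so handle this case separately to keep every output row a list.
        (i,) = indexes
        return [[row[i]] for row in matrix]
    getter = operator.itemgetter(*indexes)
    return [list(getter(row)) for row in matrix]


matrix = [
    ['month', 'val1', 'val2', 'valn'],
    ['jan', '100', '200', '300'],
    ['feb', '101', '201', '302'],
]

print(select_columns(matrix, ['month', 'val2']))
# [['month', 'val2'], ['jan', '200'], ['feb', '201']]
print(select_columns(matrix, [0, 2]) == select_columns(matrix, ['month', 'val2']))
# True
```

Pass the filter as a list rather than a set such as {'month','val2'}: sets are unordered, so the column order of the result would not be guaranteed.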


If you have pandas installed, you can turn matrix into a DataFrame and select the desired columns like this:

import pandas as pd

matrix = [
    ['month','val1','val2','valn'],
    ['jan','100','200','300'],
    ['feb','101','201','302'],
    ['march','102','202','303'],
    ['april','103','203','303'],
    ['march','104','204','304']
]
df = pd.DataFrame(matrix[1:], columns=matrix[0])
print(df[['month', 'val2']])

yields

   month val2
0    jan  200
1    feb  201
2  march  202
3  april  203
4  march  204

You may prefer pandas, since it makes many kinds of data manipulation easy.

Answer 1 (score: 1)

If you are interested in whole columns, I think it makes sense to store the data in a dictionary that holds each column as a list:

data = {'month': ['jan', 'feb', 'march', 'april', 'march'],
        'val1': [100, 101, 102, 103, 104],
        'val2': [200, 201, 202, 203, 204],
        # ... remaining columns
       }

To retrieve columns (I've written them horizontally here), you can do:

{key: data[key] for key in ['month', 'val2']}
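If the data starts out as the row-based matrix from the question, one way to build such a column dictionary is to transpose it with zip(*rows). A minimal sketch under that assumption (the names columns, selected and matrix2 are illustrative); the same zip(*...) trick turns the selected columns back into rows. It relies on dicts preserving insertion order, which holds from Python 3.7 on.

```python
matrix = [
    ['month', 'val1', 'val2', 'valn'],
    ['jan', '100', '200', '300'],
    ['feb', '101', '201', '302'],
    ['march', '102', '202', '303'],
]

header, *rows = matrix
# zip(*rows) yields one tuple per column; pair each with its header name.
columns = {name: list(col) for name, col in zip(header, zip(*rows))}

# Pick whole columns by name, as in the answer above.
selected = {key: columns[key] for key in ['month', 'val2']}
print(selected)
# {'month': ['jan', 'feb', 'march'], 'val2': ['200', '201', '202']}

# Transpose back into the row-based layout, header row first.
matrix2 = [list(selected)] + [list(r) for r in zip(*selected.values())]
print(matrix2)
# [['month', 'val2'], ['jan', '200'], ['feb', '201'], ['march', '202']]
```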

Answer 2 (score: 1)

Here is a NumPy version:

import numpy as np

matrix = np.array([
    ['month','val1','val2','valn'],
    ['jan','100','200','300'],
    ['feb','101','201','302'],
    ['march','102','202','303'],
    ['april','103','203','303'],
    ['march','104','204','304']
])

search = ['month', 'val2']

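header = matrix[0, :]  # search only the first row
# searchsorted assumes a sorted array, so search through an argsort "sorter"
# and map the sorted positions back to the original column positions
order = np.argsort(header)
indexes = order[header.searchsorted(search, sorter=order)]
# or indexes = [0, 2]
print(matrix[:, indexes])

[['month' 'val2']
 ['jan' '200']
 ['feb' '201']
 ['march' '202']
 ['april' '203']
 ['march' '204']]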
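One caveat with searchsorted: it returns an insertion point, not a match, so a misspelled or missing column name silently selects some other column. A minimal sketch of a stricter lookup, assuming the header is a 1-D NumPy string array; column_indexes is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

matrix = np.array([
    ['month', 'val1', 'val2', 'valn'],
    ['jan', '100', '200', '300'],
    ['feb', '101', '201', '302'],
])


def column_indexes(header, names):
    """Positions of `names` in a (possibly unsorted) 1-D header, in request order."""
    names = np.asarray(names)
    order = np.argsort(header)
    # searchsorted needs sorted input, so search the sorted view via `sorter`.
    pos = np.searchsorted(header, names, sorter=order)
    # A name larger than every header entry yields len(header); clip it into range.
    pos = np.clip(pos, 0, len(header) - 1)
    idx = order[pos]
    # Insertion points are returned even for absent names, so verify each hit.
    missing = names[header[idx] != names]
    if missing.size:
        raise KeyError(f"unknown columns: {missing.tolist()}")
    return idx


idx = column_indexes(matrix[0], ['val2', 'month'])
print(idx)  # [2 0]
print(matrix[:, idx])
```

Calling column_indexes(matrix[0], ['val3']) raises KeyError instead of quietly returning the position of a neighbouring column.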