Suppose I have a matrix like the following:
matrix = [
['month','val1','val2','valn'],
['jan','100','200','300'],
['feb','101','201','302'],
['march','102','202','303'],
['april','103','203','303'],
['march','104','204','304']
]
I want to create a new matrix from a list of column indices or column names (a filter), so that
filter_col_indx = {0,2}
filter_col_name = {'month','val2'}
would both produce the same output:
matrix2 = [
['month','val2'],
['jan','200'],
['feb','201'],
['march','202'],
['april','203'],
['march','204']
]
What is the most efficient way to do this for a large matrix? The list of columns can vary.
Thanks
Answer 0 (score: 3)
You can use operator.itemgetter:
import operator
matrix = [
['month','val1','val2','valn'],
['jan','100','200','300'],
['feb','101','201','302'],
['march','102','202','303'],
['april','103','203','303'],
['march','104','204','304']
]
filter_col_indx = [0,2]
getter = operator.itemgetter(*filter_col_indx)
matrix2 = [list(getter(row)) for row in matrix]
print(matrix2)
which yields
[['month', 'val2'],
['jan', '200'],
['feb', '201'],
['march', '202'],
['april', '203'],
['march', '204']]
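operator.itemgetter(*filter_col_indx) returns a function that takes a sequence as its argument and returns items 0 and 2 of that sequence as a tuple. You can therefore apply this function to each row to select the desired values from matrix.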
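For the name-based filter from the question, one possible sketch is to translate the names into indices using the header row and then reuse itemgetter. The helper name select_columns is hypothetical (it is not part of the original answer), and the sketch assumes that the first row is the header and that every requested name appears in it. Because itemgetter returns a bare item rather than a tuple when given a single index, and rejects an empty argument list, those two cases are handled separately.

```python
import operator

def select_columns(matrix, names):
    """Return a new matrix containing only the named columns (hypothetical helper)."""
    header = matrix[0]
    # Map each column name to its position in the header row.
    position = {name: i for i, name in enumerate(header)}
    indices = [position[name] for name in names]  # KeyError if a name is missing
    if len(indices) <= 1:
        # itemgetter needs at least one index and returns a bare item
        # (not a tuple) for exactly one, so use plain indexing here.
        return [[row[i] for i in indices] for row in matrix]
    getter = operator.itemgetter(*indices)
    return [list(getter(row)) for row in matrix]

matrix = [
    ['month', 'val1', 'val2', 'valn'],
    ['jan', '100', '200', '300'],
    ['feb', '101', '201', '302'],
    ['march', '102', '202', '303'],
    ['april', '103', '203', '303'],
    ['march', '104', '204', '304'],
]
matrix2 = select_columns(matrix, ['month', 'val2'])
print(matrix2)
```

The filter is passed as a list rather than a set (as written in the question) because sets have no guaranteed order, and the order of the filter determines the order of the output columns.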
If you have pandas installed, you can turn matrix into a DataFrame and select the desired columns like this:
import pandas as pd
matrix = [
['month','val1','val2','valn'],
['jan','100','200','300'],
['feb','101','201','302'],
['march','102','202','303'],
['april','103','203','303'],
['march','104','204','304']
]
df = pd.DataFrame(matrix[1:], columns=matrix[0])
print(df[['month', 'val2']])
which yields
   month val2
0    jan  200
1    feb  201
2  march  202
3  april  203
4  march  204
You might enjoy using pandas, since it makes a lot of data manipulation easy.
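The question also asks about filtering by column index and expects a list of lists back. A minimal sketch of both, assuming pandas is installed, could look like this; the variable names by_index and by_name are illustrative only.

```python
import pandas as pd

matrix = [
    ['month', 'val1', 'val2', 'valn'],
    ['jan', '100', '200', '300'],
    ['feb', '101', '201', '302'],
    ['march', '102', '202', '303'],
    ['april', '103', '203', '303'],
    ['march', '104', '204', '304'],
]
df = pd.DataFrame(matrix[1:], columns=matrix[0])

by_index = df.iloc[:, [0, 2]]      # select columns by position
by_name = df[['month', 'val2']]    # select columns by label

# Rebuild the header-plus-rows layout used in the question.
matrix2 = [by_index.columns.tolist()] + by_index.values.tolist()
print(matrix2)
```

Both selections refer to the same two columns here, so either one can be converted back to nested lists in the same way.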
Answer 1 (score: 1)
If you are interested in whole columns, I think it is appropriate to store the data in a dictionary that holds each column as a list:
data = {'month': ['jan', 'feb', 'march', 'april', 'march'],
'val1': [100, 101, 102, 103, 104],
'val2': [200, 201, 202, 203, 204],
...
}
To retrieve columns (I have written them horizontally now...), you can do:
{key: data[key] for key in ['month', 'val2']}
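If the data starts out as the row-oriented matrix from the question, one way to get to such a dictionary and back is to transpose with zip. This is only a sketch: it keeps the values as strings (unlike the integers shown above) and relies on dictionaries preserving insertion order, which holds in Python 3.7 and later.

```python
matrix = [
    ['month', 'val1', 'val2', 'valn'],
    ['jan', '100', '200', '300'],
    ['feb', '101', '201', '302'],
    ['march', '102', '202', '303'],
    ['april', '103', '203', '303'],
    ['march', '104', '204', '304'],
]

header, *rows = matrix
# zip(*rows) transposes the rows into columns.
data = {name: list(column) for name, column in zip(header, zip(*rows))}

selected = {key: data[key] for key in ['month', 'val2']}

# Transpose back into rows, with the selected header first.
matrix2 = [list(selected)] + [list(row) for row in zip(*selected.values())]
print(matrix2)
```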
Answer 2 (score: 1)
Here is a NumPy version:
import numpy as np
matrix = np.array([
['month','val1','val2','valn'],
['jan','100','200','300'],
['feb','101','201','302'],
['march','102','202','303'],
['april','103','203','303'],
['march','104','204','304']
])
search = ['month', 'val2']
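# searchsorted assumes the header row is sorted (true for this header);
# for an arbitrary header use [list(matrix[0]).index(name) for name in search]
indexes = matrix[0,:].searchsorted(search) #search only the first row
# or indexes = [0, 2]
print(matrix[:,indexes])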
>>> [['month' 'val2']
['jan' '200']
['feb' '201']
['march' '202']
['april' '203']
['march' '204']]
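Note that searchsorted performs a binary search, so it only finds the right positions when the header row happens to be sorted, as it is above. A hedged alternative for headers in arbitrary order is to look each name up directly. The matrix below is a deliberately rearranged, shortened copy of the example data, used only to illustrate an unsorted header.

```python
import numpy as np

# Rearranged copy of the example data with an unsorted header (illustrative only).
matrix = np.array([
    ['val2', 'month', 'valn', 'val1'],
    ['200', 'jan', '300', '100'],
    ['201', 'feb', '302', '101'],
])

search = ['month', 'val2']
header = matrix[0].tolist()
# list.index raises ValueError if a name is not in the header.
indexes = [header.index(name) for name in search]

result = matrix[:, indexes]  # fancy indexing keeps the order given in search
print(result)
```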