Question

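I want to create an index of a large number of data files. The index should hold values of different types (String, Float and Int) as columns, and each row should represent one file. A list would do for that. But now I want to build a boolean mask from criteria on individual columns.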

   Index = [[4., 8., 3., 5., 0., 10.],   #some value
            [1, 1, 1, 2, 4, 8],          #starting time
            ["file1", "file1", "file2", "file2", "file2", "file3"],   #location file
            [1, 2, 1, 2, 3, 1]           #ID in location file
           ]

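So I would like to say, for example, Index[where Index[0] < 5. and Index[1] > some threshold] (both answers below use 3) and get back "location file" + "ID". I know I can do this kind of masking in numpy, but I can't use mixed data types in np.arrays.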

What is an efficient way to do this?

Answer 1

I'm not sure exactly what you expect, but this returns tuples of the elements from Index[2] and Index[3], which seems to be what you need.

>>> def mask(data, *criteria):
    toRet = []
    for i in range(len(data[0])):
        for index, predicate in criteria:
            if not predicate(data[index][i]):
                break
        else:
            toRet.append((data[2][i], data[3][i]))
    return toRet
>>> mask(Index, (0, lambda x: x <5), (1, lambda x: x > 3))
[('file2', 3)]
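
As a variation on the function above, the same row-wise filter can be sketched by transposing the columns with zip and testing each row with all(). The helper name mask_rows and its out parameter are made up for illustration and are not part of the original answer.

```python
Index = [[4., 8., 3., 5., 0., 10.],                                   # some value
         [1, 1, 1, 2, 4, 8],                                          # starting time
         ["file1", "file1", "file2", "file2", "file2", "file3"],      # location file
         [1, 2, 1, 2, 3, 1]]                                          # ID in location file

def mask_rows(data, criteria, out=(2, 3)):
    """Return the `out` columns of every row that passes all criteria.

    `criteria` is a list of (column_index, predicate) pairs
    (hypothetical interface, chosen for this sketch).
    """
    rows = zip(*data)  # transpose: one tuple per file
    return [tuple(row[j] for j in out)
            for row in rows
            if all(pred(row[i]) for i, pred in criteria)]

result = mask_rows(Index, [(0, lambda v: v < 5), (1, lambda v: v > 3)])
print(result)  # [('file2', 3)]
```

Transposing once keeps each file's values together, so every predicate sees one row at a time and no index arithmetic is needed.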

Answer 2

Well, if you insist on using a single array, I would follow @DSM's suggestion and use pandas. For example:

In [1]: import pandas as pd
In [2]: %paste
Index = [[4., 8., 3., 5., 0., 10.],   #some value
            [1, 1, 1, 2, 4, 8],          #starting time
            ["file1", "file1", "file2", "file2", "file2", "file3"],   #location file
            [1, 2, 1, 2, 3, 1]           #ID in location file
           ]

## -- End pasted text --
In [3]: x = pd.DataFrame(zip(*Index), columns=['sv', 'time', 'file', 'id'])
In [4]: x
Out[4]: 
   sv  time   file  id
0   4     1  file1   1
1   8     1  file1   2
2   3     1  file2   1
3   5     2  file2   2
4   0     4  file2   3
5  10     8  file3   1

[6 rows x 4 columns]
In [5]: x.query('(sv < 5) & (time > 3)')
Out[5]: 
   sv  time   file  id
4   0     4  file2   3

[1 rows x 4 columns]

You can also use plain Python boolean indexing instead of DataFrame.query() (which requires pandas 0.13.0 or later):

In [6]: x[(x.sv < 5) & (x.time > 3)]
Out[6]: 
   sv  time   file  id
4   0     4  file2   3

[1 rows x 4 columns]
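
Since the question asks only for the "location file" and "ID" values, the boolean mask can be combined with a column selection. The following is a minimal sketch, assuming the same column names as above; the variable names selected and pairs are illustrative, and list(zip(...)) is used so the constructor receives a concrete list.

```python
import pandas as pd

Index = [[4., 8., 3., 5., 0., 10.],                                   # some value
         [1, 1, 1, 2, 4, 8],                                          # starting time
         ["file1", "file1", "file2", "file2", "file2", "file3"],      # location file
         [1, 2, 1, 2, 3, 1]]                                          # ID in location file

# One row per file, one named column per list in Index.
x = pd.DataFrame(list(zip(*Index)), columns=['sv', 'time', 'file', 'id'])

# Row mask and column selection in a single .loc call.
selected = x.loc[(x['sv'] < 5) & (x['time'] > 3), ['file', 'id']]

# Plain (file, id) tuples, if a list is more convenient than a DataFrame.
pairs = list(selected.itertuples(index=False, name=None))
print(pairs)
```

With the sample data this leaves the single row for file2 with ID 3, matching the query result shown earlier.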

Masking a matrix of mixed data types in Python
