How to split a numpy array by the values of the i-th field?

Posted: 2012-09-05 22:49:52

Tags: arrays numpy split pandas

I have a 2D numpy array with 4 columns and many rows (more than 10000; the exact number varies).

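I need to create n subarrays based on the values in one of the columns. The closest question I found was "How slice Numpy array by column value". However, I don't know the exact values in that field (they are floats, and they change in every file I need to process), but I do know there are no more than 20 distinct values.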

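I suppose I could read the array row by row, record the distinct values, and then split it, but I think there must be a more efficient way to do this.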

Thanks.

2 answers:

Answer 0 (score: 2)

You can conveniently use multidimensional slicing with a boolean mask:

import numpy as np

# just creating a random 2d array.
a = (np.random.random((10, 5)) * 100).astype(int)
print(a)
print()

# Select by the values of the 3rd column: keep only the rows where it is greater than 50.
b = a[a[:, 2] > 50]

# Show the rows whose 3rd-column value is greater than 50.
print(b)
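As a small extension (not part of the original answer), the same mask can be negated with ~ to get the complementary rows, so a single comparison splits the array into two parts. A minimal sketch using the same kind of random data as above:

```python
import numpy as np

# Same kind of random integer array as above (10 rows, 5 columns).
a = (np.random.random((10, 5)) * 100).astype(int)

# Boolean mask over the rows, computed from the 3rd column.
mask = a[:, 2] > 50

# The mask and its negation split the array into two complementary parts.
above = a[mask]
rest = a[~mask]

print(above)
print(rest)
```

Every row ends up in exactly one of the two parts, and the original row order is kept within each part.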

Another example, closer to what you asked in your comment (?):

import numpy as np

# just creating a random 2d array.
a = np.random.random((10000, 5)) * 100
print(a)
print()

# Select by the values of the 3rd column: first keep the rows where it is greater than 50...
b = a[a[:, 2] > 50.0]
# ...then, among those, keep the rows where it is at most 50.2.
b = b[b[:, 2] <= 50.2]

# Show the rows whose 3rd-column value lies in (50, 50.2].
print(b)

This selects the rows whose 3rd-column value lies in the interval (50, 50.2].
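Both examples above rely on knowing a threshold in advance. For the situation in the question (unknown float values, at most 20 distinct ones), one possible sketch combines np.unique with the same boolean indexing: np.unique finds the distinct values, and one mask per value produces one subarray per value. The data, the choice of column 1, and the five label values are hypothetical. Exact == comparison is safe here only because every value being compared is taken from the column itself.

```python
import numpy as np

# Hypothetical data: 10000 rows, 4 columns; column 1 holds a few
# float labels (assumed here: 5 distinct values), as in the question.
rng = np.random.default_rng(0)
a = rng.random((10000, 4))
a[:, 1] = rng.choice([0.5, 1.25, 3.0, 7.75, 9.5], size=10000)

col = 1  # index of the column to split on (assumption)

# np.unique returns the sorted distinct values of the column.
values = np.unique(a[:, col])

# One boolean mask per distinct value gives one subarray per value.
subarrays = {v: a[a[:, col] == v] for v in values}

for v, sub in subarrays.items():
    print(v, sub.shape)
```

With at most 20 distinct values this makes at most 20 vectorized passes over the column, which avoids reading the array row by row in Python.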

Answer 1 (score: 1)

You can use pandas for this task, more specifically the groupby method of a DataFrame. Here is some sample code:

import numpy as np
import pandas as pd

# generate a random 20x5 DataFrame
x=np.random.randint(0,10,100)
x.shape=(20,5)
df=pd.DataFrame(x)

# group by the values in the 1st column
g=df.groupby(0)

# make a dict with the numbers from the 1st column as keys and
# the slice of the DataFrame corresponding to each number as
# values of the dict
d={k:v for (k,v) in g}

Some sample output:

In [74]: d[3]
Out[74]: 
    0  1  2  3  4
2   3  2  5  4  3
5   3  9  4  3  2
12  3  3  9  6  2
16  3  2  1  6  5
17  3  5  3  1  8
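If plain NumPy arrays are needed rather than DataFrame slices, each group can be converted with DataFrame.to_numpy() (available in pandas 0.24 and later; older versions offered the .values attribute instead). A sketch, assuming the same kind of random integer data as above:

```python
import numpy as np
import pandas as pd

# Random 20x5 integer array, as in the answer above.
x = np.random.randint(0, 10, (20, 5))
df = pd.DataFrame(x)

# groupby(0) groups the rows by the values of the 1st column; converting
# each group with to_numpy() gives plain NumPy subarrays instead of DataFrames.
d = {k: v.to_numpy() for k, v in df.groupby(0)}

for k, arr in d.items():
    print(k, arr.shape)
```

The dictionary keys are the distinct values of the 1st column, so the values do not need to be known beforehand.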