I have a NumPy 2D array as shown below:
data.dat
X1 X2 X3 X4
1 1 1 1
2 2 4 2
3 3 9 3
4 4 16 4
5 5 25 5
6 6 36 6
7 nan 49 7
8 nan 64 8
9 nan 81 nan
10 nan nan nan
Now how do I output the last element of each column ignoring nan in the array. I tried without success the code:
A[~np.isnan(A)][-1]
Code used
import numpy as np
with open('data.dat', "r") as data:
while True:
line = data.readline()
if not line.startswith('#'):
break
data_header = [i for i in line.strip().split('\t') if i]
A = np.genfromtxt('data.dat', names = data_header, dtype = float, delimiter = '\t')
答案 0 :(得分:1)
If A were a plain NumPy array of dtype 'float' (instead of a structured array) then you could use
import numpy as np
nan = np.nan
A = np.array([[ 1., 1., 1., 1.],
[ 2., 2., 4., 2.],
[ 3., 3., 9., 3.],
[ 4., 4., 16., 4.],
[ 5., 5., 25., 5.],
[ 6., 6., 36., 6.],
[ 7., nan, 49., 7.],
[ 8., nan, 64., 8.],
[ 9., nan, 81., nan],
[ 10., nan, nan, nan]])
print(A[(~np.isnan(A)).cumsum(axis=0).argmax(axis=0), np.arange(A.shape[1])])
which yields
array([ 10., 6., 81., 8.])
Given the structured array, such as the one generated by
import numpy as np
with open('data.dat', "r") as data:
# per Padraic Cunningham's suggestion
A = np.genfromtxt("data.dat", names=True, delimiter = '\t')
I think the easiest way to obtain the desired result is to view the structured array as a plain NumPy array of dtype 'float':
B = A.view('float').reshape(A.shape[0], -1)
and then proceed as before:
print(B[(~np.isnan(B)).cumsum(axis=0).argmax(axis=0), np.arange(B.shape[1])])
How it works:
Given a plain NumPy array of dtype 'float', such as
In [357]: B
Out[357]:
array([[ nan, nan, nan, nan],
[ 1., 1., 1., 1.],
[ 2., 2., 4., 2.],
[ 3., 3., 9., 3.],
[ 4., 4., 16., 4.],
[ 5., 5., 25., 5.],
[ 6., 6., 36., 6.],
[ 7., nan, 49., 7.],
[ 8., nan, 64., 8.],
[ 9., nan, 81., nan],
[ 10., nan, nan, nan]])
we can use np.isnan
to find where the non-nan values are:
In [358]: ~np.isnan(B)
Out[358]:
array([[False, False, False, False],
[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True],
[ True, False, True, True],
[ True, False, True, True],
[ True, False, True, False],
[ True, False, False, False]], dtype=bool)
Now we can use cumsum
to compute a cumulative sum for each column (False is treated as 0, True as 1):
In [359]: (~np.isnan(B)).cumsum(axis=0)
Out[359]:
array([[ 0, 0, 0, 0],
[ 1, 1, 1, 1],
[ 2, 2, 2, 2],
[ 3, 3, 3, 3],
[ 4, 4, 4, 4],
[ 5, 5, 5, 5],
[ 6, 6, 6, 6],
[ 7, 6, 7, 7],
[ 8, 6, 8, 8],
[ 9, 6, 9, 8],
[10, 6, 9, 8]])
Notice that the maximum value in each column is due to the value achieved by the last True in each column.
Therefore, we can find the index corresponding the the first occurrance of the maximum value in each column by using np.argmax
:
In [360]: (~np.isnan(B)).cumsum(axis=0).argmax(axis=0)
Out[360]: array([10, 6, 9, 8])
This gives the row index number for each column. To find the corresponding value in the array we could then use:
In [361]: B[(~np.isnan(B)).cumsum(axis=0).argmax(axis=0), np.arange(B.shape[1])]
Out[361]: array([ 10., 6., 81., 8.])
答案 1 :(得分:0)
You could try:
import numpy as np
ss = """1 1 1 1
2 2 4 2
3 3 9 3
4 4 16 4
5 5 25 5
6 6 36 6
7 nan 49 7
8 nan 64 8
9 nan 81 nan
10 nan nan nan"""
# build Matrix from string:
AA = np.array([[float(c) for c in rw.split()]
for rw in ss.splitlines()])
# extract last finte column element:
cl_finite = np.array([cl[np.isfinite(cl)][-1] for cl in AA.T])
print(cl_finite)
答案 2 :(得分:0)
This works fine, at least if all columns have a non-nan element.
In [36]: [l[~np.isnan(l)][-1] for l in A.transpose()]
Out[36]: [10.0, 6.0, 81.0, 8.0]