How do I drop rows that contain NaN arrays?

Asked: 2018-02-22 12:57:02

Tags: python pandas nan

I have a df like this:

   num1    num2
0  [2.0]   10
1  [3.0]   20
2  [4.0]   30
3  [5.0]   40
4  [6.0]   50
5  [nan]   60 
6  [nan]   70
7  [10.0]  80
8  [nan]   90
9  [15.0]  100

The num1 column contains arrays of floats. [nan] is a numpy array containing a single np.NaN.
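For anyone who wants to reproduce the frame, a minimal sketch follows. It assumes each num1 cell is a one-element NumPy array, as described above; the `values` and `num1` helper names are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Assumed raw values behind the num1 column shown in the question.
values = [2.0, 3.0, 4.0, 5.0, 6.0, np.nan, np.nan, 10.0, np.nan, 15.0]

# Fill an object array cell by cell so every cell holds its own 1-element array.
num1 = np.empty(len(values), dtype=object)
for i, v in enumerate(values):
    num1[i] = np.array([v])

df = pd.DataFrame({"num1": num1, "num2": list(range(10, 101, 10))})
print(df)
```

Filling the object array element by element avoids any ambiguity about whether the nested arrays get interpreted as a second dimension.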

I convert it to integers like this:

df['num1'] = list(map(int, df['num1']))

If I only use this df:

   num1    num2
0  [2.0]   10
1  [3.0]   20
2  [4.0]   30
3  [5.0]   40
4  [6.0]   50

This works when there is no [nan], and I get:

   num1   num2
0  2.0  10
1  3.0  20
2  4.0  30
3  5.0  40
4  6.0  50

But if I use the full df, which includes [nan], I get the error:

`ValueError: cannot convert float NaN to integer`

I tried:

df[df['num1'] != np.array(np.NaN)]

But this gives the error:

TypeError: len() of unsized object

How can I get the desired output:

   num1    num2
0  2.0   10
1  3.0   20
2  4.0   30
3  5.0   40
4  6.0   50
5  10.0  80
6  15.0  100

5 answers:

Answer 0: (score: 2)

This should get rid of all the nan lists. Just add the following:

df = df.loc[df['num1'].str[0].dropna().index]

Then you can run the rest of your code as is.
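A minimal end-to-end sketch of this approach, on a shortened hypothetical sample where each cell is a one-element NumPy array. The `.copy()` call and the `int(a[0])` conversion are additions made here so the sketch runs cleanly; they are not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: each num1 cell is a one-element NumPy array.
values = [2.0, np.nan, 10.0, np.nan, 15.0]
num1 = np.empty(len(values), dtype=object)
for i, v in enumerate(values):
    num1[i] = np.array([v])
df = pd.DataFrame({"num1": num1, "num2": [10, 60, 80, 90, 100]})

# .str[0] pulls the first element out of each array, giving a float Series.
first = df["num1"].str[0]

# Keep only the rows whose first element is not NaN.
df = df.loc[first.dropna().index].copy()

# With the NaN rows gone, the integer conversion no longer raises.
df["num1"] = [int(a[0]) for a in df["num1"]]
print(df)
```

The surviving rows keep their original index labels; `reset_index(drop=True)` would renumber them if a 0..n-1 index is wanted.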

Answer 1: (score: 0)

Try this:

df['num1'] = df['num1'].apply(lambda x: x[0]) # unlist the list of numbers (assuming you don't have multiple)
df = df.dropna(subset=['num1']).copy() # drop the rows whose value is NaN
df['num1'] = list(map(int, df['num1'])) # map operation
print(df)

Output

   num1  num2
0     2    10
1     3    20
2     4    30
3     5    40
4     6    50
7    10    80
9    15   100

Timings (these depend on the data size)

# My solution
# 2.6 ms ± 327 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# @O.Suleiman's solution
# 2.8 ms ± 457 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# @ Anton vBR's solution
# 2.96 ms ± 504 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
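A rough sketch of how a comparison like this could be reproduced with the standard `timeit` module. The frame size, the NaN fraction, and the function names `make_frame`, `via_apply`, and `via_str` are assumptions for illustration, not the setup behind the timings above.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n):
    """Build a hypothetical frame whose num1 cells are 1-element arrays."""
    rng = np.random.default_rng(0)
    values = rng.random(n)
    values[rng.random(n) < 0.3] = np.nan  # roughly 30% NaN cells
    num1 = np.empty(n, dtype=object)
    for i, v in enumerate(values):
        num1[i] = np.array([v])
    return pd.DataFrame({"num1": num1, "num2": np.arange(n)})


def via_apply(df):
    # Unwrap each cell with apply, then filter with a boolean mask.
    first = df["num1"].apply(lambda a: a[0])
    return df[first.notna()]


def via_str(df):
    # Unwrap each cell with .str[0], then select the surviving index labels.
    return df.loc[df["num1"].str[0].dropna().index]


frame = make_frame(1000)
for fn in (via_apply, via_str):
    seconds = timeit.timeit(lambda: fn(frame), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```

Differences of a fraction of a millisecond, like those listed above, are within the reported standard deviations, so the ranking may change from run to run.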

Answer 2: (score: 0)

df['num1'] = df.num1.str[0]
df.dropna(axis=0, inplace=True)

A solution inspired by Suleiman's answer, but without using loc. Here is the output:

   num1  num2
0   2.0    10
1   3.0    20
2   4.0    30
3   5.0    40
4   6.0    50
7  10.0    80
9  15.0   100
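One thing to keep in mind: `dropna(axis=0)` removes a row if any column is missing, not just num1. A small sketch of the difference, on a hypothetical frame where num2 also has a gap, using the `subset` parameter of `DataFrame.dropna`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: num1 holds 1-element arrays, num2 has its own missing value.
num1 = np.empty(3, dtype=object)
for i, v in enumerate([2.0, np.nan, 4.0]):
    num1[i] = np.array([v])
df = pd.DataFrame({"num1": num1, "num2": [10.0, 20.0, np.nan]})

# Unwrap the arrays into plain floats.
df["num1"] = df["num1"].str[0]

# Drops every row that has a NaN in any column.
all_columns = df.dropna(axis=0)

# Drops only the rows where num1 itself is NaN.
only_num1 = df.dropna(subset=["num1"])

print(all_columns)
print(only_num1)
```

For the frame in the question the two calls give the same result, because num2 has no missing values.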

Answer 3: (score: 0)

You can do it as follows:

# convert np array containing NaNs into np.nan
df['num1'] = df['num1'].apply(lambda x: np.nan if np.isnan(x).any() else x[0])

# use dropna to drop the rows, then renumber the index to match the desired output
df = df.dropna(subset=['num1']).reset_index(drop=True)
print(df)

Output:

   num1    num2
0  2.0   10
1  3.0   20
2  4.0   30
3  5.0   40
4  6.0   50
5  10.0  80
6  15.0  100
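A detail worth checking when testing for NaN inside an array: NaN never compares equal to itself, so equality or membership tests against np.nan do not find a NaN stored in a NumPy array, while `np.isnan` does. The `cell` name and the three result variables below are only for illustration.

```python
import numpy as np

cell = np.array([np.nan])

# Equality is evaluated element by element, and NaN != NaN.
equal_hit = bool((cell == np.nan).any())

# `in` on an ndarray is based on the same element-wise equality.
member_hit = bool(np.nan in cell)

# np.isnan tests for NaN directly.
isnan_hit = bool(np.isnan(cell).any())

print(equal_hit, member_hit, isnan_hit)
```

Only the `np.isnan` check reports the NaN; the other two come back False.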

Answer 4: (score: 0)

As you can see, there are many options. You can convert to numeric and then drop the null values:

import pandas as pd
import numpy as np

data = dict(num1=[[2.0],[np.nan],['apple']])

df = pd.DataFrame(data)

m = pd.to_numeric(df['num1'].apply(lambda x: x[0]),errors='coerce').dropna().index

df = df.loc[m]
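The same idea can be sketched as a small helper that also writes the numeric values back and renumbers the index, as in the desired output of the question. The function name `drop_non_numeric_rows` and the extra num2 column are hypothetical additions for illustration.

```python
import numpy as np
import pandas as pd


def drop_non_numeric_rows(df, column):
    """Keep rows whose first element in `column` parses as a number."""
    first = df[column].apply(lambda x: x[0])
    # Anything that is not a number (NaN, strings, ...) becomes NaN.
    numeric = pd.to_numeric(first, errors="coerce")
    keep = numeric.notna()
    kept = df.loc[keep].copy()
    kept[column] = numeric[keep]
    # Renumber the rows 0..n-1.
    return kept.reset_index(drop=True)


data = dict(num1=[[2.0], [np.nan], ["apple"], [10.0]], num2=[10, 60, 70, 80])
df = pd.DataFrame(data)

result = drop_non_numeric_rows(df, "num1")
print(result)
```

With `errors='coerce'`, the `'apple'` row is dropped along with the NaN row, which is useful when the column may contain values that are not numbers at all.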