I have a df like this:
num1 num2
0 [2.0] 10
1 [3.0] 20
2 [4.0] 30
3 [5.0] 40
4 [6.0] 50
5 [nan] 60
6 [nan] 70
7 [10.0] 80
8 [nan] 90
9 [15.0] 100
The `num1` column contains arrays of floats; `[nan]` is a NumPy array containing a single `np.NaN`.
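To make the answers below easy to try out, here is a minimal reconstruction of the frame. It is a sketch under one assumption: plain one-element Python lists stand in for the one-element NumPy arrays, since `x[0]` behaves the same on both. Unwrapping each cell with `x[0]` shows that the three `[nan]` rows become ordinary missing values.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's frame. Assumption: one-element lists are used
# instead of one-element NumPy arrays; x[0] works the same way on both.
df = pd.DataFrame({
    "num1": [[2.0], [3.0], [4.0], [5.0], [6.0],
             [np.nan], [np.nan], [10.0], [np.nan], [15.0]],
    "num2": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# Unwrap each one-element container into a plain float; [nan] becomes NaN.
unwrapped = df["num1"].apply(lambda x: x[0])
print(unwrapped.dtype)          # float64
print(unwrapped.isna().sum())   # 3
```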
I convert it to integers using this:
df['num1'] = list(map(int, df['num1']))
If I use only this df:
num1 num2
0 [2.0] 10
1 [3.0] 20
2 [4.0] 30
3 [5.0] 40
4 [6.0] 50
that is, with no [nan] rows, I get:
num1 num2
0 2.0 10
1 3.0 20
2 4.0 30
3 5.0 40
4 6.0 50
But with the full df, including the [nan] rows, I get the error:
`ValueError: cannot convert float NaN to integer`
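The error comes from Python's built-in `int()` itself: NaN has no integer value, so any element that is NaN makes the `map(int, ...)` call fail. The sketch below reproduces the message and shows a hypothetical helper, `to_int_or_none`, that checks for NaN first instead of converting blindly.

```python
import math

def to_int_or_none(value):
    """Hypothetical helper: int(value) for ordinary floats, None for NaN."""
    if math.isnan(value):
        return None
    return int(value)

# int() has no integer to return for NaN, which is the source of the error.
try:
    int(float("nan"))
except ValueError as exc:
    message = str(exc)

print(message)                        # cannot convert float NaN to integer
print(to_int_or_none(2.0))            # 2
print(to_int_or_none(float("nan")))   # None
```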
I tried:
df[df['num1'] != np.array(np.NaN)]
but that gave the error:
TypeError: len() of unsized object
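Independently of that `TypeError`, an inequality test against NaN could not work anyway: NaN compares unequal to everything, including itself. A more reliable filter unwraps the values first and then asks pandas explicitly which ones are missing. This is a small sketch on a three-row frame (an assumption for brevity), using lists in place of arrays.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"num1": [[2.0], [np.nan], [10.0]], "num2": [10, 60, 80]})

# NaN never compares equal to anything, including itself, so == / != tests
# cannot be used to find it.
print(np.nan == np.nan)  # False

# Unwrap the one-element lists, then test for missing values explicitly.
first = df["num1"].apply(lambda x: x[0])
result = df[first.notna()]
print(result.index.tolist())  # [0, 2]
```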
How can I get the desired output:
num1 num2
0 2.0 10
1 3.0 20
2 4.0 30
3 5.0 40
4 6.0 50
5 10.0 80
6 15.0 100
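Note that the desired output is renumbered 0 to 6 rather than keeping the original labels. One way to get exactly that shape, sketched here with lists standing in for the arrays, is to unwrap, drop the missing rows, and reset the index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "num1": [[2.0], [3.0], [4.0], [5.0], [6.0],
             [np.nan], [np.nan], [10.0], [np.nan], [15.0]],
    "num2": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

out = (
    df.assign(num1=df["num1"].apply(lambda x: x[0]))  # unwrap [x] -> x
      .dropna(subset=["num1"])                         # drop the [nan] rows
      .reset_index(drop=True)                          # renumber 0..6
)
print(out)
```

If integers are wanted instead of floats, `out.astype({"num1": int})` converts the column once the NaN rows are gone.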
Answer 0 (score: 2)
This should get rid of all the [nan] lists. Just add the following:
df = df.loc[df['num1'].str[0].dropna().index]
Then you can run the rest of your code as-is.
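Here is the filter above applied end to end on a reconstruction of the question's data. One assumption differs from the question: the cells are plain lists, and `int([2.0])` raises `TypeError`, so instead of relying on `map(int, ...)` accepting a one-element array, this sketch unwraps again with `.str[0]` before converting.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "num1": [[2.0], [3.0], [4.0], [5.0], [6.0],
             [np.nan], [np.nan], [10.0], [np.nan], [15.0]],
    "num2": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# .str[0] takes element 0 of each list; the [nan] rows become NaN,
# dropna() removes them, and .loc keeps only the surviving labels.
kept = df.loc[df["num1"].str[0].dropna().index]

# Plain lists cannot go through int() directly, so unwrap before converting.
kept = kept.assign(num1=kept["num1"].str[0].astype(int))
print(kept)
```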
Answer 1 (score: 0)
Try this:
df['num1'] = df['num1'].apply(lambda x: x[0])  # unlist the numbers (assuming each list holds exactly one)
df = df.dropna(subset=['num1']).copy()          # drop the rows that held [nan]
df['num1'] = list(map(int, df['num1']))         # map operation
print(df)
Output
num1 num2
0 2 10
1 3 20
2 4 30
3 5 40
4 6 50
7 10 80
9 15 100
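A caveat when unwrapping with `dropna()`: assigning a shorter Series back to a column does not remove any rows, because pandas aligns the Series on the index and fills the missing labels with NaN. The rows therefore have to be dropped from the DataFrame itself before `int()` is applied. A small sketch on a three-row frame (an assumption for brevity):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"num1": [[2.0], [np.nan], [10.0]], "num2": [10, 60, 80]})

shorter = df["num1"].apply(lambda x: x[0]).dropna()
print(len(shorter))  # 2

# Column assignment aligns on the index: label 1 is missing from `shorter`,
# so that row stays in df and its num1 becomes NaN again.
df["num1"] = shorter
print(len(df), int(df["num1"].isna().sum()))  # 3 1
```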
Timings (these depend on the data size):
# My solution
# 2.6 ms ± 327 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# @O.Suleiman's solution
# 2.8 ms ± 457 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# @ Anton vBR's solution
# 2.96 ms ± 504 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
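The figures above are the answerer's own measurements. A rough way to compare approaches on your own data is the standard-library `timeit` module; the two functions below (`via_apply`, `via_str`) are hypothetical names wrapping the `apply` and `.str[0]` styles of unwrapping, and the numbers you get will vary with machine and data size.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "num1": [[2.0], [3.0], [4.0], [5.0], [6.0],
             [np.nan], [np.nan], [10.0], [np.nan], [15.0]],
    "num2": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

def via_apply(frame):
    """Unwrap with apply(x[0]), then drop rows where num1 is NaN."""
    out = frame.assign(num1=frame["num1"].apply(lambda x: x[0]))
    return out.dropna(subset=["num1"])

def via_str(frame):
    """Unwrap with the .str[0] accessor, then drop rows where num1 is NaN."""
    out = frame.assign(num1=frame["num1"].str[0])
    return out.dropna(subset=["num1"])

runs = 100
for fn in (via_apply, via_str):
    seconds = timeit.timeit(lambda: fn(df), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```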
Answer 2 (score: 0)
df['num1'] = df.num1.str[0]
df.dropna(axis=0, inplace=True)
A solution inspired by suleiman's answer, but without using `loc`. This is the output:
num1 num2
0 2.0 10
1 3.0 20
2 4.0 30
3 5.0 40
4 6.0 50
7 10.0 80
9 15.0 100
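`dropna(axis=0)` removes every row that has a NaN in *any* column. That is fine for the question's data, where only `num1` has gaps, but if other columns could contain NaN, `subset=['num1']` restricts the check to the unwrapped column. The frame below, with a gap in `num2`, is a hypothetical example to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where num2 also has a gap, to show the difference.
df = pd.DataFrame({"num1": [2.0, np.nan, 10.0], "num2": [10, 60, np.nan]})

any_column = df.dropna(axis=0)          # drops rows with NaN in ANY column
only_num1 = df.dropna(subset=["num1"])  # drops rows with NaN in num1 only
print(any_column.index.tolist())  # [0]
print(only_num1.index.tolist())   # [0, 2]
```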
Answer 3 (score: 0)
You can do it as follows:
# unwrap each array; for [nan] this already yields np.nan
df['num1'] = df['num1'].apply(lambda x: x[0])
# use dropna on num1 only (keeping num2), then renumber the rows 0..n-1
df = df.dropna(subset=['num1']).reset_index(drop=True)
print(df)
Output:
num1 num2
0 2.0 10
1 3.0 20
2 4.0 30
3 5.0 40
4 6.0 50
5 10.0 80
6 15.0 100
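A check such as `np.nan in x` does not reliably detect NaN inside a NumPy array: membership on an ndarray is an element-wise `==` test, and NaN never equals itself. Taking `x[0]` directly already yields NaN for `[nan]`, and when an explicit test is needed, `np.isnan` is the dependable one.

```python
import numpy as np

arr = np.array([np.nan])

# ndarray membership is an element-wise == test, and NaN == NaN is False.
print(np.nan in arr)              # False
# np.isnan is the reliable check.
print(bool(np.isnan(arr).any()))  # True
```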
Answer 4 (score: 0)
As you can see, there are many options. You can convert to numeric and then drop the nulls:
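import pandas as pd
import numpy as np

data = dict(num1=[[2.0], [np.nan], ['apple']])
df = pd.DataFrame(data)

# Take element 0 of each list; anything non-numeric (e.g. 'apple') is coerced to NaN
m = pd.to_numeric(df['num1'].apply(lambda x: x[0]), errors='coerce').dropna().index
# Keep only the rows whose first element is a valid number
df = df.loc[m]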
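Building on that, the coerced values can be written back as integers in the same step, which covers the question's original goal. This sketch adds a `num2` column (an assumption, not in the answer's data) to show that the other columns are kept.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"num1": [[2.0], [np.nan], ["apple"]], "num2": [10, 20, 30]})

# Unwrap, then coerce: NaN stays NaN and non-numeric text such as "apple" becomes NaN.
first = pd.to_numeric(df["num1"].apply(lambda x: x[0]), errors="coerce")
valid = first.dropna()

# Keep only rows with a usable number and store that number as an integer.
clean = df.loc[valid.index].assign(num1=valid.astype(int))
print(clean)
```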