I want to filter out the rows where the Label column is NaN and keep the rest.
My df:
Timestamp Label
157505 2010-09-21 23:13:21.090 1
321498 2010-09-22 00:44:14.890 1
332687 2010-09-22 00:44:15.890 1
330028 2010-09-22 00:44:17.890 NaN
293410 2010-09-22 00:44:18.440 2
23093 2010-09-22 00:44:19.890 2
282054 2010-09-22 00:44:23.440 2
158381 2010-09-22 01:04:33.440 NaN
317397 2010-09-22 01:27:01.790 NaN
170770 2010-09-22 02:18:52.850 NaN
Reproducible example:
import numpy as np
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'Timestamp': {157505: Timestamp('2010-09-21 23:13:21.090000'),
321498: Timestamp('2010-09-22 00:44:14.890000'),
332687: Timestamp('2010-09-22 00:44:15.890000'),
330028: Timestamp('2010-09-22 00:44:17.890000'),
293410: Timestamp('2010-09-22 00:44:18.440000'),
23093: Timestamp('2010-09-22 00:44:19.890000'),
282054: Timestamp('2010-09-22 00:44:23.440000'),
158381: Timestamp('2010-09-22 01:04:33.440000'),
317397: Timestamp('2010-09-22 01:27:01.790000'),
170770: Timestamp('2010-09-22 02:18:52.850000')},
'Label': {157505: 1,
321498: 1,
332687: 1,
330028: 'NaN',
293410: 2,
23093: 2,
282054: 2,
158381: 'NaN',
317397: 'NaN',
170770: 'NaN'}})
df
I tried:
df[df.Label.notnull()]
and got exactly the same table back:
Timestamp Label
157505 2010-09-21 23:13:21.090 1
321498 2010-09-22 00:44:14.890 1
332687 2010-09-22 00:44:15.890 1
330028 2010-09-22 00:44:17.890 NaN
293410 2010-09-22 00:44:18.440 2
23093 2010-09-22 00:44:19.890 2
282054 2010-09-22 00:44:23.440 2
158381 2010-09-22 01:04:33.440 NaN
317397 2010-09-22 01:27:01.790 NaN
170770 2010-09-22 02:18:52.850 NaN
What is going wrong? What is the best way to do this?
Answer 0 (score: 1)
Convert Label from dtype object to float, then use notna() or isna():
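# Store the converted column so the string 'NaN' becomes a real NaN
# and the remaining labels print as floats (1.0, 2.0), as in the output below
df['Label'] = df['Label'].astype(float)
df = df[df['Label'].notna()]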
print(df)
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
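One caveat with astype(float): it raises a ValueError if the column holds any string that cannot be parsed as a number. A possible alternative, not part of the answer above, is pd.to_numeric with errors="coerce", which turns unparseable entries into NaN instead. This is only a sketch on a small stand-in frame; the data is made up to mirror the question.

```python
import pandas as pd

# Small stand-in for the question's frame: 'NaN' here is a string, not a missing value.
df = pd.DataFrame(
    {"Label": [1, 1, "NaN", 2, "NaN"]},
    index=[157505, 321498, 330028, 293410, 158381],
)

# Convert to a numeric dtype. errors="coerce" maps anything that is not a
# valid number to a real NaN instead of raising.
labels = pd.to_numeric(df["Label"], errors="coerce")

# Keep only the rows whose converted label is not missing.
filtered = df[labels.notna()]
print(filtered)
```

Note that this filters on the converted copy and leaves df["Label"] itself as dtype object; assign labels back to the column if you want floats in the result.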
Answer 1 (score: 1)
You can do this:
df['Label'] = df['Label'].replace('NaN', np.nan)
df.dropna(inplace=True)
print(df)
or, after the same replace step, filter with notna() instead of dropna():
df = df[df['Label'].notna()]
print(df)
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
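Keep in mind that dropna() without arguments drops a row if any column is missing, not just Label. If the frame has other columns with gaps, restricting it with subset=['Label'] may be closer to what was asked. A rough sketch, where the Note column is a hypothetical extra column added only to show the difference:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2010-09-21 23:13:21.090",
        "2010-09-22 00:44:17.890",
        "2010-09-22 00:44:18.440",
    ]),
    "Label": [1, "NaN", 2],
    "Note": [None, "x", "y"],  # hypothetical column with its own missing value
})

# Turn the string 'NaN' into a real missing value, as in the answer above.
df["Label"] = df["Label"].replace("NaN", np.nan)

# Only consider Label when deciding which rows to drop.
cleaned = df.dropna(subset=["Label"])
print(cleaned)

# For comparison: a bare dropna() also removes the row where only Note is missing.
print(len(df.dropna()), "row(s) survive a bare dropna()")
```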
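Answer 2 (score: 1)
I understand that you are trying to filter out NaN values. However, notnull() does not treat the string 'NaN' as missing, so it filters nothing here. Replacing the string with np.nan gives the result you expect; you can then also simply drop those rows.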
import numpy as np
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'Timestamp': {157505: Timestamp('2010-09-21 23:13:21.090000'),
321498: Timestamp('2010-09-22 00:44:14.890000'),
332687: Timestamp('2010-09-22 00:44:15.890000'),
330028: Timestamp('2010-09-22 00:44:17.890000'),
293410: Timestamp('2010-09-22 00:44:18.440000'),
23093: Timestamp('2010-09-22 00:44:19.890000'),
282054: Timestamp('2010-09-22 00:44:23.440000'),
158381: Timestamp('2010-09-22 01:04:33.440000'),
317397: Timestamp('2010-09-22 01:27:01.790000'),
170770: Timestamp('2010-09-22 02:18:52.850000')},
'Label': {157505: 1,
321498: 1,
332687: 1,
330028: np.nan,
293410: 2,
23093: 2,
282054: 2,
158381: np.nan,
317397: np.nan,
170770: np.nan}})
df[df.Label.notnull()]
gives:
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
or
df.dropna()
which gives the same result:
Timestamp Label
157505 2010-09-21 23:13:21.090 1.0
321498 2010-09-22 00:44:14.890 1.0
332687 2010-09-22 00:44:15.890 1.0
293410 2010-09-22 00:44:18.440 2.0
23093 2010-09-22 00:44:19.890 2.0
282054 2010-09-22 00:44:23.440 2.0
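If you are not sure whether a column holds real NaN values or the string 'NaN', a quick check along these lines can help. The Series below is a made-up example that mixes both cases, not data from the question.

```python
import numpy as np
import pandas as pd

# Mixed column: integers, the string 'NaN', and one real missing value.
s = pd.Series([1, "NaN", 2, np.nan], dtype=object)

print(s.dtype)               # object, a hint that the column is not purely numeric
print(s.map(type).tolist())  # shows which entries are str rather than int/float

# notnull() only flags the real np.nan as missing; the string counts as a value.
print(s.notnull().tolist())  # [True, True, True, False]

# Count the entries that are literally the string 'NaN'.
n_string_nan = int((s == "NaN").sum())
print(n_string_nan)          # 1
```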