Select only the NaN entries of a DataFrame column

Date: 2017-10-28 01:07:50

Tags: python dataframe

movie_rating_T.iloc[:, 5:6]

critic  Toby
title   
Just My Luck    NaN
Lady in the Water   NaN
Snakes on a Plane   4.5
Superman Returns    4.0
The Night Listener  NaN
You Me and Dupree   1.0

Suppose I want to select only the titles whose value is NaN:

Just My Luck
Lady in the Water
The Night Listener

How can I extract only the NaN values from the DataFrame?

critic  Toby
title   
Just My Luck    NaN
Lady in the Water   NaN
The Night Listener  NaN

Appending ['title'] does not work.

===============================================================

movie_rating_T.iloc[:, 5:6]

critic  Toby
title   
Just My Luck    NaN
Lady in the Water   NaN
Snakes on a Plane   4.5
Superman Returns    4.0
The Night Listener  NaN
You Me and Dupree   1.0

df_MovieRatingT[df_MovieRatingT['Toby'].isnull()]

critic  Toby
title   
Just My Luck    NaN
Lady in the Water   NaN
The Night Listener  NaN

===============================================================

df = DataFrame(ratings)

    critic  title   rating
0   Jack Matthews   Lady in the Water   3.0
1   Jack Matthews   Snakes on a Plane   4.0
2   Jack Matthews   You Me and Dupree   3.5
3   Jack Matthews   Superman Returns    5.0

I want to produce:

critic  Claudia Puig    Gene Seymour    Jack Matthews   Lisa Rose   Mick LaSalle    Toby
title                       
Just My Luck    3.0 1.5 NaN 3.0 2.0 NaN
Lady in the Water   NaN 3.0 3.0 2.5 3.0 NaN
Snakes on a Plane   3.5 3.5 4.0 3.5 4.0 4.5
Superman Returns    4.0 5.0 5.0 3.5 3.0 4.0
The Night Listener  4.5 3.0 3.0 3.0 3.0 NaN
You Me and Dupree   2.5 3.5 3.5 2.5 2.0 1.0

I used

movie_rating = ratings.pivot(index='critic', columns='title', values='rating')

but it put the titles and the critics in the same column. How can I fix this?

1 answer:

Answer 0 (score: 1)

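With pandas you can use isnull:

df[df['Your column with NaN'].isnull()]

This returns the rows that have NaN in that column.

df2 = df[df['Your column with NaN'].isnull()]['Title']

returns what you want, provided 'Title' is a regular column of the frame.

An example: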

import pandas as pd
import numpy as np

# Five rows, three columns; NaN marks the missing values.
rows = [[0, 1, 2], [0, np.nan, np.nan], [0, 0, np.nan], [0, 1, 2], [0, 1, 2]]
df = pd.DataFrame(rows, columns=["Col_1", "Col_2", "Col_3"])
print(df)

   Col_1  Col_2  Col_3
0     0   1.0   2.0
1     0   NaN   NaN
2     0   0.0   NaN
3     0   1.0   2.0
4     0   1.0   2.0

print(df[df['Col_3'].isnull()])
   Col_1  Col_2  Col_3
1     0   NaN   NaN
2     0   0.0   NaN
df2 = df[df['Col_3'].isnull()]['Col_2']
print(df2)
1    NaN
2    0.0
Name: Col_2, dtype: float64
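In the question's output the titles appear to be the index rather than a regular column, which may be why selecting ['title'] failed there. A minimal sketch of reading the labels from the index instead, assuming a frame shaped like the Toby column shown in the question (the name ratings_wide is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: movie titles as the index, one column per critic.
# The values mirror the Toby column from the question.
ratings_wide = pd.DataFrame(
    {"Toby": [np.nan, np.nan, 4.5, 4.0, np.nan, 1.0]},
    index=pd.Index(
        [
            "Just My Luck",
            "Lady in the Water",
            "Snakes on a Plane",
            "Superman Returns",
            "The Night Listener",
            "You Me and Dupree",
        ],
        name="title",
    ),
)

# Boolean mask: True where Toby has no rating.
mask = ratings_wide["Toby"].isnull()

# The titles are index labels, so take them from the index of the filtered frame.
unrated_titles = ratings_wide[mask].index.tolist()
print(unrated_titles)
```

If the titles are needed as an ordinary column, calling reset_index() on the filtered frame is another option.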

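Edit

I understand your problem now. The main issue is the DataFrame itself: when you call pivot, the columns argument is wrong. Your desired output has the titles as the index and the critics as the columns, but the call passes index='critic' and columns='title'.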
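For reference, a sketch of what the pivot might look like with the two arguments swapped. It assumes the long table has exactly the critic, title and rating columns shown in the question, and uses only a small subset of the ratings as sample data:

```python
import pandas as pd

# Small sample in the long layout: one row per (critic, title) pair.
ratings = pd.DataFrame(
    {
        "critic": ["Jack Matthews", "Jack Matthews", "Jack Matthews", "Toby", "Toby"],
        "title": [
            "Lady in the Water",
            "Snakes on a Plane",
            "Superman Returns",
            "Snakes on a Plane",
            "Superman Returns",
        ],
        "rating": [3.0, 4.0, 5.0, 4.5, 4.0],
    }
)

# Titles become the row index, critics become the columns.
movie_rating = ratings.pivot(index="title", columns="critic", values="rating")
print(movie_rating)
```

Any critic/title pair that is absent from the long table shows up as NaN in the pivoted result, which is where the NaN values in the wide table come from.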

But you don't need to fix that.

If I'm not mistaken, you now only need the critic and the movies, not the ratings themselves.

df_Toby = df.loc[df['critic'] == 'Toby']

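The condition df['critic'] == 'Toby' selects all rows with that critic's name.

To return only the titles, select the 'title' column:

toby_titles = df_Toby['title']

Or, to keep the title and rating together:

df_Toby = df_Toby[['title', 'rating']]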

After that you can use

exclude_Nan_df_Toby = df_Toby.dropna()

This excludes all rows that contain NaN and returns only the rows with a valid rating.
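Putting these steps together, a rough sketch on the long table might look like the following. It assumes missing ratings are stored as explicit NaN rows; if a missing rating is simply an absent row, neither dropna nor isnull will find it. The sample rows are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical long table in which a missing rating is an explicit NaN row.
df = pd.DataFrame(
    {
        "critic": ["Toby", "Toby", "Toby", "Jack Matthews"],
        "title": [
            "Just My Luck",
            "Snakes on a Plane",
            "Superman Returns",
            "Snakes on a Plane",
        ],
        "rating": [np.nan, 4.5, 4.0, 4.0],
    }
)

# Keep only Toby's rows and the two columns of interest.
toby = df.loc[df["critic"] == "Toby", ["title", "rating"]]

# Rows with a valid rating, and rows where the rating is missing.
rated = toby.dropna(subset=["rating"])
unrated = toby[toby["rating"].isnull()]

print(rated["title"].tolist())
print(unrated["title"].tolist())
```

Passing subset=["rating"] restricts the NaN check to the rating column, so a missing value elsewhere would not drop the row.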

Cheers,