Pandas column extraction

Time: 2018-10-14 15:12:27

Tags: python pandas

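I have a dataset of movies with a column named actors. I want to create a new DataFrame for a single actor, e.g. Johnny Depp, and put that actor's movies into it. There is also a genres column whose elements look like Action|Adventure|Fantasy|Sci-Fi. I want to extract the first two words from it, i.e. Action and Adventure, and store them in two separate columns.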

   fun isConnectedToInternet(): Boolean {
        val connectivityManager = getSystemService(Context.CONNECTIVITY_SERVICE) as ConnectivityManager
        val activeNetwork = connectivityManager.activeNetworkInfo
        if (activeNetwork != null)
            return activeNetwork.isConnected
        else
            return false
    }

This is the code I wrote for the genres, but it fails with the error 'str' object has no attribute 'str'.

1 answer:

Answer 0 (score: 0):

Does this work?

import pandas as pd

# Small sample dataset: one row per movie.
ll = [['Johnny Depp', 'a|b|c', 'Movie_1'],['Johnny Depp', 'a|d', 'Movie_2'],['Marlon Brando', 'f', 'Movie_3']]
movies = pd.DataFrame(ll,columns=['actors','genres','titles'])
print(movies)

# One-hot encode the genres: one 0/1 column per genre (str.get_dummies splits on '|' by default).
genre_df = movies.genres.str.get_dummies()
print(genre_df)

# Bonus: add columns holding the first two genres of each movie.
# Split the original string so the order in the data is kept
# (get_dummies sorts its columns alphabetically, which would lose that order).
genre_parts = movies['genres'].str.split('|')
genre_df['first_genre'] = genre_parts.str[0]
genre_df['second_genre'] = genre_parts.str[1].fillna('')  # '' when there is no second genre
genre_df['actors'] = movies['actors']
genre_df['titles'] = movies['titles']
print(genre_df)

# Get Depp movie info only.
depp_df = genre_df[genre_df['actors'] == 'Johnny Depp'][['first_genre', 'second_genre', 'titles']]
print(depp_df)

Hopefully this is the format you wanted; I did not fully understand the question.
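
As for the error itself: the failing code is not visible here, so this is only a guess, but 'str' object has no attribute 'str' usually means .str was called on a single Python string (for example inside apply or a loop) rather than on the whole column. Below is a minimal sketch of that failure and of a vectorized alternative using str.split with expand=True. The sample data is made up for illustration; the column names follow the example above.

```python
import pandas as pd

# Hypothetical sample data; column names match the example above.
movies = pd.DataFrame(
    {
        "actors": ["Johnny Depp", "Johnny Depp", "Marlon Brando"],
        "genres": ["Action|Adventure|Fantasy|Sci-Fi", "Action|Drama", "Crime"],
        "titles": ["Movie_1", "Movie_2", "Movie_3"],
    }
)

# .str is an accessor on a Series, not on a plain Python string.
# Inside apply, g is a single str, so g.str raises AttributeError.
try:
    movies["genres"].apply(lambda g: g.str.split("|"))
except AttributeError as err:
    error_message = str(err)

# Vectorized alternative: split the whole column once.
# expand=True returns one column per piece; rows with fewer pieces get a missing value.
parts = movies["genres"].str.split("|", expand=True)
movies["first_genre"] = parts[0]
movies["second_genre"] = parts[1]

# Keep only one actor's rows and the columns of interest.
depp_df = movies.loc[
    movies["actors"] == "Johnny Depp",
    ["titles", "first_genre", "second_genre"],
].copy()
print(depp_df)
```

Unlike the get_dummies route, this keeps the genres in the order they appear in the original string. A movie with a single genre ends up with a missing value in second_genre, which can be replaced with fillna('') if an empty string is preferred.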