Python pandas - value_counts not working properly

Time: 2015-12-04 13:06:17

Tags: python pandas

Based on this post on Stack Overflow, I tried counting values like this:

df2 = df1.join(df1.genres.str.split(",").apply(pd.value_counts).fillna(0))

It works, except that my data has 22 unique genres and after splitting I get 42 columns, which of course are not unique. Data sample:

     Action  Adventure   Casual  Design & Illustration   Early Access    Education   Free to Play    Indie   Massively Multiplayer   Photo Editing   RPG     Racing  Simulation  Software Training   Sports  Strategy    Utilities   Video Production    Web Publishing Accounting  Action  Adventure   Animation & Modeling    Audio Production    Casual  Design & Illustration   Early Access    Education   Free to Play    Indie   Massively Multiplayer   Photo Editing   RPG Racing  Simulation  Software Training   Sports  Strategy    Utilities   Video Production    Web Publishing  nan
0   nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 1.0 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan

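(I have pasted the header and the first row.)

I suspect the problem is caused by my raw data. My column (genres) holds lists of values, brackets included.

Example: [Action,Indie]. When Python reads it, it treats "[Action", "Action" and "Action]" as different values, and the output has 303 distinct values. So what I did is: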

new = []
for i in df1['genres'].tolist():
    if str(i) != 'nan':
        # drop the leading '[' and trailing ']'
        i = i[1:-1]
        new.append(i)
    else:
        new.append('nan')

1 Answer:

Answer 0: (score: 1)

You have to remove the leading and trailing [] from the genres column with str.strip, and then replace the spaces with empty strings with str.replace:
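To see why both steps matter, here is a minimal sketch in plain Python. The three sample strings and the helper distinct_labels are made up for illustration and are not taken from the asker's file; the sketch assumes the raw values look like "[Action, Indie]", with a space after the comma.

```python
# Hypothetical sample values; the real file may be formatted differently.
raw = ["[Action, Indie]", "[Indie]", "[Action]"]

def distinct_labels(values, strip_brackets=False, drop_spaces=False):
    """Split each value on ',' and collect the distinct labels (illustrative helper)."""
    seen = set()
    for value in values:
        if strip_brackets:
            value = value.strip("[]")       # remove leading/trailing brackets
        if drop_spaces:
            value = value.replace(" ", "")  # remove every space
        seen.update(value.split(","))
    return sorted(seen)

# No cleaning: brackets stay attached to the first and last label.
raw_labels = distinct_labels(raw)
# Brackets stripped only: ' Indie' and 'Indie' still count as two labels.
bracketless = distinct_labels(raw, strip_brackets=True)
# Both steps: each genre appears once.
clean_labels = distinct_labels(raw, strip_brackets=True, drop_spaces=True)

print(raw_labels)
print(bracketless)
print(clean_labels)
```

Stripping only the brackets still leaves labels that differ by a leading space, which would explain getting 42 columns for 22 genres.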

import pandas as pd

df = pd.read_csv('test/Copy of AppCrawler.csv', sep="\t")


df['genres'] = df['genres'].str.strip('[]')
df['genres'] = df['genres'].str.replace(' ', '')

df = df.join(df.genres.str.split(",").apply(pd.value_counts).fillna(0))

# temporarily display up to 30 rows and 60 columns
with pd.option_context('display.max_rows', 30, 'display.max_columns', 60):
    print(df)
    # (output omitted for clarity)
print(df.columns)
Index([u'Unnamed: 0', u'appid', u'currency', u'final_price', u'genres',
       u'initial_price', u'is_free', u'metacritic', u'release_date',
       u'Accounting', u'Action', u'Adventure', u'Animation&Modeling',
       u'AudioProduction', u'Casual', u'Design&Illustration', u'EarlyAccess',
       u'Education', u'FreetoPlay', u'Indie', u'MassivelyMultiplayer',
       u'PhotoEditing', u'RPG', u'Racing', u'Simulation', u'SoftwareTraining',
       u'Sports', u'Strategy', u'Utilities', u'VideoProduction',
       u'WebPublishing'],
      dtype='object')
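For anyone who wants to try this without the CSV, below is a small self-contained sketch on made-up data. The frame contents are hypothetical, and it swaps in Series.str.get_dummies for apply(pd.value_counts): this is a different technique that yields 0/1 indicator columns rather than counts, which should be equivalent as long as no genre repeats within a row. It may be worth considering because, as far as I know, the top-level pd.value_counts is deprecated in recent pandas releases.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the question, values are invented.
df = pd.DataFrame({
    "appid": [10, 20, 30],
    "genres": ["[Action, Indie]", "[Free to Play, Action]", None],
})

# Same cleaning as above: drop the brackets, then remove all spaces.
cleaned = (
    df["genres"]
    .str.strip("[]")
    .str.replace(" ", "", regex=False)
)

# One 0/1 column per genre; rows with a missing genres value get all zeros.
indicators = cleaned.str.get_dummies(sep=",")

result = df.join(indicators)
print(result.columns.tolist())
```

Note that removing every space also changes the labels themselves (for example "Free to Play" becomes "FreetoPlay"), as the column list above shows; splitting on ", " instead would keep them readable, assuming the separator is consistent throughout the file.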