How do I use the pivot function to reshape this DataFrame?

Asked: 2018-04-04 20:13:41

Tags: python pandas dataframe dataset pivot

I am working with a movie dataset that contains the following information:

df.head(10) 

color   director_name   num_critic_for_reviews  duration    director_facebook_likes actor_3_facebook_likes  actor_2_name    actor_1_facebook_likes  gross   genres  ... num_user_for_reviews    language    country content_rating  budget  title_year  actor_2_facebook_likes  imdb_score  aspect_ratio    movie_facebook_likes
0   Color   James Cameron   723.0   178.0   0.0 855.0   Joel David Moore    1000.0  760505847.0 Action|Adventure|Fantasy|Sci-Fi ... 3054.0  English USA PG-13   237000000.0 2009.0  936.0   7.9 1.78    33000
1   Color   Gore Verbinski  302.0   169.0   563.0   1000.0  Orlando Bloom   40000.0 309404152.0 Action|Adventure|Fantasy    ... 1238.0  English USA PG-13   300000000.0 2007.0  5000.0  7.1 2.35    0
2   Color   Sam Mendes  602.0   148.0   0.0 161.0   Rory Kinnear    11000.0 200074175.0 Action|Adventure|Thriller   ... 994.0   English UK  PG-13   245000000.0 2015.0  393.0   6.8 2.35    85000
3   Color   Christopher Nolan   813.0   164.0   22000.0 23000.0 Christian Bale  27000.0 448130642.0 Action|Thriller ... 2701.0  English USA PG-13   250000000.0 2012.0  23000.0 8.5 2.35    164000
4   NaN Doug Walker NaN NaN 131.0   NaN Rob Walker  131.0   NaN Documentary ... NaN NaN NaN NaN NaN NaN 12.0    7.1 NaN 0
5   Color   Andrew Stanton  462.0   132.0   475.0   530.0   Samantha Morton 640.0   73058679.0  Action|Adventure|Sci-Fi ... 738.0   English USA PG-13   263700000.0 2012.0  632.0   6.6 2.35    24000
6   Color   Sam Raimi   392.0   156.0   0.0 4000.0  James Franco    24000.0 336530303.0 Action|Adventure|Romance    ... 1902.0  English USA PG-13   258000000.0 2007.0  11000.0 6.2 2.35    0
7   Color   Nathan Greno    324.0   100.0   15.0    284.0   Donna Murphy    799.0   200807262.0 Adventure|Animation|Comedy|Family|Fantasy|Musi...   ... 387.0   English USA PG  260000000.0 2010.0  553.0   7.8 1.85    29000
8   Color   Joss Whedon 635.0   141.0   0.0 19000.0 Robert Downey Jr.   26000.0 458991599.0 Action|Adventure|Sci-Fi ... 1117.0  English USA PG-13   250000000.0 2015.0  21000.0 7.5 2.35    118000
9   Color   David Yates 375.0   153.0   282.0   10000.0 Daniel Radcliffe    25000.0 301956980.0 Adventure|Family|Fantasy|Mystery    ... 973.0   English UK  PG  250000000.0 2009.0  11000.0 7.5 2.35    10000

df.columns

Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'movie_title', 'num_voted_users', 'cast_total_facebook_likes',
       'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
       'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
       'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
       'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')

Screenshot of the data (Notebook format)

I want to reshape this data with the pivot function, without aggregating it.

I tried to do that with this code:
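Reshaping without aggregation only works when every index/column pair identifies a single row, because `pivot` raises a `ValueError` on duplicate pairs. The sketch below shows one way to check for such repeats before pivoting. The rows in `sample` and the choice of `keys` are hypothetical stand-ins, not values from the real dataset.

```python
import pandas as pd

# Hypothetical sample: "Movie B" appears twice with the same director.
sample = pd.DataFrame({
    "movie_title": ["Movie A", "Movie B", "Movie B"],
    "director_name": ["Director X", "Director Y", "Director Y"],
    "imdb_score": [7.9, 7.2, 7.2],
})

keys = ["movie_title", "director_name"]

# keep=False marks every row that shares its key pair with another row.
dupes = sample[sample.duplicated(subset=keys, keep=False)]
print(len(dupes))

# Keeping one row per key pair lets pivot run without any aggregation.
unique_rows = sample.drop_duplicates(subset=keys)
wide = unique_rows.pivot(index="movie_title", columns="director_name",
                         values="imdb_score")
print(wide)
```

Note that `drop_duplicates` discards the repeated rows, so it is only appropriate when the repeats really are redundant.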

    pivoted = df.pivot( index= 'movie_title' , columns= [ 'director_name' , 'imdb_score' ] ) 

But I get this ValueError:

 ValueError: all arrays must be same length
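Judging from the traceback, this pandas version builds `cols = [index, columns]` and hands it to `set_index`, so a list passed as `columns` appears to be treated as a two-element array rather than as two column names, which would explain the length mismatch. A minimal sketch of an equivalent reshape that avoids the list argument is `set_index` followed by `unstack`. The rows in `sample` are a hypothetical subset, and this approach assumes the key combinations are unique.

```python
import pandas as pd

# Hypothetical three-row sample standing in for the movie dataset.
sample = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "Tangled"],
    "director_name": ["James Cameron", "Sam Mendes", "Nathan Greno"],
    "imdb_score": [7.9, 6.8, 7.8],
    "duration": [178.0, 148.0, 100.0],
})

# Put all three keys in the row index, then move two of them to the columns.
keys = ["movie_title", "director_name", "imdb_score"]
wide = sample.set_index(keys).unstack(["director_name", "imdb_score"])

print(wide.shape)
print(wide.index.name)
```

Newer pandas releases may accept a list for `columns` in `pivot` directly, so upgrading could be an alternative.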

When I change the code to:

pivoted = df.pivot( index= 'movie_title' ) 

I get this error:

ValueError: cannot label index with a null key
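This second error presumably arises because no `columns` argument was supplied: `pivot` needs a column whose values become the new column labels. The sketch below passes a single column name for each of `index`, `columns` and `values`. The rows in `sample` are a hypothetical subset of the data, and the choice of `imdb_score` as the value column is only an illustration.

```python
import pandas as pd

# Hypothetical three-row sample standing in for the movie dataset.
sample = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "Tangled"],
    "director_name": ["James Cameron", "Sam Mendes", "Nathan Greno"],
    "imdb_score": [7.9, 6.8, 7.8],
})

# One row per title, one column per director, scores as the cell values.
wide = sample.pivot(index="movie_title", columns="director_name",
                    values="imdb_score")
print(wide)
```

Cells with no matching title/director pair come out as NaN.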

What is going wrong here?

Any help is greatly appreciated.

Further details (full traceback):

  

    ValueError                                Traceback (most recent call last)
     in ()
    ----> 1 pivoted = df.pivot(index='movie_title', columns=['director_name', 'imdb_score'])

    ~\Anaconda3\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
       4380         """
       4381         from pandas.core.reshape.reshape import pivot
    -> 4382         return pivot(self, index=index, columns=columns, values=values)
       4383
       4384     _shared_docs['pivot_table'] = """

    ~\Anaconda3\lib\site-packages\pandas\core\reshape\reshape.py in pivot(self, index, columns, values)
        378         cols = [columns] if index is None else [index, columns]
        379         append = index is None
    --> 380         indexed = self.set_index(cols, append=append)
        381         return indexed.unstack(columns)
        382     else:

    ~\Anaconda3\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
       3150                 arrays.append(level)
       3151
    -> 3152         index = _ensure_index_from_sequences(arrays, names)
       3153
       3154         if verify_integrity and not index.is_unique:

    ~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _ensure_index_from_sequences(sequences, names)
       4150         return Index(sequences[0], name=names)
       4151     else:
    -> 4152         return MultiIndex.from_arrays(sequences, names=names)
       4153
       4154

    ~\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py in from_arrays(cls, arrays, sortorder, names)
       1144         for i in range(1, len(arrays)):
       1145             if len(arrays[i]) != len(arrays[i - 1]):
    -> 1146                 raise ValueError('all arrays must be same length')
       1147
       1148         from pandas.core.categorical import _factorize_from_iterables

    ValueError: all arrays must be same length

0 answers:

No answers yet