I am using a movie dataset that contains the following information:
df.head(10)
color director_name num_critic_for_reviews duration director_facebook_likes actor_3_facebook_likes actor_2_name actor_1_facebook_likes gross genres ... num_user_for_reviews language country content_rating budget title_year actor_2_facebook_likes imdb_score aspect_ratio movie_facebook_likes
0 Color James Cameron 723.0 178.0 0.0 855.0 Joel David Moore 1000.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi ... 3054.0 English USA PG-13 237000000.0 2009.0 936.0 7.9 1.78 33000
1 Color Gore Verbinski 302.0 169.0 563.0 1000.0 Orlando Bloom 40000.0 309404152.0 Action|Adventure|Fantasy ... 1238.0 English USA PG-13 300000000.0 2007.0 5000.0 7.1 2.35 0
2 Color Sam Mendes 602.0 148.0 0.0 161.0 Rory Kinnear 11000.0 200074175.0 Action|Adventure|Thriller ... 994.0 English UK PG-13 245000000.0 2015.0 393.0 6.8 2.35 85000
3 Color Christopher Nolan 813.0 164.0 22000.0 23000.0 Christian Bale 27000.0 448130642.0 Action|Thriller ... 2701.0 English USA PG-13 250000000.0 2012.0 23000.0 8.5 2.35 164000
4 NaN Doug Walker NaN NaN 131.0 NaN Rob Walker 131.0 NaN Documentary ... NaN NaN NaN NaN NaN NaN 12.0 7.1 NaN 0
5 Color Andrew Stanton 462.0 132.0 475.0 530.0 Samantha Morton 640.0 73058679.0 Action|Adventure|Sci-Fi ... 738.0 English USA PG-13 263700000.0 2012.0 632.0 6.6 2.35 24000
6 Color Sam Raimi 392.0 156.0 0.0 4000.0 James Franco 24000.0 336530303.0 Action|Adventure|Romance ... 1902.0 English USA PG-13 258000000.0 2007.0 11000.0 6.2 2.35 0
7 Color Nathan Greno 324.0 100.0 15.0 284.0 Donna Murphy 799.0 200807262.0 Adventure|Animation|Comedy|Family|Fantasy|Musi... ... 387.0 English USA PG 260000000.0 2010.0 553.0 7.8 1.85 29000
8 Color Joss Whedon 635.0 141.0 0.0 19000.0 Robert Downey Jr. 26000.0 458991599.0 Action|Adventure|Sci-Fi ... 1117.0 English USA PG-13 250000000.0 2015.0 21000.0 7.5 2.35 118000
9 Color David Yates 375.0 153.0 282.0 10000.0 Daniel Radcliffe 25000.0 301956980.0 Adventure|Family|Fantasy|Mystery ... 973.0 English UK PG 250000000.0 2009.0 11000.0 7.5 2.35 10000
df.columns
Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',
'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
'movie_title', 'num_voted_users', 'cast_total_facebook_likes',
'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],
dtype='object')
Screenshot of the data (Notebook format)
I want to reshape this data (without aggregation) using the pivot function.
I tried to do that with this code:
pivoted = df.pivot(index='movie_title', columns=['director_name', 'imdb_score'])
But I get this ValueError:
ValueError: all arrays must be same length
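A possible reading of this error, offered as an assumption rather than a confirmed diagnosis: in the pandas version shown in the traceback, pivot hands [index, columns] to set_index, so a list passed as columns seems to be treated as a literal array of two values instead of two column names. Newer pandas releases (1.1 and later, as far as I know) accept a list there. A version-independent sketch is to call set_index and unstack directly. The frame below is made-up toy data; only the column names come from the dataset above.

```python
import pandas as pd

# Hypothetical toy rows; only the column names are taken from the question.
toy = pd.DataFrame({
    "movie_title": ["A", "B", "C"],
    "director_name": ["James Cameron", "Sam Mendes", "Nathan Greno"],
    "imdb_score": [7.9, 6.8, 7.8],
    "duration": [178.0, 148.0, 100.0],
})

# Move the three key columns into the index, then unstack the two that
# should become column levels. No aggregation is involved.
wide = (
    toy.set_index(["movie_title", "director_name", "imdb_score"])["duration"]
       .unstack(["director_name", "imdb_score"])
)
print(wide)
```

The result has movie_title as the row index and a two-level column index (director_name, imdb_score); cells with no matching row are NaN.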
When I change the code to:
pivoted = df.pivot(index='movie_title')
I get this error instead:
ValueError: cannot label index with a null key
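My guess (an assumption, not verified against the pandas source beyond the traceback lines quoted in this post) is that leaving out columns makes it None, and pivot then tries to use None as an index key. A minimal sketch that names index, columns and values explicitly, again on made-up toy rows:

```python
import pandas as pd

# Hypothetical toy rows; only the column names are taken from the question.
toy = pd.DataFrame({
    "movie_title": ["A", "B", "C"],
    "director_name": ["James Cameron", "Sam Mendes", "Nathan Greno"],
    "imdb_score": [7.9, 6.8, 7.8],
})

# One column for the row labels, one for the column labels, one for the cells.
pivoted = toy.pivot(index="movie_title", columns="director_name", values="imdb_score")
print(pivoted)
```

Each (movie_title, director_name) pair appears once here, so every value lands in exactly one cell and the remaining cells are NaN.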
What is the problem here?
Any help is much appreciated.
Further details:
ValueError                                Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 pivoted = df.pivot(index='movie_title', columns=['director_name', 'imdb_score'])

~\Anaconda3\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
   4380         """
   4381         from pandas.core.reshape.reshape import pivot
-> 4382         return pivot(self, index=index, columns=columns, values=values)
   4383
   4384     _shared_docs['pivot_table'] = """

~\Anaconda3\lib\site-packages\pandas\core\reshape\reshape.py in pivot(self, index, columns, values)
    378         cols = [columns] if index is None else [index, columns]
    379         append = index is None
--> 380         indexed = self.set_index(cols, append=append)
    381         return indexed.unstack(columns)
    382     else:

~\Anaconda3\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   3150             arrays.append(level)
   3151
-> 3152         index = _ensure_index_from_sequences(arrays, names)
   3153
   3154         if verify_integrity and not index.is_unique:

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _ensure_index_from_sequences(sequences, names)
   4150             return Index(sequences[0], name=names)
   4151         else:
-> 4152             return MultiIndex.from_arrays(sequences, names=names)
   4153
   4154

~\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py in from_arrays(cls, arrays, sortorder, names)
   1144         for i in range(1, len(arrays)):
   1145             if len(arrays[i]) != len(arrays[i - 1]):
-> 1146                 raise ValueError('all arrays must be same length')
   1147
   1148         from pandas.core.categorical import _factorize_from_iterables

ValueError: all arrays must be same length
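One more thing that may be worth checking before pivoting without aggregation: pivot needs every (index, columns) pair to be unique and raises a ValueError about duplicate entries otherwise. Whether this dataset actually contains repeated titles is an assumption on my part. A small sketch on made-up rows that detects and drops exact key duplicates first:

```python
import pandas as pd

# Hypothetical toy rows with one repeated (movie_title, director_name) pair.
toy = pd.DataFrame({
    "movie_title": ["A", "A", "B"],
    "director_name": ["X", "X", "Y"],
    "imdb_score": [7.0, 7.0, 6.5],
})

keys = ["movie_title", "director_name"]

# True if any key pair occurs more than once.
has_dupes = toy.duplicated(subset=keys).any()
print("duplicate keys present:", has_dupes)

# Keep the first row for each key pair, then pivot without aggregation.
deduped = toy.drop_duplicates(subset=keys)
pivoted = deduped.pivot(index="movie_title", columns="director_name", values="imdb_score")
print(pivoted)
```

Dropping duplicates keeps only the first row per key pair, so this only makes sense when the repeated rows carry the same values; otherwise the rows need to be disambiguated some other way.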