I'm having trouble with an IMDB movies df.
The cleaned DataFrame looks like this.
popularity budget revenue original_title \
0 32.985763 150000000 1513528810 Jurassic World
1 28.419936 150000000 378436354 Mad Max: Fury Road
2 13.112507 110000000 295238201 Insurgent
cast director \
0 [Chris Pratt, Bryce Dallas Howard, Irrfan Khan... [Colin Trevorrow]
1 [Tom Hardy, Charlize Theron, Hugh Keays-Byrne,... [George Miller]
2 [Shailene Woodley, Theo James, Kate Winslet, A... [Robert Schwentke]
overview runtime \
0 Twenty-two years after the events of Jurassic ... 124
1 An apocalyptic story set in the furthest reach... 120
2 Beatrice Prior must confront her inner demons ... 119
genres release_date vote_count \
0 [Action, Adventure, Science Fiction, Thriller] 2015-06-09 5562
1 [Action, Adventure, Science Fiction, Thriller] 2015-05-13 6185
2 [Adventure, Science Fiction, Thriller] 2015-03-18 2480
vote_average release_year budget_adj revenue_adj
0 6.5 2015 1.379999e+08 1.392446e+09
1 7.1 2015 1.379999e+08 3.481613e+08
2 6.3 2015 1.012000e+08 2.716190e+08
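The "genres" column has been converted so that each entry is a list of elements.
The goal is to count each genre, grouped by year.
Something like this.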
index count year
0 Action 106 2015
1 Adventure 69 2015
2 Science Fiction 84 2015
3 Thriller 171 2015
4 Fantasy 33 2015
5 Crime 51 2015
6 Western 6 2015
7 Drama 260 2015
8 Family 44 2015
9 Animation 37 2015
10 Comedy 160 2015
11 Mystery 42 2015
12 Romance 56 2015
13 War 9 2015
14 History 15 2015
15 Music 33 2015
16 Horror 125 2015
17 Documentary 51 2015
18 TV Movie 20 2015
This is how I get there for a single year:
import functools
import operator
from collections import Counter
import pandas as pd

df_year = df[df.release_date.dt.year == 2015]
list_flat = functools.reduce(operator.iconcat,list(df_year.genres.values), [])
df_years = pd.DataFrame(dict(Counter(list_flat)),range(1)).T
df_years['year'] = 2015
df_years.rename(columns={0:'count'},inplace=True)
df_years.reset_index(inplace=True)
But I can't seem to get this working in a for loop so that it covers all years.
df_years.append(df_years_temp,sort=False).reset_index(inplace=True)
I tried appending the temp df to the main df above, but it returns the same df with no changes, and nothing is appended.
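A likely explanation, sketched under assumptions: `DataFrame.append` never modified the frame in place. It returned a new DataFrame, and chaining `.reset_index(inplace=True)` resets that temporary copy and returns `None`, so `df_years` is left untouched. `append` was also removed in pandas 2.0. The loop below builds one small frame per year and combines them with `pd.concat`; the toy `df` and the helper name `genre_counts_for_year` are hypothetical stand-ins for the real data.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Toy stand-in for the cleaned movie frame (assumed shape: list-valued
# `genres` column and a datetime `release_date` column).
df = pd.DataFrame({
    "genres": [["Action", "Thriller"], ["Action"], ["Drama", "Thriller"]],
    "release_date": pd.to_datetime(["2015-06-09", "2015-05-13", "2014-03-18"]),
})


def genre_counts_for_year(frame, year):
    """Return one row per genre with its count for a single year."""
    genres_in_year = frame.loc[frame["release_date"].dt.year == year, "genres"]
    # Flatten the per-movie lists and count each genre.
    counts = Counter(chain.from_iterable(genres_in_year))
    out = pd.DataFrame({"index": list(counts.keys()), "count": list(counts.values())})
    out["year"] = year
    return out


years = sorted(df["release_date"].dt.year.unique())
# Build all per-year frames first, then concatenate once and keep the result.
df_years = pd.concat(
    [genre_counts_for_year(df, year) for year in years],
    ignore_index=True,
)
print(df_years)
```

Concatenating once at the end also avoids copying the growing frame on every iteration.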
Doing this would let me visualize how genres change over time.
Any suggestions are welcome.
Answer 0 (score: 1)
Just .explode the lists into more rows, then use .groupby with .transform('count') to create a new column with the counts.
Input:
df = pd.DataFrame({'index': {0: ['Action', ' Adventure', ' Science Fiction', ' Thriller'],
                             1: ['Action', ' Adventure', ' Science Fiction', ' Thriller'],
                             2: ['Adventure', ' Science Fiction', ' Thriller']},
                   'year': {0: 2015, 1: 2015, 2: 2015}})
Code:
df = df.explode('index')
df['count'] = df.groupby('index')['index'].transform('count')
df
Output:
index year count
0 Action 2015 2
0 Adventure 2015 2
0 Science Fiction 2015 3
0 Thriller 2015 3
1 Action 2015 2
1 Adventure 2015 2
1 Science Fiction 2015 3
1 Thriller 2015 3
2 Adventure 2015 1
2 Science Fiction 2015 3
2 Thriller 2015 3
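The snippet above counts each genre across the whole frame. To get the per-year table the question asks for, one option is to group the exploded rows by both year and genre. This is only a sketch, assuming a list-valued `genres` column and an integer `release_year` column as in the question's sample; the toy frame and the name `genre_by_year` are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the movie frame; only the two relevant columns are kept.
df = pd.DataFrame({
    "genres": [
        ["Action", "Adventure", "Thriller"],
        ["Action", "Thriller"],
        ["Adventure", "Thriller"],
    ],
    "release_year": [2015, 2015, 2014],
})

genre_by_year = (
    df.explode("genres")                        # one row per (movie, genre)
      .groupby(["release_year", "genres"])      # group by year, then genre
      .size()                                   # number of rows in each group
      .reset_index(name="count")                # back to a flat long table
)
print(genre_by_year)
```

If a year-by-genre grid is more convenient for plotting, the long table can be reshaped with `genre_by_year.pivot(index="release_year", columns="genres", values="count")`.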