我要做的是,我有一个star_cast列表和一个电影实体的类型列表。我希望将此列表作为数据框中的重复实体进行融合,以便将其存储在数据库系统中。
director_name = ['chris','guy','bryan']
genre = [['mystery','thriller'],['comedy','crime'],['action','adventure','sci -fi']]
gross_vlaue = [2544,236544,265888]
imdb_ratings = [8.5,5.4,3.2]
metascores = [80.0,55.0,64.0]
movie_names = ['memento','snatch','x-men']
runtime = [113.0,102.0,104.0]
star_cast = [['abc','ced','gef'],['aaa','abc'],['act','cst','gst','hhs']]
votes = [200,2150,2350]
sample_data = pd.DataFrame({"movie_names":movie_names,
"imdb_ratings":imdb_ratings,
"metscores":metascores,
"votes":votes,
"runtime":runtime,
"genre":genre,
"director_name": director_name,
"star_cast": star_cast,
"gross_value":gross_vlaue
})
以上将生成我拥有的数据框样本。
director_name = ['chris','chris','chris','chris','chris','chris','guy','guy','guy','guy','bryan','bryan','bryan','bryan','bryan','bryan','bryan','bryan','bryan','bryan','bryan','bryan']
genre = ['mystery','thriller','mystery','thriller','mystery','thriller','comedy','crime','comedy','crime','action','adventure','sci -fi','action','adventure','sci -fi','action','adventure','sci -fi','action','adventure','sci -fi']
gross_vlaue = [2544,2544,2544,2544,2544,2544,236544,236544,236544,236544,265888,265888,265888,265888,265888,265888,265888,265888,265888,265888,265888,265888]
imdb_ratings = [8.5,8.5,8.5,8.5,8.5,8.5,5.4,5.4,5.4,5.4,3.2,3.2,3.2,3.2,3.2,3.2,3.2,3.2,3.2,3.2,3.2,3.2]
metascores = [80.0,80.0,80.0,80.0,80.0,80.0,55.0,55.0,55.0,55.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0]
movie_names = ['memento','memento','memento','memento','memento','memento','snatch','snatch','snatch','snatch','x-men','x-men','x-men','x-men','x-men','x-men','x-men','x-men','x-men','x-men','x-men','x-men']
runtime = [113.0,113.0,113.0,113.0,113.0,113.0,102.0,102.0,102.0,102.0,104.0,104.0,104.0,104.0,104.0,104.0,104.0,104.0,104.0,104.0,104.0,104.0]
star_cast = ['abc','ced','gef','abc','ced','gef','aaa','abc','aaa','abc','act','cst','gst','hhs','act','cst','gst','hhs','act','cst','gst','hhs']
votes = [200,200,200,200,200,200,2150,2150,2150,2150,2350,2350,2350,2350,2350,2350,2350,2350,2350,2350,2350,2350]
sample_result = pd.DataFrame({"movie_names":movie_names,
"imdb_ratings":imdb_ratings,
"metscores":metascores,
"votes":votes,
"runtime":runtime,
"genre":genre,
"director_name": director_name,
"star_cast": star_cast,
"gross_value":gross_vlaue
})
这将生成我想将数据转换为的格式。
我尝试使用融化(),但没有运气。请帮助,了解如何以有效的方式实现。我的数据集相当大,使用for循环将非常慢。还有其他解决办法吗?