Melting lists in a column into a DataFrame

Time: 2018-02-23 18:18:06

Tags: python-3.x dataframe jupyter-notebook

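For each movie entity I have a list of star_cast and a list of genres. I want to melt these lists into repeated rows in a DataFrame so that the data can be stored in a database system.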

import pandas as pd

# One entry per movie; genre and star_cast hold a list for each movie
director_name = ['chris','guy','bryan']
genre = [['mystery','thriller'],['comedy','crime'],['action','adventure','sci -fi']]
gross_value = [2544,236544,265888]
imdb_ratings = [8.5,5.4,3.2]
metascores = [80.0,55.0,64.0]
movie_names = ['memento','snatch','x-men']
runtime = [113.0,102.0,104.0]
star_cast = [['abc','ced','gef'],['aaa','abc'],['act','cst','gst','hhs']]
votes = [200,2150,2350]

sample_data = pd.DataFrame({"movie_names":movie_names,
                        "imdb_ratings":imdb_ratings,
                        "metscores":metascores,
                        "votes":votes,
                        "runtime":runtime,
                        "genre":genre,
                        "director_name": director_name,
                        "star_cast": star_cast,
                        "gross_value":gross_value
                       })

The code above generates a sample of the DataFrame I have.

director_name = ['chris','chris','chris','chris','chris','chris','guy','guy','guy','guy','bryan','bryan','bryan','bryan','bryan','bryan','bryan','bryan','bryan','bryan','bryan','bryan']
genre = ['mystery','thriller','mystery','thriller','mystery','thriller','comedy','crime','comedy','crime','action','adventure','sci -fi','action','adventure','sci -fi','action','adventure','sci -fi','action','adventure','sci -fi']
gross_value = [2544,2544,2544,2544,2544,2544,236544,236544,236544,236544,265888,265888,265888,265888,265888,265888,265888,265888,265888,265888,265888,265888]
imdb_ratings = [8.5,8.5,8.5,8.5,8.5,8.5,5.4,5.4,5.4,5.4,3.2,3.2,3.2,3.2,3.2,3.2,3.2,3.2,3.2,3.2,3.2,3.2]
metascores = [80.0,80.0,80.0,80.0,80.0,80.0,55.0,55.0,55.0,55.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0,64.0]
movie_names = ['memento','memento','memento','memento','memento','memento','snatch','snatch','snatch','snatch','x-men','x-men','x-men','x-men','x-men','x-men','x-men','x-men','x-men','x-men','x-men','x-men']
runtime = [113.0,113.0,113.0,113.0,113.0,113.0,102.0,102.0,102.0,102.0,104.0,104.0,104.0,104.0,104.0,104.0,104.0,104.0,104.0,104.0,104.0,104.0]
star_cast = ['abc','ced','gef','abc','ced','gef','aaa','abc','aaa','abc','act','cst','gst','hhs','act','cst','gst','hhs','act','cst','gst','hhs']
votes = [200,200,200,200,200,200,2150,2150,2150,2150,2350,2350,2350,2350,2350,2350,2350,2350,2350,2350,2350,2350]

sample_result = pd.DataFrame({"movie_names":movie_names,
                        "imdb_ratings":imdb_ratings,
                        "metscores":metascores,
                        "votes":votes,
                        "runtime":runtime,
                        "genre":genre,
                        "director_name": director_name,
                        "star_cast": star_cast,
                        "gross_value":gross_value
                       })

This generates the format I want to convert the data into.

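I tried using melt(), but had no luck. Please help me understand how to do this efficiently. My dataset is fairly large, and a for loop over the rows would be very slow. Is there another way to do it?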
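One possible direction, offered as a sketch and not a confirmed solution: assuming pandas 0.25 or later is available (the release that added `DataFrame.explode`), each list column can be exploded in turn. This gives one row per (genre, star_cast) combination for every movie. The row counts per movie match the desired output above (6, 4 and 12), but the row order and pairing differ from the hand-written sample_result, which for 'snatch' repeats the same two pairs instead of listing all four combinations. The helper name `explode_lists` is hypothetical, and the data is a reduced copy of the sample with fewer scalar columns.

```python
import pandas as pd

# Reduced copy of the question's sample data (scalar columns trimmed for brevity).
sample_data = pd.DataFrame({
    "movie_names": ["memento", "snatch", "x-men"],
    "director_name": ["chris", "guy", "bryan"],
    "genre": [["mystery", "thriller"],
              ["comedy", "crime"],
              ["action", "adventure", "sci -fi"]],
    "star_cast": [["abc", "ced", "gef"],
                  ["aaa", "abc"],
                  ["act", "cst", "gst", "hhs"]],
})


def explode_lists(df, list_columns):
    """Hypothetical helper: explode each list column one after another.

    Exploding several columns sequentially produces, per original row,
    the Cartesian product of the lists in those columns.
    Assumes pandas >= 0.25, where DataFrame.explode is available.
    """
    out = df
    for col in list_columns:
        # Resetting the index keeps it unique between successive explodes.
        out = out.explode(col).reset_index(drop=True)
    return out


result = explode_lists(sample_data, ["genre", "star_cast"])

# Number of rows produced for each movie.
print(result.groupby("movie_names").size())
```

The only Python-level loop here runs over the list columns, not over the rows. Whether this is fast enough for a large dataset would still need to be measured.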

0 answers:

No answers