Create a pandas DataFrame from a list of lists, but with different delimiters

Time: 2017-10-15 17:44:54

Tags: python pandas dataframe

I have a list of lists:

     [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
     ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
     ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]

I want to end up with a pandas DataFrame that has these columns:

cols = ['MovieID', 'Name', 'Year', 'Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']

For the 'Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance' columns, the values will be 1 or 0.

I tried:

for row in movies_list:
    for element in row:
        if '|' in element:
            element = element.split('|')

However, nothing happens to the original list. I'm completely stuck here.

2 answers:

Answer 0 (score: 4)

Use the DataFrame constructor with str.get_dummies:

    import pandas as pd

    L = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
         ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
         ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]

    df = pd.DataFrame(L, columns=['MovieID','Name','Data'])
    df1 = df['Data'].str.get_dummies()
    print (df1)
       Adventure  Animation  Children's  Comedy  Fantasy  Romance
    0          0          1           1       1        0        0
    1          1          0           1       0        1        0
    2          0          0           0       1        0        1

For the Name and Year columns, use split, then rstrip to remove the trailing ); Year is also converted to int:

    df[['Name','Year']] = df['Name'].str.split(r'\s\(', expand=True)
    df['Year'] = df['Year'].str.rstrip(')').astype(int)

Finally, remove the Data column and add df1 to the original with join:

    df = df.drop('Data', axis=1).join(df1)
    print (df)
      MovieID              Name  Year  Adventure  Animation  Children's  Comedy  \
    0       1         Toy Story  1995          0          1           1       1
    1       2           Jumanji  1995          1          0           1       0
    2       3  Grumpier Old Men  1995          0          0           0       1

       Fantasy  Romance
    0        0        0
    1        1        0
    2        0        1
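The steps in this answer can be put together as one runnable sketch. It is a variation and not the answerer's exact code: it assumes every title ends in a four-digit year in parentheses and uses str.extract with named groups in place of split plus rstrip. The variable names genres and parts are only for illustration.

```python
import pandas as pd

L = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
     ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
     ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]

df = pd.DataFrame(L, columns=['MovieID', 'Name', 'Data'])

# One 0/1 indicator column per genre; '|' is the separator.
genres = df['Data'].str.get_dummies(sep='|')

# Assumption: each title looks like "Some Title (1995)".
# Named groups become the column names of the extracted frame.
parts = df['Name'].str.extract(r'^(?P<Name>.*?)\s*\((?P<Year>\d{4})\)$')
df['Name'] = parts['Name']
df['Year'] = parts['Year'].astype(int)

# Drop the raw genre string and attach the indicator columns.
df = df.drop(columns='Data').join(genres)
print(df)
```

Note that the genre column is still called Children's here; if the question's Children name is wanted, df.rename(columns={"Children's": 'Children'}) would cover that.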


Answer 1 (score: 1)

Here is my version. It isn't compact enough to be a one-line answer, but I hope it helps!

import pandas as pd

data = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
        ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
        ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]
cols = ['MovieID', 'Name', 'Year', 'Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']
final = []
for x in data:
    output = []
    output.append(x[0])                        # MovieID
    output.append(x[1].split("(")[0].strip())  # Name: text before "("
    output.append(x[1].split("(")[1][:4])      # Year: four characters after "(", kept as a string
    # Substring test, so 'Children' also matches "Children's"; gives True/False
    for h in ['Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']:
        output.append(h in x[2])
    final.append(output)

df = pd.DataFrame(final, columns=cols)
print(df)

Output:

  MovieID              Name  Year  Adventure  Children  Comedy  Fantasy  \
0       1         Toy Story  1995      False      True    True    False   
1       2           Jumanji  1995       True      True   False     True   
2       3  Grumpier Old Men  1995      False     False    True    False   

   Romance  
0    False  
1    False  
2     True
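Since the question asks for 1 or 0 instead of True or False, the loop approach could be adapted along these lines. This is a sketch under assumptions, not part of the answer above: it matches whole genre names after splitting on '|', strips the "'s" suffix so that Children's lines up with the requested Children column, and stores Year as an int. The names genres, rows, and tags are illustrative.

```python
import pandas as pd

data = [['1', 'Toy Story (1995)', "Animation|Children's|Comedy"],
        ['2', 'Jumanji (1995)', "Adventure|Children's|Fantasy"],
        ['3', 'Grumpier Old Men (1995)', 'Comedy|Romance']]
genres = ['Adventure', 'Children', 'Comedy', 'Fantasy', 'Romance']

rows = []
for movie_id, title, genre_str in data:
    # Split on the last "(" so the year is everything after it.
    name, _, rest = title.rpartition('(')
    # Assumption: "Children's" should count as "Children".
    tags = {g.replace("'s", '') for g in genre_str.split('|')}
    flags = [int(g in tags) for g in genres]  # 1 if the genre is present, else 0
    rows.append([movie_id, name.strip(), int(rest.rstrip(')'))] + flags)

df = pd.DataFrame(rows, columns=['MovieID', 'Name', 'Year'] + genres)
print(df)
```

Matching against a set of split genre names avoids accidental substring hits, at the cost of needing the explicit "'s" normalisation.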