Dynamic DataFrame columns from lists embedded in a DataFrame

Date: 2017-12-08 21:52:25

Tags: python list pandas dataframe

OK, so I'm relatively new to Python. I need to convert the following DataFrame:

bd, date
[[None]], 2017-11-01 09:00:00
[[Sulphur], [Green Tea]], 2017-11-02 09:00:00
[[Green Tea], [Jasmine]], 2017-11-03 09:00:00
.....

into:

date, None, Sulphur, Green Tea, Jasmine...
2017-11-01 09:00:00, 1, 0, 0, 0...
2017-11-02 09:00:00, 0, 1, 1, 0...
2017-11-03 09:00:00, 0, 0, 1, 1...

The items in the lists embedded in the bd column are dynamic, so they can't be predefined as columns in the new DataFrame.

I tried the following, based on another helpful post, Create new columns in pandas from python nested lists, but couldn't adapt it successfully:

suppDF1 = suppDF.bd.apply(lambda x: pd.Series(1, x)).fillna(0).astype(int)

With the code above I only get 5 columns of incorrect 1s, so I'm clearly out of my depth.
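The pattern from the linked post expects each cell to be a flat list of labels, whereas here every bd cell is a list of one-element lists, so pd.Series(1, x) does not get one label per tea. A minimal sketch of the same idea with the cells flattened first, assuming the sample data below mirrors suppDF and that None should become a column literally named 'None' (flat_labels is a hypothetical helper, not part of pandas):

```python
import pandas as pd

# Sample data mirroring the question: each bd cell is a list of one-element lists.
suppDF = pd.DataFrame({
    "bd": [[[None]], [["Sulphur"], ["Green Tea"]], [["Green Tea"], ["Jasmine"]]],
    "date": ["2017-11-01 09:00:00", "2017-11-02 09:00:00", "2017-11-03 09:00:00"],
})


def flat_labels(cell):
    """Hypothetical helper: flatten one level of nesting, mapping None to 'None'."""
    return ["None" if item is None else item for inner in cell for item in inner]


suppDF1 = (
    suppDF.set_index("date")["bd"]
    # one Series of 1s per row, indexed by that row's flattened labels
    .apply(lambda x: pd.Series(1, index=flat_labels(x)))
    .fillna(0)      # labels absent from a row come back as NaN
    .astype(int)
)
print(suppDF1)
```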

Update

I tried Max's suggestion, but I think something is going wrong when I try to use pivot:

suppDF1 = suppDF.pivot(index="date", columns="bd")["bd"]

I get the following error:

unhashable type: 'list'
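The error most likely comes from pivot using the bd values as the new column labels: pandas has to hash labels, and a list cannot be hashed. One way around it, sketched here under the assumption of pandas 0.25 or later (for explode) and the same sample data, is to flatten each cell to strings, give each (date, item) pair its own row, and count pairs with pd.crosstab:

```python
import pandas as pd

suppDF = pd.DataFrame({
    "bd": [[[None]], [["Sulphur"], ["Green Tea"]], [["Green Tea"], ["Jasmine"]]],
    "date": ["2017-11-01 09:00:00", "2017-11-02 09:00:00", "2017-11-03 09:00:00"],
})

# Flatten each cell to a plain list of strings (None -> 'None'); strings are hashable.
flat = suppDF["bd"].apply(
    lambda cell: ["None" if t is None else t for inner in cell for t in inner]
)

# explode() gives one row per (date, item) pair.
long = suppDF.assign(bd=flat).explode("bd")

# crosstab counts each (date, item) pair; missing pairs become 0.
suppDF1 = pd.crosstab(long["date"], long["bd"])
print(suppDF1)
```

If an item could appear twice in the same cell, crosstab would count it twice; suppDF1.clip(upper=1) turns the counts back into 0/1 flags.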

1 answer:

Answer 0 (score: 0)

I'm sure there are more elegant, functional, Pythonic ways to do this... I'd love to know what they are.

import pandas as pd

# define dataframe; building it from a dict avoids assigning lists
# into single cells with .loc, which is error-prone in pandas
df = pd.DataFrame({
    'bd': [[[None]], [['Sulphur'], ['Green Tea']], [['Green Tea'], ['Jasmine']]],
    'date': ['2017-11-01 09:00:00', '2017-11-02 09:00:00', '2017-11-03 09:00:00'],
})
print(df)

# set the index
df.set_index('date', inplace=True)

# df['bd'] contains doubly nested lists:
# for each cell, for each inner list, for each string, collect unseen strings
# (None is stored as the string 'None' so every column label is a plain string)
cols = []
for ls2 in df['bd']:
    for ls1 in ls2:
        for string in ls1:
            label = 'None' if string is None else string
            if label not in cols:
                cols.append(label)

# make a zero-filled column for every distinct string in df['bd']
for tea in cols:
    df[tea] = 0

# manual one-hot encoding; couldn't get pd.get_dummies() to work
# items() yields (index label, cell) pairs, so no positional row[1][0] lookup is needed
for date, ls2 in df['bd'].items():
    for ls1 in ls2:
        for string in ls1:
            label = 'None' if string is None else string
            df.loc[date, label] = 1
df.drop('bd', axis=1, inplace=True)

# every new column starts at 0, so there are no NaNs left to fill
# (a bare df.fillna(0) would also return a copy without changing df)
print(df)
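A shorter variant of the same one-hot idea, offered as a sketch rather than a definitive answer: build one dict of {item: 1} per row and let the DataFrame constructor align the keys into columns, filling the gaps with NaN. flatten_cell is a hypothetical helper, and mapping None to 'None' is an assumption made to match the desired header:

```python
import pandas as pd

df = pd.DataFrame({
    "bd": [[[None]], [["Sulphur"], ["Green Tea"]], [["Green Tea"], ["Jasmine"]]],
    "date": ["2017-11-01 09:00:00", "2017-11-02 09:00:00", "2017-11-03 09:00:00"],
})


def flatten_cell(cell):
    """Hypothetical helper: yield the strings in a list of lists, None -> 'None'."""
    for inner in cell:
        for item in inner:
            yield "None" if item is None else item


# One dict per row, e.g. {"Sulphur": 1, "Green Tea": 1}.
rows = [dict.fromkeys(flatten_cell(cell), 1) for cell in df["bd"]]

# Keys missing from a row become NaN, which is then replaced by 0.
result = pd.DataFrame(rows, index=df["date"]).fillna(0).astype(int)
print(result)
```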

I spent some time on this; here are some things that didn't quite work for me:

I couldn't get this recursive function to work for me (blame me, not the function)... Flatten (an irregular) list of lists
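For this data a small recursive generator is enough. This is a minimal sketch, not the function from the linked post, and it assumes the nesting uses only lists or tuples. Checking for list/tuple instead of "any iterable" keeps strings such as 'Green Tea' from being split into characters:

```python
def flatten(items):
    """Recursively yield the non-list leaves of arbitrarily nested lists/tuples."""
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)  # descend into the nested list
        else:
            yield item                # a leaf: a string, None, a number, ...


cells = [[[None]], [["Sulphur"], ["Green Tea"]], [["Green Tea"], ["Jasmine"]]]
print([list(flatten(c)) for c in cells])
# [[None], ['Sulphur', 'Green Tea'], ['Green Tea', 'Jasmine']]
```

Applied to the DataFrame, df["bd"].apply(lambda c: list(flatten(c))) would give a column of flat lists.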

I tried get_dummies, but it can't hash lists, let alone doubly nested lists... https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html
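get_dummies does work once it only sees hashable scalars. A sketch, assuming pandas 0.25 or later for Series.explode: flatten each cell, explode so every item gets its own row (the date index repeats), one-hot encode, then collapse the repeated index back to one row per date:

```python
import pandas as pd

df = pd.DataFrame({
    "bd": [[[None]], [["Sulphur"], ["Green Tea"]], [["Green Tea"], ["Jasmine"]]],
    "date": ["2017-11-01 09:00:00", "2017-11-02 09:00:00", "2017-11-03 09:00:00"],
}).set_index("date")

# Flatten each cell to a list of strings (None -> 'None').
flat = df["bd"].apply(
    lambda cell: ["None" if t is None else t for inner in cell for t in inner]
)

# One row per item; get_dummies then only sees strings.
dummies = pd.get_dummies(flat.explode(), dtype=int)

# Merge the repeated date rows: a 1 anywhere in the group gives 1.
result = dummies.groupby(level=0).max()
print(result)
```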

I tried pivot and pivot_table... https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html
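pivot_table can also do the job once the data is in long format (one row per date and item). A sketch under the same assumptions (pandas 0.25+ for explode, None mapped to 'None'); the flag column is a hypothetical helper column that gives pivot_table something to aggregate:

```python
import pandas as pd

suppDF = pd.DataFrame({
    "bd": [[[None]], [["Sulphur"], ["Green Tea"]], [["Green Tea"], ["Jasmine"]]],
    "date": ["2017-11-01 09:00:00", "2017-11-02 09:00:00", "2017-11-03 09:00:00"],
})

# Flatten the nested lists so the bd values become hashable strings.
flat = suppDF["bd"].apply(
    lambda cell: ["None" if t is None else t for inner in cell for t in inner]
)

# Long format: one row per (date, item), with a constant flag of 1.
long = suppDF.assign(bd=flat).explode("bd").assign(flag=1)

# Each date becomes a row, each item a column; absent pairs get 0.
suppDF1 = long.pivot_table(
    index="date", columns="bd", values="flag", aggfunc="max", fill_value=0
)
print(suppDF1)
```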

I tried converting the lists to strings, but that was a dead end too... Converting a Panda DF List into a string
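The string route can work if each cell is joined with a separator and split again by Series.str.get_dummies. A sketch, assuming no tea name contains the '|' separator and mapping None to 'None':

```python
import pandas as pd

df = pd.DataFrame({
    "bd": [[[None]], [["Sulphur"], ["Green Tea"]], [["Green Tea"], ["Jasmine"]]],
    "date": ["2017-11-01 09:00:00", "2017-11-02 09:00:00", "2017-11-03 09:00:00"],
})

# Join each cell's items into one string, e.g. "Sulphur|Green Tea".
joined = df["bd"].apply(
    lambda cell: "|".join("None" if t is None else t for inner in cell for t in inner)
)

# str.get_dummies splits on sep and builds one 0/1 column per distinct piece.
result = joined.str.get_dummies(sep="|")
result.index = df["date"]
print(result)
```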