OK, so I'm relatively new to Python. I need to transform the following dataframe:
bd, date
[[None]], 2017-11-01 09:00:00
[[Sulphur], [Green Tea]], 2017-11-02 09:00:00
[[Green Tea], [Jasmine]], 2017-11-03 09:00:00
.....
into
date, None, Sulphur, Green Tea, Jasmine...
2017-11-01 09:00:00, 1, 0, 0, 0...
2017-11-02 09:00:00, 0, 1, 1, 0...
2017-11-03 09:00:00, 0, 0, 1, 1...
The items in the lists embedded in the bd column are dynamic, so they can't be predefined as columns in the new DataFrame.
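One way to handle item names that are only known at run time is to unnest the lists into a long table and let pandas build the columns. The sketch below reuses the question's sample rows and assumes pandas 1.1 or later (for `explode(..., ignore_index=True)`); turning `None` into a column called "None" is an assumption about the desired output, not something the question specifies.

```python
import pandas as pd

# Sample data from the question: "bd" holds doubly nested lists.
suppDF = pd.DataFrame({
    "bd": [[[None]], [["Sulphur"], ["Green Tea"]], [["Green Tea"], ["Jasmine"]]],
    "date": ["2017-11-01 09:00:00", "2017-11-02 09:00:00", "2017-11-03 09:00:00"],
})

# Explode twice: outer list -> inner lists -> individual strings.
# ignore_index=True gives the long table a fresh 0..n-1 index.
long_df = (
    suppDF.explode("bd", ignore_index=True)
    .explode("bd", ignore_index=True)
)

# Assumption: None should become a column literally named "None";
# crosstab would otherwise drop it as a missing value.
long_df["bd"] = long_df["bd"].fillna("None")

# Count (date, item) pairs; with no repeats per date every count is 0 or 1.
onehot = pd.crosstab(long_df["date"], long_df["bd"])
print(onehot)
```

`crosstab` sorts the new columns alphabetically rather than in first-seen order. If the same item could appear twice on one date, the counts would exceed 1; `onehot.clip(upper=1)` would cap them.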
I tried the following, adapted from another helpful post, Create new columns in pandas from python nested lists, but couldn't modify it successfully:
suppDF1 = suppDF.bd.apply(lambda x: pd.Series(1, x)).fillna(0).astype(int)
With the code above I only get 5 columns of incorrect 1s, so I'm clearly out of my depth.
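A likely cause: each cell is a list of lists, so `pd.Series(1, x)` receives something like `[['Sulphur'], ['Green Tea']]` as its index, and pandas appears to read a list of lists as the arrays of a MultiIndex, producing one combined label such as `('Sulphur', 'Green Tea')` instead of two separate labels. The sketch below keeps the row-by-row idea but flattens each cell first and builds one dict per row. `flatten` is a hypothetical helper, and mapping `None` to the string "None" is an assumption.

```python
import pandas as pd

suppDF = pd.DataFrame({
    "bd": [[[None]], [["Sulphur"], ["Green Tea"]], [["Green Tea"], ["Jasmine"]]],
    "date": ["2017-11-01 09:00:00", "2017-11-02 09:00:00", "2017-11-03 09:00:00"],
})

def flatten(cell):
    """Hypothetical helper: [['Sulphur'], ['Green Tea']] -> ['Sulphur', 'Green Tea'].

    None is mapped to the string "None" (an assumption) so it becomes a normal column.
    """
    return ["None" if item is None else item for inner in cell for item in inner]

# One dict per row, e.g. {"Sulphur": 1, "Green Tea": 1}; absent items become NaN.
records = [dict.fromkeys(flatten(cell), 1) for cell in suppDF["bd"]]

suppDF1 = (
    pd.DataFrame(records, index=suppDF["date"])
    .fillna(0)      # NaN -> 0 for items not present on that date
    .astype(int)
)
print(suppDF1)
```

Building the frame from a list of dicts keeps the columns in order of first appearance, which matches the layout asked for in the question.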
UPDATE
I tried Max's suggestion, but I think I'm making a mistake when trying to use pivot:
suppDF1 = suppDF.pivot(index="date", columns="bd")["bd"]
I get the following error:
unhashable type: 'list'
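The error probably comes from `pivot` trying to use the values of `bd` as column labels: labels have to be hashable, and a Python list is not. A sketch that unnests the lists first and then pivots, assuming pandas 1.1+ and (as an assumption) treating `None` as a column named "None":

```python
import pandas as pd

suppDF = pd.DataFrame({
    "bd": [[[None]], [["Sulphur"], ["Green Tea"]], [["Green Tea"], ["Jasmine"]]],
    "date": ["2017-11-01 09:00:00", "2017-11-02 09:00:00", "2017-11-03 09:00:00"],
})

# Unnest so that each row holds a single (hashable) string instead of a list.
long_df = (
    suppDF.explode("bd", ignore_index=True)
    .explode("bd", ignore_index=True)
    .fillna({"bd": "None"})  # assumption: keep None as a "None" column
)

# aggfunc="size" counts rows per (date, bd) pair; missing pairs become 0.
suppDF1 = long_df.pivot_table(index="date", columns="bd", aggfunc="size", fill_value=0)
print(suppDF1)
```

Plain `pivot` would need an extra column of 1s to use as values and raises on repeated (date, item) pairs; `pivot_table` with `aggfunc="size"` sidesteps both.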
Answer (score: 0)
I'm sure there are more elegant, functional, Pythonic ways to do this... I'd love to know what they are.
import pandas as pd
# define the dataframe; building it from a dict avoids assigning
# a list into a single cell with .loc, which pandas may reject
df = pd.DataFrame({
    'bd': [[[None]],
           [['Sulphur'], ['Green Tea']],
           [['Green Tea'], ['Jasmine']]],
    'date': ['2017-11-01 09:00:00',
             '2017-11-02 09:00:00',
             '2017-11-03 09:00:00'],
})
print(df)
# set the index
df.set_index('date', inplace=True)
# df['bd'] contains doubly nested lists
# for item in column, for list in item, for string in list, add each new string to cols
cols = []
for ls2 in df['bd']:
    for ls1 in ls2:
        for string in ls1:
            if string not in cols:
                cols.append(string)
# make a column (initialised to 0) for every string in df['bd']
for tea in cols:
    df[tea] = 0
# manual one-hot encoding; couldn't get pd.get_dummies() to work
# iterrows() yields (index, row) pairs; look up 'bd' by label, not by position
for date, row in df.iterrows():
    for ls in row['bd']:
        for el in ls:
            if el in df.columns:
                df.loc[date, el] = 1
df.drop('bd', axis=1, inplace=True)
# every new column started at 0, so no fillna() is needed; just show the result
print(df)
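In the spirit of the request for something more Pythonic, here is a sketch that keeps this answer's idea (one 0/1 column per distinct string) but flattens each cell once and fills each column with a single membership test, instead of looping with `iterrows()` and writing cells one at a time with `.loc`. The names `flat` and `result` are illustrative, and renaming `None` to the string "None" is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "bd": [[[None]], [["Sulphur"], ["Green Tea"]], [["Green Tea"], ["Jasmine"]]],
    "date": ["2017-11-01 09:00:00", "2017-11-02 09:00:00", "2017-11-03 09:00:00"],
}).set_index("date")

# Flatten each cell once, e.g. [['Sulphur'], ['Green Tea']] -> ['Sulphur', 'Green Tea'].
# Assumption: None is renamed to the string "None".
flat = df["bd"].map(
    lambda cell: ["None" if s is None else s for inner in cell for s in inner]
)

# Distinct names in order of first appearance (dict keys keep insertion order).
cols = list(dict.fromkeys(name for names in flat for name in names))

# One membership test per column replaces iterrows() plus per-cell .loc writes.
# map() runs immediately, so each lambda sees the current value of `name`.
result = pd.DataFrame(
    {name: flat.map(lambda names: int(name in names)) for name in cols},
    index=df.index,
)
print(result)
```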
I spent some time on this; here are some things that were not entirely helpful:
I couldn't get this recursive function to work for me (blame me, not the function)... Flatten (an irregular) list of lists
I tried get_dummies, but it can't hash lists, let alone doubly nested lists... https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html
I tried pivot and pivot_table... https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html
I tried converting the lists to strings, but that was a dead end too... Converting a Panda DF List into a string
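The flattening and string-conversion routes listed above can be combined: flatten each cell with a small recursive generator, join the pieces into one delimited string, and let `Series.str.get_dummies` split on the delimiter. This is a sketch under assumptions: `flatten` is a hypothetical helper (not the function from the linked post), the `|` separator assumes no item name contains that character, and writing `None` as "None" is a guess at the intended output.

```python
import pandas as pd

def flatten(items):
    """Hypothetical helper: recursively yield the leaves of an arbitrarily nested list."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

suppDF = pd.DataFrame({
    "bd": [[[None]], [["Sulphur"], ["Green Tea"]], [["Green Tea"], ["Jasmine"]]],
    "date": ["2017-11-01 09:00:00", "2017-11-02 09:00:00", "2017-11-03 09:00:00"],
})

# Turn each nested cell into one "|"-separated string, e.g. "Sulphur|Green Tea".
# Assumption: None is written out as the literal text "None".
as_text = suppDF["bd"].map(
    lambda cell: "|".join("None" if x is None else x for x in flatten(cell))
)

# str.get_dummies splits on sep and builds one 0/1 column per distinct token.
suppDF1 = as_text.str.get_dummies(sep="|")
suppDF1.index = suppDF["date"]
print(suppDF1)
```

Because the recursive generator does not care how deeply the lists are nested, the same code would also handle cells that mix single and doubly nested lists.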