Convert a dictionary whose values are lists of different lengths into a DataFrame

Time: 2019-03-21 18:22:29

Tags: python pandas dataframe dictionary matrix

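I have a dictionary whose keys are years and whose values are lists of the corresponding models. Below is a sample of the data printed from the dictionary.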

1975: ['MODEL9808533471'], 
1985: ['MODEL0912768548'], 
1980: ['MODEL1006230072', 'MODEL7898438988'], 
1987: ['MODEL0848444339'], 
1977: ['MODEL7889395724'], 
1962: ['MODEL8686121468'], 
1965: ['MODEL0911532520'],  
2018: ['MODEL1712050002', 'MODEL1712050003', 'MODEL1712050004']

What I want is the following:

                 1962    1965    1975   1977   1980   1985  1987  2018
MODEL9808533471                    1
MODEL0912768548                                         1
MODEL1006230072                                  1
MODEL7898438988                                  1
MODEL0848444339                                               1
MODEL7889395724                           1
MODEL8686121468   1
MODEL0911532520            1
MODEL1712050002                                                     1
MODEL1712050003                                                     1
MODEL1712050004                                                     1

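At first I thought I would need to loop over every value in the dictionary and build the matrix myself, then use pandas to write it to a CSV file.
I could not find anything similar in the numpy package, even though it works well for matrices. I found this link on this forum, but there the lists all have the same length.

Do you know of any tool or facility (for example a pandas function, a numpy function, or something similar) that could help me?

Thanks!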

2 answers:

Answer 0 (score: 3)

This is a perfect use case for MultiLabelBinarizer from sklearn:

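import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

s = pd.Series(d)  # d is the dictionary from the question
mlb = MultiLabelBinarizer()
# fit_transform gives one row per year and one column per model; transpose so models become rows
yourdf = pd.DataFrame(mlb.fit_transform(s), columns=mlb.classes_, index=s.index).T
yourdf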
Out[121]: 
                 1975  1985  1980  1987  1977  1962  1965  2018
MODEL0848444339     0     0     0     1     0     0     0     0
MODEL0911532520     0     0     0     0     0     0     1     0
MODEL0912768548     0     1     0     0     0     0     0     0
MODEL1006230072     0     0     1     0     0     0     0     0
MODEL1712050002     0     0     0     0     0     0     0     1
MODEL1712050003     0     0     0     0     0     0     0     1
MODEL1712050004     0     0     0     0     0     0     0     1
MODEL7889395724     0     0     0     0     1     0     0     0
MODEL7898438988     0     0     1     0     0     0     0     0
MODEL8686121468     0     0     0     0     0     1     0     0
MODEL9808533471     1     0     0     0     0     0     0     0

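Alternatively, use get_dummies (this assumes no model name contains a comma, since a comma is used as the separator):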

s.apply(','.join).str.get_dummies(',').T
Out[127]: 
                 1975  1985  1980  1987  1977  1962  1965  2018
MODEL0848444339     0     0     0     1     0     0     0     0
MODEL0911532520     0     0     0     0     0     0     1     0
MODEL0912768548     0     1     0     0     0     0     0     0
MODEL1006230072     0     0     1     0     0     0     0     0
MODEL1712050002     0     0     0     0     0     0     0     1
MODEL1712050003     0     0     0     0     0     0     0     1
MODEL1712050004     0     0     0     0     0     0     0     1
MODEL7889395724     0     0     0     0     1     0     0     0
MODEL7898438988     0     0     1     0     0     0     0     0
MODEL8686121468     0     0     0     0     0     1     0     0
MODEL9808533471     1     0     0     0     0     0     0     0
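
As a further pandas-only option that is not part of the original answer, here is a minimal sketch that avoids the join-and-split step. It assumes pandas 0.25 or later, where Series.explode is available; the names pairs and matrix are only illustrative.

```python
import pandas as pd

# Sample dictionary copied from the question.
d = {
    1975: ['MODEL9808533471'],
    1985: ['MODEL0912768548'],
    1980: ['MODEL1006230072', 'MODEL7898438988'],
    1987: ['MODEL0848444339'],
    1977: ['MODEL7889395724'],
    1962: ['MODEL8686121468'],
    1965: ['MODEL0911532520'],
    2018: ['MODEL1712050002', 'MODEL1712050003', 'MODEL1712050004'],
}

# explode() turns each list element into its own row and repeats the year in the index;
# reset_index() then gives a two-column frame with one (year, model) pair per row.
pairs = pd.Series(d).explode().rename_axis('year').reset_index(name='model')

# crosstab counts how often each model appears in each year (0 or 1 here).
matrix = pd.crosstab(pairs['model'], pairs['year'])
print(matrix)
```

Unlike the two outputs above, crosstab sorts both the model rows and the year columns, so the years come out in ascending order rather than in dictionary order.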

Answer 1 (score: 1)

You can use stack and crosstab.

Assuming d is your dictionary,

# list() keeps this working on pandas versions that do not accept dict views directly
df = pd.DataFrame(list(d.values()), index=list(d.keys())).stack().reset_index(level=0)

df.columns = ['year', 'col']

pd.crosstab(df['col'], df['year'])


year            1962    1965    1975    1977    1980    1985    1987    2018
col                             
MODEL0848444339 0       0       0       0       0       0       1       0
MODEL0911532520 0       1       0       0       0       0       0       0
MODEL0912768548 0       0       0       0       0       1       0       0
MODEL1006230072 0       0       0       0       1       0       0       0
MODEL1712050002 0       0       0       0       0       0       0       1
MODEL1712050003 0       0       0       0       0       0       0       1
MODEL1712050004 0       0       0       0       0       0       0       1
MODEL7889395724 0       0       0       1       0       0       0       0
MODEL7898438988 0       0       0       0       1       0       0       0
MODEL8686121468 1       0       0       0       0       0       0       0
MODEL9808533471 0       0       1       0       0       0       0       0
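
The question also asks for blank cells instead of zeros and for a CSV file, which neither answer covers. A minimal sketch of that last step, assuming an indicator matrix like the ones above; here a three-row subset of the question's data is built by hand, and the file name in the comment is hypothetical.

```python
import pandas as pd

# A small subset of the indicator matrix, built by hand for illustration.
matrix = pd.DataFrame(
    [[1, 0], [0, 1], [0, 1]],
    index=['MODEL8686121468', 'MODEL1712050002', 'MODEL1712050003'],
    columns=[1962, 2018],
)

# Keep the 1s and replace every 0 with an empty string.
# Casting to object first lets integers and strings share a column.
blanked = matrix.astype(object).where(matrix != 0, '')

# With no path, to_csv returns the CSV text; pass a path such as
# 'models_by_year.csv' (hypothetical name) to write a file instead.
csv_text = blanked.to_csv()
print(csv_text)
```

Blanking is best left to the very end, because the object-typed frame can no longer be summed or compared numerically.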