Question

I have a large dictionary:

d[id1][id2] = value

Example:

books["auth1"]["humor"] = 20
books["auth1"]["action"] = 30
books["auth2"]["comedy"] = 20

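and so on...

Each "auth" key can have any set of "genres" associated with it. The value of each keyed item is the number of books that author wrote in that genre.

Now I want to convert it into a matrix, something like: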

              "humor"    "action"    "comedy"
  "auth1"       20          30           0
  "auth2"        0           0          20

How can I do this? Thanks.

Answer 1

pandas does this very well:

books = {}
books["auth1"] = {}
books["auth2"] = {}
books["auth1"]["humor"] = 20
books["auth1"]["action"] = 30
books["auth2"]["comedy"] = 20

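import pandas as pd

# Outer keys become columns, so transpose to get one row per author;
# genres an author has not written come out as NaN and are filled with 0.
df = pd.DataFrame(books).T.fillna(0)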

The output is:

       action  comedy  humor
auth1      30       0     20
auth2       0      20      0
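
Since the goal is a numpy matrix, the DataFrame can be converted afterwards. The following is a minimal sketch, assuming a pandas version that provides `DataFrame.to_numpy()` (0.24 or later). The `astype(int)` and `sort_index` calls are additions here, not part of the answer above: `fillna(0)` on its own leaves float columns, and sorting makes the layout independent of dict ordering.

```python
import pandas as pd

books = {
    "auth1": {"humor": 20, "action": 30},
    "auth2": {"comedy": 20},
}

# One row per author, one column per genre; missing combinations become 0.
df = pd.DataFrame(books).T.fillna(0).astype(int)

# Sort rows and columns so the layout does not depend on insertion order.
df = df.sort_index(axis=0).sort_index(axis=1)

matrix = df.to_numpy()       # plain 2-D numpy array
authors = list(df.index)     # row labels
genres = list(df.columns)    # column labels
```

Keeping `authors` and `genres` alongside `matrix` preserves the meaning of each row and column once the labels are gone.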

Answer 2

Use a list comprehension to convert the dict into a list and/or a numpy array:

import numpy as np
np.array([[books[author][genre] for genre in sorted(books[author])] for author in sorted(books)])

Edit

Apparently the number of keys in each sub-dictionary is irregular. List all the genres:

genres = ['humor', 'action', 'comedy']

Then iterate over the dictionary in the usual way:

list_of_lists = []
# Sort by author name so the row order is deterministic.
for author_name, author in sorted(books.items()):
    titles = []
    for genre in genres:
        try:
            titles.append(author[genre])
        except KeyError:
            # This author has no books in this genre.
            titles.append(0)
    list_of_lists.append(titles)

books_array = np.array(list_of_lists)

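Basically, I try to append the value for each key in genres to the list. If the key doesn't exist, a KeyError is raised. I catch the error and append 0 to the list instead.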
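
The same idea can be written more compactly with `dict.get`, which returns a default instead of raising. This is only a sketch, not part of the answer above: it assumes the genre list should be derived from the data rather than typed by hand, and that alphabetical order is acceptable for rows and columns.

```python
import numpy as np

books = {
    "auth1": {"humor": 20, "action": 30},
    "auth2": {"comedy": 20},
}

# Collect every genre that appears under any author, in a stable order.
genres = sorted({genre for author in books.values() for genre in author})
authors = sorted(books)

# dict.get(genre, 0) yields 0 when an author has no books in that genre.
books_array = np.array(
    [[books[name].get(genre, 0) for genre in genres] for name in authors]
)
```

Row i of `books_array` corresponds to `authors[i]` and column j to `genres[j]`.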

Answer 3

As of 2018, I think pandas 0.22 supports this out of the box. Specifically, see the from_dict class method of DataFrame.
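
A short sketch of how `DataFrame.from_dict` might be used here. The choice of `orient="index"` (outer keys become row labels) and the `fillna`, `astype`, and `sort_index` steps are assumptions made for this example. `to_numpy()` requires pandas 0.24 or later; on older versions `.values` serves the same purpose.

```python
import pandas as pd

books = {
    "auth1": {"humor": 20, "action": 30},
    "auth2": {"comedy": 20},
}

# orient="index" treats the outer keys (authors) as row labels,
# so no transpose is needed.
df = pd.DataFrame.from_dict(books, orient="index").fillna(0).astype(int)

# Sort rows and columns for a deterministic layout, then drop the labels.
df = df.sort_index(axis=0).sort_index(axis=1)
matrix = df.to_numpy()
```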

Convert a 2d dictionary to a numpy matrix
