Question

I have the following DataFrame, imported from a CSV file with pandas (the values are distances):

Forest,Bell Bay,Surrey Hills,Smithton,Hobart
Coupe 1,158,194,10,49
Coupe 2,156,169,71,84
Coupe 3,10,186,101,163
Coupe 4,47,94,134,139
Coupe 5,144,61,135,56
Coupe 6,27,27,134,36
Coupe 7,114,4,143,113
Coupe 8,71,170,190,140
Coupe 9,94,54,73,128
Coupe 10,46,194,92,36

It was loaded with the following code:

import pandas as pd

df = pd.read_csv("Example.csv", header=0, index_col="Forest")

I created the list of forests, I, with:

I = df.index.tolist()

Result:

['Coupe 1', 'Coupe 2', 'Coupe 3', 'Coupe 4', 'Coupe 5', 'Coupe 6', 'Coupe 7', 'Coupe 8', 'Coupe 9', 'Coupe 10']

The list of destinations, J, is created with:

J = df.columns.values.tolist()

Result:

['Bell Bay', 'Surrey Hills', 'Smithton', 'Hobart']

The list of tuples (arcs) is created with:

arcs = [(i, j) for i in I for j in J]

Result:

[('Coupe 1', 'Bell Bay'), ('Coupe 1', 'Surrey Hills'), ('Coupe 1', 'Smithton'), ('Coupe 1', 'Hobart'), ('Coupe 2', 'Bell Bay'), ('Coupe 2', 'Surrey Hills'), ('Coupe 2', 'Smithton'), ('Coupe 2', 'Hobart'), ('Coupe 3', 'Bell Bay'), ('Coupe 3', 'Surrey Hills'), ('Coupe 3', 'Smithton'), ('Coupe 3', 'Hobart'), ('Coupe 4', 'Bell Bay'), ('Coupe 4', 'Surrey Hills'), ('Coupe 4', 'Smithton'), ('Coupe 4', 'Hobart'), ('Coupe 5', 'Bell Bay'), ('Coupe 5', 'Surrey Hills'), ('Coupe 5', 'Smithton'), ('Coupe 5', 'Hobart'), ('Coupe 6', 'Bell Bay'), ('Coupe 6', 'Surrey Hills'), ('Coupe 6', 'Smithton'), ('Coupe 6', 'Hobart'), ('Coupe 7', 'Bell Bay'), ('Coupe 7', 'Surrey Hills'), ('Coupe 7', 'Smithton'), ('Coupe 7', 'Hobart'), ('Coupe 8', 'Bell Bay'), ('Coupe 8', 'Surrey Hills'), ('Coupe 8', 'Smithton'), ('Coupe 8', 'Hobart'), ('Coupe 9', 'Bell Bay'), ('Coupe 9', 'Surrey Hills'), ('Coupe 9', 'Smithton'), ('Coupe 9', 'Hobart'), ('Coupe 10', 'Bell Bay'), ('Coupe 10', 'Surrey Hills'), ('Coupe 10', 'Smithton'), ('Coupe 10', 'Hobart')]

Next, I want to create a dictionary that maps each arc to its distance value, of the following form:

{('Coupe 1', 'Bell Bay'): 158, ('Coupe 1', 'Surrey Hills'):194, .....}

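Can anyone suggest the best way to build this dictionary? This is only a small example of the combination matrix, with I (10) and J (4). My approach has to work for very large datasets with more than ten million I * J combinations. Any help would be greatly appreciated!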

Answer 1

First use DataFrame.stack to get a Series with a MultiIndex, then convert it to a dictionary with Series.to_dict:

d = df.stack().to_dict()

print(d)
{('Coupe 1', 'Bell Bay'): 158, ('Coupe 1', 'Surrey Hills'): 194, ('Coupe 1', 'Smithton'): 10, ('Coupe 1', 'Hobart'): 49, ('Coupe 2', 'Bell Bay'): 156, ('Coupe 2', 'Surrey Hills'): 169, ('Coupe 2', 'Smithton'): 71, ('Coupe 2', 'Hobart'): 84, ('Coupe 3', 'Bell Bay'): 10, ('Coupe 3', 'Surrey Hills'): 186, ('Coupe 3', 'Smithton'): 101, ('Coupe 3', 'Hobart'): 163, ('Coupe 4', 'Bell Bay'): 47, ('Coupe 4', 'Surrey Hills'): 94, ('Coupe 4', 'Smithton'): 134, ('Coupe 4', 'Hobart'): 139, ('Coupe 5', 'Bell Bay'): 144, ('Coupe 5', 'Surrey Hills'): 61, ('Coupe 5', 'Smithton'): 135, ('Coupe 5', 'Hobart'): 56, ('Coupe 6', 'Bell Bay'): 27, ('Coupe 6', 'Surrey Hills'): 27, ('Coupe 6', 'Smithton'): 134, ('Coupe 6', 'Hobart'): 36, ('Coupe 7', 'Bell Bay'): 114, ('Coupe 7', 'Surrey Hills'): 4, ('Coupe 7', 'Smithton'): 143, ('Coupe 7', 'Hobart'): 113, ('Coupe 8', 'Bell Bay'): 71, ('Coupe 8', 'Surrey Hills'): 170, ('Coupe 8', 'Smithton'): 190, ('Coupe 8', 'Hobart'): 140, ('Coupe 9', 'Bell Bay'): 94, ('Coupe 9', 'Surrey Hills'): 54, ('Coupe 9', 'Smithton'): 73, ('Coupe 9', 'Hobart'): 128, ('Coupe 10', 'Bell Bay'): 46, ('Coupe 10', 'Surrey Hills'): 194, ('Coupe 10', 'Smithton'): 92, ('Coupe 10', 'Hobart'): 36}
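As a self-contained sketch of the stack approach, the snippet below reads the first three rows of the question's data from an in-memory string (an assumption made so that it runs without Example.csv; the names csv_text, df_small, stacked and d_small are illustrative) and keeps the intermediate Series so it can be inspected:

```python
import io

import pandas as pd

# First three rows of the question's data, inlined so no Example.csv is needed.
csv_text = """Forest,Bell Bay,Surrey Hills,Smithton,Hobart
Coupe 1,158,194,10,49
Coupe 2,156,169,71,84
Coupe 3,10,186,101,163
"""

df_small = pd.read_csv(io.StringIO(csv_text), header=0, index_col="Forest")

# stack() moves the column labels into a second index level, giving a Series
# indexed by (Forest, destination) pairs.
stacked = df_small.stack()

# to_dict() then uses each (Forest, destination) index tuple as a key.
d_small = stacked.to_dict()
```

Depending on the pandas version, stack may drop missing values, so arcs without a distance could be absent from the resulting dictionary.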

Your own approach can also be made to work by using a dictionary comprehension with DataFrame.loc:

I = df.index.tolist()
J = df.columns.values.tolist()

arcs = {(i, j):df.loc[i, j] for i in I for j in J}
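Calling df.loc once per cell can become slow on large frames. One possible alternative, sketched here on a small hypothetical frame (toy and arcs_fast are not from the question), pairs the keys from itertools.product with the flattened array of values. It assumes the whole frame holds a single numeric dtype:

```python
from itertools import product

import pandas as pd

# Small hypothetical frame standing in for the distance matrix.
toy = pd.DataFrame(
    {"Bell Bay": [158, 156], "Hobart": [49, 84]},
    index=["Coupe 1", "Coupe 2"],
)

# product(index, columns) yields (row, column) pairs in row-major order,
# which matches the order of the flattened values below.
keys = product(toy.index, toy.columns)

# ravel() flattens row by row; tolist() converts to plain Python ints.
values = toy.to_numpy().ravel().tolist()

arcs_fast = dict(zip(keys, values))
```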

Answer 2

My suggestion is to iterate over all the tuples in arcs:

arcs = [(i, j) for i in I for j in J]

and look up each value with the loc indexer of the pandas DataFrame:

dictionary = {}
for forest_tuple in arcs:
    # forest_tuple is a (forest, destination) pair; index with it, not with the arcs list
    dictionary[forest_tuple] = df.loc[forest_tuple[0], forest_tuple[1]]

This gives you the dictionary you want.

Answer 3

I'm not sure whether this approach will work for 10 million+ entries or be fast enough, but you could try the following:

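distances = {}  # avoid naming this "dict", which would shadow the built-in type
for combination in arcs:
    # combination is a (forest, destination) tuple
    distances[combination] = df.loc[combination[0], combination[1]]

print(distances)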
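To get a rough idea of the speed difference, one could time the loop above against stack().to_dict() on synthetic data. The sketch below uses a made-up 200 x 20 frame (the shape, labels, seed and function names are arbitrary assumptions) and checks that both approaches agree; actual timings will depend on the machine and the pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic distance matrix; the shape and labels are arbitrary.
rng = np.random.default_rng(0)
rows = [f"Coupe {n}" for n in range(1, 201)]
cols = [f"Dest {n}" for n in range(1, 21)]
big = pd.DataFrame(rng.integers(1, 200, size=(200, 20)), index=rows, columns=cols)


def with_loop():
    # One .loc lookup per (row, column) pair.
    return {(i, j): big.loc[i, j] for i in rows for j in cols}


def with_stack():
    # Single reshape of the whole frame followed by a dict conversion.
    return big.stack().to_dict()


loop_seconds = timeit.timeit(with_loop, number=1)
stack_seconds = timeit.timeit(with_stack, number=1)
same_result = with_loop() == with_stack()

print(f"loop:  {loop_seconds:.4f} s")
print(f"stack: {stack_seconds:.4f} s")
print("same result:", same_result)
```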

Create a dictionary from a matrix DataFrame
