Question

I have a sample dataset similar to this:

    Toyota Camry
    Toyota Avalon
    Honda Civic
    Honda Accord
    Volkswagen Passat
    Volkswagen Jetta

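In this dataset, I need to count how many unique values there are in the first column and in the second column. Then, for each unique value in the first column, I need to know which values in the second column it is associated with. For example, Toyota is connected to Camry and Avalon. I need to create an m * n adjacency matrix, where m = the number of unique values in the first column and n = the number of unique values in the second column. My final output should look like this: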

                  Camry   Avalon  Civic   Accord  Passat  Jetta
    Toyota        1       1       0       0       0       0
    Honda         0       0       1       1       0       0
    Volkswagen    0       0       0       0       1       1

I need some help with how to solve this in Python.

Answer 1

I wouldn't call what you want an adjacency matrix. However, the desired structure can be made quite easily (see comments in code):

import io
import pandas
dataset = '''Toyota Camry
Toyota Avalon
Honda Civic
Honda Accord
Volkswagen Passat
Volkswagen Jetta'''
# read the space-separated dataset into DataFrame d (Python 3: io.StringIO)
d = pandas.read_csv(io.StringIO(dataset), sep=' ', header=None, names=[0, 1])
# make output DataFrame x with rows from first and columns from second input column
x = pandas.DataFrame(0, index=d[0].unique(), columns=d[1].unique())
# set the existing combinations to 1
for make, model in d.itertuples(index=False, name=None):
    x.at[make, model] = 1
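
As a possible alternative to the explicit loop, the same table can probably be built with `pandas.crosstab`, which also makes the unique-value counts asked about in the question easy to read off. This is only a sketch: the column names `make` and `model` are my own labels, not part of the original data, and the `clip` and `reindex` steps are assumptions about what you want (0/1 entries even if a pair is repeated, and rows/columns in order of first appearance rather than sorted alphabetically).

```python
import io
import pandas as pd

dataset = '''Toyota Camry
Toyota Avalon
Honda Civic
Honda Accord
Volkswagen Passat
Volkswagen Jetta'''

# 'make' and 'model' are illustrative column names, not from the original data
d = pd.read_csv(io.StringIO(dataset), sep=' ', header=None, names=['make', 'model'])

# number of unique values in each column (m and n in the question)
n_makes = d['make'].nunique()
n_models = d['model'].nunique()

# crosstab counts how often each (make, model) pair occurs
x = pd.crosstab(d['make'], d['model'])
# cap the counts at 1 so repeated pairs still give a 0/1 matrix
x = x.clip(upper=1)
# crosstab sorts labels alphabetically; restore order of first appearance
x = x.reindex(index=d['make'].unique(), columns=d['model'].unique())

# number of distinct second-column values linked to each first-column value
models_per_make = x.sum(axis=1)

print(n_makes, n_models)
print(x)
print(models_per_make)
```

If a plain array is needed afterwards, `x.to_numpy()` should give the m * n matrix without the labels.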
