Question

I'm new to Python and want to run a pairwise similarity algorithm on a set of vectors (vectors.csv). Each vector represents a node. My vectors.csv file contains:

    1,2,3
    4,5,6
    7,8,9

I also have a list, y = [56,76,87], whose entries label the nodes.

I want to produce a .csv file containing:

    null,56,76,87
    56,1,2,3
    76,4,5,6
    87,7,8,9

What is the best way to do this in Python 3?

The matrix in the csv is a numpy array.

Any help would be appreciated.

Thanks!

Answer 1

pandas may help you here.

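import pandas as pd

y = [56, 76, 87]
# Read the headerless file and use the labels as column names
c = pd.read_csv("vectors.csv", names=y)
# Use the same labels as the row index
c.index = y

This gives you: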

    56 76 87
56  1  2  3
76  4  5  6
87  7  8  9

Finally, export the newly generated data:

c.to_csv('new_file.csv')
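
One detail worth checking: by default `to_csv` leaves the top-left header cell empty, whereas the requested output has `null` there. A minimal sketch of one way to get that cell, assuming the `index_label` parameter of `to_csv` is acceptable for this purpose. The in-memory `source` and `out` buffers are stand-ins for the real files so the example is self-contained:

```python
import io

import pandas as pd

y = [56, 76, 87]

# In-memory stand-in for vectors.csv
source = io.StringIO("1,2,3\n4,5,6\n7,8,9\n")
c = pd.read_csv(source, names=y)
c.index = y

# index_label names the index column, i.e. it fills the top-left header cell
out = io.StringIO()
c.to_csv(out, index_label="null")
csv_text = out.getvalue()
print(csv_text)
```

With a real file, the same call would be `c.to_csv('new_file.csv', index_label='null')`.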

Answer 2

Define the array and the list of labels:

In [66]: import numpy as np
In [67]: arr = np.arange(1,10).reshape(3,3)
In [68]: y = [56,76,87]

Prepend the label list to the array as its first column:

In [69]: arr1 = np.column_stack((y,arr))

Define the header row:

In [70]: header = 'null,' + ','.join([str(i) for i in y])
In [71]: header
Out[71]: 'null,56,76,87'

Write with savetxt. Note the use of the header, comments and fmt parameters. Play with these as needed:

In [72]: np.savetxt('test.txt', arr1,header=header, fmt='%d',delimiter=',',comments='')
In [73]: cat test.txt
null,56,76,87
56,1,2,3
76,4,5,6
87,7,8,9

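savetxt writes the header prefixed with the comment string (empty here, because comments=''). It then iterates over the rows of the array (its first dimension). For each row it performs a fmt % tuple(row) write, where fmt is derived from your fmt parameter. At its core, then, it is a standard Python file write of formatted lines.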
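
To make that description concrete, here is a rough sketch of the same write done by hand. It is not NumPy's actual implementation, only an illustration of the `fmt % tuple(row)` idea; the names `row_fmt`, `buf` and `manual_text` are made up for this example, and an in-memory buffer stands in for `test.txt`:

```python
import io

import numpy as np

arr = np.arange(1, 10).reshape(3, 3)
y = [56, 76, 87]
arr1 = np.column_stack((y, arr))

delimiter = ","
header = "null," + ",".join(str(i) for i in y)

# One '%d' per column, joined by the delimiter: '%d,%d,%d,%d'
row_fmt = delimiter.join(["%d"] * arr1.shape[1])

buf = io.StringIO()          # stand-in for open('test.txt', 'w')
buf.write(header + "\n")     # comments='' means the header gets no prefix
for row in arr1:             # iterate over the first dimension
    buf.write(row_fmt % tuple(row) + "\n")

manual_text = buf.getvalue()
print(manual_text)
```

The printed text should match the `cat test.txt` output shown above.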

Answer 3

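Let me take a crack at this.

"The matrix in the csv is a numpy array."

Not necessarily. If your file is a .csv file, you can use the csv package and import the data like so: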

import os
import csv

root = r'C:\path\to\my\csv\file'
input_file_name = r'input_data.csv'
output_file_name = r'new_data.csv'

input_path = os.path.join(root, input_file_name)
output_path = os.path.join(root, output_file_name)

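Import our data:

# The with block closes the file automatically, so no explicit close() is needed
with open(input_path, 'r', newline='') as f:
    csv_reader = csv.reader(f, delimiter=',')
    data = [i for i in csv_reader]

You then get a list of lists (like an array, but using Python's list data type):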

[[' 1', '2', '3'], [' 4', '5', '6'], [' 7', '8', '9']]

Here are our y values, which I assume are integers:

y = [56,76,87]

I borrowed a useful function from here: Converting elements of list of nested lists from string to integer in python

def int_conversion(my_list):
    return [int(x) if not isinstance(x, list) else int_conversion(x) for x in my_list]

Our function does some data type conversion along the way, but outputs integer values:

def process_data(my_data=data):
    # Copy each row so the raw data list is not modified in place
    new_data = [row[:] for row in my_data]

    # Convert our y values to strings for processing
    y_1 = [str(i) for i in y]

    # Insert each value of our y list at the first spot in each sublist
    for i in range(len(new_data)):
        new_data[i].insert(0, y_1[i])

    # Insert a '0' placeholder at the start of our y list
    y_1.insert(0, '0')

    # Insert the y list as a sublist in our main data list
    new_data.insert(0, y_1)

    # Convert the list values to integers
    new_data = int_conversion(new_data)

    # Replace the first value in the first sublist with a null (None) value
    new_data[0][0] = None

    # Return the results
    return new_data

Process the data, then write the output:

data = process_data()

with open(output_path, mode='w', newline='') as xyz:
    writer = csv.writer(xyz)
    writer.writerows(data)

Your file should then look like this:

,56,76,87
56,1,2,3
76,4,5,6
87,7,8,9
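
For comparison, the same result can be reached with less string/integer juggling. The following is only a sketch of an alternative, not the approach above: it converts values to integers while reading and builds the labelled rows with `zip`. The names `raw`, `rows`, `labelled` and `result` are made up for the example, and in-memory buffers stand in for the input and output files:

```python
import csv
import io

y = [56, 76, 87]

# In-memory stand-in for the input file
raw = io.StringIO("1,2,3\n4,5,6\n7,8,9\n")
rows = [[int(v) for v in row] for row in csv.reader(raw)]

# Header row: an empty corner cell (None) followed by the labels
labelled = [[None] + y]
# Each data row is prefixed with its own label
labelled += [[label] + row for label, row in zip(y, rows)]

# newline='' keeps the csv module's own line endings untouched
out = io.StringIO(newline="")
csv.writer(out).writerows(labelled)
result = out.getvalue()
print(result)
```

The csv writer emits `None` as an empty field, which is why the first line starts with a bare comma.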

Answer 4

Since the first row and first column conceptually represent labels, you may want to consider Pandas, which is built on NumPy array objects:

import pandas as pd
from io import StringIO

x = """1,2,3
4,5,6
7,8,9"""

# read data; replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), header=None)

# define column and index properties
idx = [56,76,87]
df.columns = idx
df.index = idx

# export to csv
df.to_csv('out.csv')
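
Since the question says the matrix is already a NumPy array, the read step can be skipped. A small sketch under that assumption: `arr` is a stand-in for the asker's array, and an in-memory buffer replaces `out.csv` so the round trip can be checked:

```python
import io

import numpy as np
import pandas as pd

arr = np.arange(1, 10).reshape(3, 3)   # stand-in for the existing matrix
idx = [56, 76, 87]

# Attach the labels to both axes in one step
df = pd.DataFrame(arr, index=idx, columns=idx)

# Label-based lookup: row 76, column 87
value = df.loc[76, 87]

# Write to a buffer, then read back using the first column as the index
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)
print(restored)
```

Note that after reading back, the column labels come in as strings while the index is parsed as integers, so convert the columns if integer labels are needed on both axes.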

Add rows and columns from a list to a csv in python