Question

I have an Excel document that I have exported to CSV. It looks like this:

"First Name","Last Name","First Name","Last Name","Address","City","State"
"Bob","Robertson","Roberta","Robertson","123 South Street","Salt Lake City","UT"
"Leo","Smart","Carter","Smart","827 Cherry Street","Macon","GA"
"Mats","Lindgren","Lucas","Lindgren","237 strawberry xing","houston","tx"

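I have a class called "Category" that has a name variable. My code creates a category for each string in the first row, but now I need to add each item to the column it is supposed to go into.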

import xlutils
from difflib import SequenceMatcher
from address import AddressParser, Address
from nameparser import HumanName
import xlrd
import csv

class Category:
    name = ""
    contents = []
    index = 0

columns = []
alltext = ""

with open('test.csv', 'rb') as csvfile:
    document = csv.reader(csvfile, delimiter=',', quotechar='\"')
    for row in document:
        alltext = alltext + ', '.join(row) + "\n"

    splitText = alltext.split('\n')


    categoryNames = splitText[0].split(', ')
    ixt = 0
    for name in categoryNames:
        thisCategory = Category()
        thisCategory.name = name
        thisCategory.index = ixt
        columns.append(thisCategory)
        ixt = ixt + 1


    for line in splitText:
        if(line != splitText[0] and len(line) != 0):
            individualItems = line.split(', ')
            for index, item in enumerate(individualItems):
                if(columns[index].index == index):
                    print(item + " (" + str(index) + ") is being sent to " + columns[index].name)
                    columns[index].contents.append(item)
    for col in columns:
        print("-----" + col.name + " (" + str(col.index) + ")-----")
        for stuff in col.contents:
            print(stuff)

When the code runs, it prints a line for each item:

Bob (0) is being sent to First Name
Robertson (1) is being sent to Last Name

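which is what it should do. Every item says it is being sent to the correct category. However, at the end, instead of each category holding only the items that were sent to it, every category contains every item. Instead of: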

-----First Name-----
Bob
Roberta
Leo
Carter
Mats
Lucas

and so on for each category, I get:

-----First Name-----
Bob
Robertson
Roberta
Robertson
123 South Street
Salt Lake City
UT
Leo
Smart
Carter
Smart
827 Cherry Street
Macon
GA
Mats
Lindgren
Lucas
Lindgren
237 strawberry xing
houston
tx

I have no idea what is going wrong. There is nothing between those two lines of code that could mess it up.

Answer 1

The problem is that you defined class-level variables for Category instead of instance variables.

This part is mostly harmless:

thisCategory.name = name
thisCategory.index = ixt

because assignment creates instance variables on each object that mask the class variables. But

columns[index].contents.append(item)

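is different. It fetches the single class-level contents list and appends the data to it, no matter which instance happens to be active at the time.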
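A minimal sketch of the difference, using two toy classes (the names `Shared` and `PerInstance` are made up for illustration; `Shared` mirrors the question's `Category`):

```python
class Shared:
    # Class attributes, as in the question's Category class.
    name = ""
    contents = []

class PerInstance:
    def __init__(self):
        # Instance attribute: each object gets its own new list.
        self.contents = []

a, b = Shared(), Shared()
a.contents.append("Bob")              # mutates the one list stored on the class
print(b.contents)                     # ['Bob']: b reads the same list
print(a.contents is Shared.contents)  # True

a.name = "First Name"                 # assignment creates an instance attribute on a only
print(repr(b.name))                   # '': b still sees the class value

c, d = PerInstance(), PerInstance()
c.contents.append("Bob")
print(d.contents)                     # []
```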

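The solution is to use instance variables created in `__init__`. Also, you are doing too much work reassembling everything into strings and then splitting it apart again. Just process the columns as you read the rows.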

#import xlutils
#from difflib import SequenceMatcher
#from address import AddressParser, Address
#from nameparser import HumanName
#import xlrd
import csv

class Category:

    def __init__(self, index, name):
        self.name = name
        self.index = index
        self.contents = []


with open('test.csv', 'r', newline='') as csvfile:
    document = csv.reader(csvfile, delimiter=',', quotechar='\"')
    # create categories from first row
    columns = [Category(index, name) 
        for index, name in enumerate(next(document))]
    # add columns for the rest of the file
    for row in document:
        if row:
            for index, cell in enumerate(row):
                columns[index].contents.append(cell)

for col in columns:
    print("-----" + col.name + " (" + str(col.index) + ")-----")
    for stuff in col.contents:
        print(stuff)
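The expected output in the question lists Bob, Roberta, Leo, Carter, Mats and Lucas together under a single First Name heading, so you may actually want columns that share a header to be merged. The code above keeps them as two separate categories because it works by column position. If merging is what you want, one possible sketch is to key a dict by header name instead of by index (`merge_columns_by_header` is a hypothetical helper, and the sample data is inlined with `io.StringIO` so the snippet runs without `test.csv`):

```python
import csv
import io

# The question's sample file, held in memory so the sketch is self-contained.
SAMPLE = '''"First Name","Last Name","First Name","Last Name","Address","City","State"
"Bob","Robertson","Roberta","Robertson","123 South Street","Salt Lake City","UT"
"Leo","Smart","Carter","Smart","827 Cherry Street","Macon","GA"
"Mats","Lindgren","Lucas","Lindgren","237 strawberry xing","houston","tx"
'''

def merge_columns_by_header(csvfile):
    """Map each header name to the values of every column carrying that name,
    in row order (so duplicate headers share one list)."""
    reader = csv.reader(csvfile)
    headers = next(reader)
    merged = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        for name, cell in zip(headers, row):
            merged.setdefault(name, []).append(cell)
    return merged

columns = merge_columns_by_header(io.StringIO(SAMPLE))
for name, values in columns.items():
    print("-----" + name + "-----")
    for value in values:
        print(value)
```

Plain dicts keep insertion order in Python 3.7+, so the categories print in the order their headers first appear.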

Answer 2

3 comments:

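You're not accounting for the first field: you start with an empty string, alltext = "", and the first thing you do is add a comma to it. This pushes everything over by one field. You need to test whether you are on the first line.
You're opening a csv... and then twisting it back into a text file. It looks like you do this because csv separates the values into fields and you want to do that manually later. If you open the file as a text file in the first place and read it with read, you don't need the first part of the code (unless you're doing something very strange with your csv; since we don't have a sample to check, I can't comment on that).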
```
with open('test.csv', 'r') as f:
    document = f.read()
```

will give you a properly formatted alltext string.

This is a good use case for csv.DictReader, which gives you the fields in a structured format. See this StackOverflow question for an example, and the documentation.
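One caveat for this particular file: the header row repeats "First Name" and "Last Name", and because DictReader turns each row into a dict, a repeated header collapses into one key and the later column's value wins. A minimal sketch of that behaviour, plus one workaround that passes DictReader's `fieldnames` parameter with unique labels (the labels "First Name 2" and "Last Name 2" are made up for illustration):

```python
import csv
import io

# The question's sample file, held in memory so the sketch is self-contained.
SAMPLE = '''"First Name","Last Name","First Name","Last Name","Address","City","State"
"Bob","Robertson","Roberta","Robertson","123 South Street","Salt Lake City","UT"
"Leo","Smart","Carter","Smart","827 Cherry Street","Macon","GA"
"Mats","Lindgren","Lucas","Lindgren","237 strawberry xing","houston","tx"
'''

# Using the file's own header: duplicate names map to a single key,
# so "Roberta" (column 2) overwrites "Bob" (column 0).
default_rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(default_rows[0]["First Name"])  # Roberta

# Supplying unique (hypothetical) field names keeps every column.
fieldnames = ["First Name", "Last Name", "First Name 2", "Last Name 2",
              "Address", "City", "State"]
reader = csv.DictReader(io.StringIO(SAMPLE), fieldnames=fieldnames)
next(reader)  # with explicit fieldnames the header line is returned as data; skip it
rows = list(reader)

# One list per column, keyed by the unique field name.
columns = {name: [row[name] for row in rows] for name in fieldnames}
print(columns["First Name 2"])  # ['Roberta', 'Carter', 'Lucas']
```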

Answer 3

Try reading the csv with the following statements.

import csv
data = []
with open("test.csv") as f :
    document = csv.reader(f)
    for line in document :
        data.append(line)

where data[0] will contain all the category names.
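From there, one way to get a sequence per column is to transpose the remaining rows with `zip(*...)`. A sketch, with the question's sample inlined via `io.StringIO` instead of reading `test.csv`:

```python
import csv
import io

# The question's sample file, held in memory so the sketch is self-contained.
SAMPLE = '''"First Name","Last Name","First Name","Last Name","Address","City","State"
"Bob","Robertson","Roberta","Robertson","123 South Street","Salt Lake City","UT"
"Leo","Smart","Carter","Smart","827 Cherry Street","Macon","GA"
"Mats","Lindgren","Lucas","Lindgren","237 strawberry xing","houston","tx"
'''

data = list(csv.reader(io.StringIO(SAMPLE)))
header = data[0]
records = [row for row in data[1:] if row]  # drop blank lines, which csv.reader returns as []

# zip(*records) regroups rows into columns (a transpose). It stops at the
# shortest row, so ragged rows would silently lose their trailing cells.
columns = list(zip(*records))

for name, values in zip(header, columns):
    print("-----" + name + "-----")
    for value in values:
        print(value)
```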

I'm trying to create an array for each column of a CSV file
