I'm trying to create an array for each column of a CSV file

Time: 2017-04-04 23:49:15

Tags: python excel csv

I have an Excel document that I exported as CSV. It looks like this:

"First Name","Last Name","First Name","Last Name","Address","City","State"
"Bob","Robertson","Roberta","Robertson","123 South Street","Salt Lake City","UT"
"Leo","Smart","Carter","Smart","827 Cherry Street","Macon","GA"
"Mats","Lindgren","Lucas","Lindgren","237 strawberry xing","houston","tx"

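I have a class called "Category" with a name variable. My code creates a Category for each string in the first row, but now I need to add each item to the column it belongs in.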

import xlutils
from difflib import SequenceMatcher
from address import AddressParser, Address
from nameparser import HumanName
import xlrd
import csv

class Category:
    name = ""
    contents = []
    index = 0

columns = []
alltext = ""

with open('test.csv', 'rb') as csvfile:
    document = csv.reader(csvfile, delimiter=',', quotechar='\"')
    for row in document:
        alltext = alltext + ', '.join(row) + "\n"

    splitText = alltext.split('\n')


    categoryNames = splitText[0].split(', ')
    ixt = 0
    for name in categoryNames:
        thisCategory = Category()
        thisCategory.name = name
        thisCategory.index = ixt
        columns.append(thisCategory)
        ixt = ixt + 1


    for line in splitText:
        if(line != splitText[0] and len(line) != 0):
            individualItems = line.split(', ')
            for index, item in enumerate(individualItems):
                if(columns[index].index == index):
                    print(item + " (" + str(index) + ") is being sent to " + columns[index].name)
                    columns[index].contents.append(item)
    for col in columns:
        print("-----" + col.name + " (" + str(col.index) + ")-----")
        for stuff in col.contents:
            print(stuff)

When the code runs, it prints a line like this for each item:

Bob (0) is being sent to First Name
Robertson (1) is being sent to Last Name

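which is what it should do. Every item reports that it is being sent to the correct category. In the end, however, each category does not hold only its own items; every category contains every item. Instead of: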

-----First Name-----
Bob
Roberta
Leo
Carter
Mats
Lucas

and so on for each category, I get:

-----First Name-----
Bob
Robertson
Roberta
Robertson
123 South Street
Salt Lake City
UT
Leo
Smart
Carter
Smart
827 Cherry Street
Macon
GA
Mats
Lindgren
Lucas
Lindgren
237 strawberry xing
houston
tx

I don't know what is going wrong. There is nothing between those two pieces of code that could mess it up.

3 Answers:

Answer 0: (score: 1)

The problem is that you defined class-level variables for Category, not instance variables.

That is mostly harmless for
thisCategory.name = name
thisCategory.index = ixt

because each assignment creates an instance variable on that object, which masks the class variable. But

columns[index].contents.append(item)

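is different. It looks up the single class-level contents list and appends the data to it, regardless of which instance is being used at the time.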
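To make the difference concrete, here is a minimal sketch. The class names SharedCategory and OwnCategory are hypothetical and used only for illustration; they are not from the question.

```python
class SharedCategory:
    # Class attribute: a single list object shared by every instance.
    contents = []


class OwnCategory:
    def __init__(self):
        # Instance attribute: a fresh list is created for each object.
        self.contents = []


a, b = SharedCategory(), SharedCategory()
a.contents.append("Bob")
print(b.contents)                # ['Bob'] (b sees a's item)
print(a.contents is b.contents)  # True (same list object)

c, d = OwnCategory(), OwnCategory()
c.contents.append("Bob")
print(d.contents)                # [] (d has its own list)
```

Assigning with thisCategory.name = name rebinds the attribute on the instance, whereas .append() mutates whatever list the lookup finds, which here is the shared one on the class.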

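The solution is to use instance variables created in __init__. Also, you are doing too much work by joining everything back into strings and then splitting it apart again. Just process the columns as you read each row.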

#import xlutils
#from difflib import SequenceMatcher
#from address import AddressParser, Address
#from nameparser import HumanName
#import xlrd
import csv

class Category:

    def __init__(self, index, name):
        self.name = name
        self.index = index
        self.contents = []

columns = []

with open('test.csv', 'r', newline='') as csvfile:
    document = csv.reader(csvfile, delimiter=',', quotechar='\"')
    # create categories from first row
    columns = [Category(index, name) 
        for index, name in enumerate(next(document))]
    # add columns for the rest of the file
    for row in document:
        if row:
            for index, cell in enumerate(row):
                columns[index].contents.append(cell)

for col in columns:
    print("-----" + col.name + " (" + str(col.index) + ")-----")
    for stuff in col.contents:
        print(stuff)

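Answer 1: (score: 0)

3 comments:

  1. You are not accounting for the first field - you start with an empty string alltext = "" and the first thing you do is add a comma. That shifts everything over by one field. You need to test whether you are on the first row.
  2. You are opening a csv ... and then turning it back into a text file. That seems to be because csv splits the fields into values, and you want to do that manually later. If you open the file as a text file in the first place and read it with read, you don't need the first part of your code (unless you are doing something very unusual with the csv; since we don't have a sample to inspect, I can't comment on that).

    with open('test.csv', 'r') as f:
        document = f.read()

     will give you a properly formatted alltext string.

  3. This is a good use case for csv.DictReader, which will give you the fields in a structured format. See this StackOverflow question for an example and the documentation.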
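A minimal sketch of the csv.DictReader approach follows. It assumes unique header names and uses a hypothetical in-memory sample instead of test.csv; the file in the question repeats "First Name" and "Last Name", and DictReader keeps only the last value for a repeated key, so those headers would need renaming first.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample with unique header names (an assumption; the
# question's file has duplicate headers that would collide as dict keys).
sample = io.StringIO(
    '"First Name","Last Name","City","State"\n'
    '"Bob","Robertson","Salt Lake City","UT"\n'
    '"Leo","Smart","Macon","GA"\n'
)

columns = defaultdict(list)
for row in csv.DictReader(sample):
    # Each row is a dict mapping header name -> cell value.
    for name, cell in row.items():
        columns[name].append(cell)

print(columns["First Name"])
print(columns["State"])
```

With a real file, the StringIO object would be replaced by a handle from open('test.csv', newline='').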

Answer 2: (score: 0)

Try reading the csv with the following statements.

import csv
data = []
with open("test.csv") as f :
    document = csv.reader(f)
    for line in document :
        data.append(line)

where data[0] will hold all the category names
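From there, one possible way to get a list per column is to transpose the remaining rows with zip. This is a sketch, not part of the original answer: it reads two rows of the question's sample from an in-memory string, and the names header, rows and column_values are introduced here for illustration. It also assumes every row has the same number of fields, since zip stops at the shortest row.

```python
import csv
import io

# First rows of the question's sample, read from memory instead of test.csv.
sample = io.StringIO(
    '"First Name","Last Name","First Name","Last Name","Address","City","State"\n'
    '"Bob","Robertson","Roberta","Robertson","123 South Street","Salt Lake City","UT"\n'
    '"Leo","Smart","Carter","Smart","827 Cherry Street","Macon","GA"\n'
)

data = list(csv.reader(sample))
header, rows = data[0], data[1:]

# zip(*rows) regroups the cells by position: one tuple per column.
column_values = list(zip(*rows))

for index, name in enumerate(header):
    print("-----" + name + " (" + str(index) + ")-----")
    for cell in column_values[index]:
        print(cell)
```

Because the columns are addressed by position rather than by name, the repeated "First Name" and "Last Name" headers do not collide.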