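How do I iterate over columns using a for loop and Python's csv library?

Time: 2015-03-27 23:12:09

Tags: python csv

I'm a very new Python user trying to sum columns of data in a .csv file. I found other answers that really helped me get started (for example here and here).

My problem, however, is that I want to loop over my file to get the sums of all columns.

My formatted data looks like this: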

    z   y   x   w   v   u
a   0   8   7   6   0   5
b   0   0   5   4   0   3
c   0   2   3   4   0   3
d   0   6   7   8   0   9

Or, in .csv format:

,z,y,x,w,v,u
a,0,8,7,6,0,5
b,0,0,5,4,0,3
c,0,2,3,4,0,3
d,0,6,7,8,0,9

For now, I'm just trying to get the iteration to work; I'll worry about the summing later. Here is my code:

import csv
data = file("test.csv", "r")
headerrow = data.next()
headerrow = headerrow.strip().split(",")
end = len(headerrow)
for i in range (1, end):
    for row in csv.reader(data):
        print row[i]

This is what I get:

>>> 
0
0
0
0
>>> 

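So it prints the value at index 1 of each row, but doesn't continue through the other indices.

What obvious thing am I missing here?

Update:

Following the very helpful suggestions and explanations, I now have this: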

import csv
with open("test.csv") as data:
    headerrow = next(data)
    delim = "," if "," == headerrow[0] else " "
    headerrow = filter(None, headerrow.rstrip().split(delim))
    reader = csv.reader(data, delimiter=delim, skipinitialspace=True)
    zipped = zip(*reader)
    print zipped
    strings = next(zipped)
    print ([sum(map(int,col)) for col in zipped])

This returns the error:

Traceback (most recent call last):
  File "C:\Users\the hexarch\Desktop\remove_total_absences_test.py", line 9, in <module>
    strings = next(zipped)
TypeError: list object is not an iterator

I don't understand this...? Sorry!

3 Answers:

Answer 0: (score: 5)

import csv
with open('in.csv') as f:
    head = next(f)
    # decide the delimiter from the first character of the header line
    delim = "," if "," == head[0] else " "
    # split the header and filter out empty strings
    head = filter(None, head.rstrip().split(delim))
    # skipinitialspace is needed because the space-delimited format
    # separates fields with runs of several spaces
    reader = csv.reader(f, delimiter=delim, skipinitialspace=True)
    # transpose rows into columns
    zipped = zip(*reader)
    # skip the first column (the row labels)
    strings = next(zipped)
    # sum each remaining column
    print([sum(map(int, col)) for col in zipped])

[0, 16, 22, 22, 0, 20]

To create a mapping from each header letter to its column sum, you can do this:

print(dict(zip(list(head), (sum(map(int,col)) for col in zipped))))

Output:

{'u': 20, 'w': 22, 'x': 22, 'z': 0, 'y': 16, 'v': 0}
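
As a self-contained sketch of the same idea in Python 3: the sample data from the question is inlined through io.StringIO (an assumption, so that the example runs without a file), and column_sums is a hypothetical helper name, not part of the csv library. Since zip returns a one-shot iterator in Python 3, the columns can feed either the list of sums or the dict, not both.

```python
import csv
import io

# Sample data from the question, inlined so the sketch runs without a file.
SAMPLE = """,z,y,x,w,v,u
a,0,8,7,6,0,5
b,0,0,5,4,0,3
c,0,2,3,4,0,3
d,0,6,7,8,0,9
"""

def column_sums(f):
    """Return {header: column sum} for a comma-delimited file object.

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(f)
    head = next(reader)[1:]      # drop the empty cell above the row labels
    columns = zip(*reader)       # transpose the remaining rows into columns
    next(columns)                # skip the row-label column (a, b, c, d)
    return {name: sum(map(int, col)) for name, col in zip(head, columns)}

totals = column_sums(io.StringIO(SAMPLE))
print(totals)
```

On Python 3.7 and later the dict keeps the header order, so the keys come out as z, y, x, w, v, u.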

I used Python 3 for all of the above. If you are using Python 2, replace:

zip -> itertools.izip
filter -> itertools.ifilter
map -> itertools.imap

Python 2 code:

import csv
from itertools import izip, imap, ifilter
with open('in.csv') as f:
    head = next(f)
    # decide the delimiter from the first character of the header line
    delim = "," if "," == head[0] else " "
    # split the header and filter out empty strings
    head = ifilter(None, head.rstrip().split(delim))
    # skipinitialspace is needed because the space-delimited format
    # separates fields with runs of several spaces
    reader = csv.reader(f, delimiter=delim, skipinitialspace=True)
    # transpose rows into columns
    zipped = izip(*reader)
    # skip the first column (the row labels)
    strings = next(zipped)
    # sum each remaining column
    print([sum(imap(int, col)) for col in zipped])

Output:

[0, 16, 22, 22, 0, 20]
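
The izip substitution matters because of how zip differs between versions: in Python 3, zip returns a lazy iterator, so next(zipped) works, whereas in Python 2 zip returns a list, and calling next() on a list raises "TypeError: list object is not an iterator". One version-independent option, sketched here with a small inlined data set and illustrative variable names, is to wrap the result in iter():

```python
# A few rows as csv.reader would yield them (inlined for illustration).
rows = [
    ["a", "0", "8", "7"],
    ["b", "0", "0", "5"],
]

# iter() returns an iterator unchanged (Python 3 zip) and turns a list
# (Python 2 zip) into an iterator, so next() is valid either way.
columns = iter(zip(*rows))
labels = next(columns)                           # the row-label column
sums = [sum(map(int, col)) for col in columns]   # one sum per remaining column
print(labels, sums)
```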

If you do a lot of this kind of work, pandas, and pandas.read_csv in particular, may be useful. Below is a very basic example; some pandas guru may want to add to it:

import pandas as pd

df = pd.read_csv("in.csv")
print(df.sum())

Output:

Unnamed: 0    abcd
z                0
y               16
x               22
w               22
v                0
u               20
dtype: object

Answer 1: (score: 3)

You can use numpy:

import csv
import numpy as np
with open("test.csv") as f:
    r = csv.reader(f, delimiter=",")
    # For space format: r = csv.reader(f, delimiter=" ", skipinitialspace=True)
    # Thanks to Padraic Cunningham ^^
    next(r) # Skip header row
    # list() is needed on Python 3, where map returns an iterator
    sums = sum(np.array(list(map(int, row[1:]))) for row in r)

Result:

>>> sums
array([ 0, 16, 22, 22,  0, 20])

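Answer 2: (score: 2)

This may clarify a bit of what is actually happening... it looks like you're overcomplicating things slightly. This is very simple Python and isn't intended as a direct or final solution to your problem, but rather to help you understand what is going on.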

import csv 

sumthree = 0

with open('test.csv', 'rb') as f:    # Open the file (in Python 2, use binary 'rb' mode for CSV files)
    header = next(f)        # Extract the header line from the file
    csvr = csv.reader(f)    # Create a CSV object with the rest of the file
    for row in csvr:
        print row           # Now loop over the file and print each row

        sumthree += int(row[2])

    print sumthree

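At this point, each row is printed as a list, e.g. ['a','0','8','7','6','0','5']

So on each iteration of that loop we move down one row. row[0] is the first column, row[1] is the second column, and so on. To sum the third column of the file, you can use sumthree += int(row[2]). At the end we print sumthree and see the sum of all the numbers in the third column.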
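
Extending the same loop to every column is a short step. The sketch below is written for Python 3 and inlines the sample data via io.StringIO (an assumption, so that it runs standalone); it keeps one running total per column and updates all of them for each row. A csv.reader can be consumed only once, which is why the column loop sits inside the row loop. With the loops the other way round, as in the question, the reader is exhausted after the first column and only index 1 is ever printed.

```python
import csv
import io

# Sample data from the question, inlined so the sketch runs without a file.
SAMPLE = """,z,y,x,w,v,u
a,0,8,7,6,0,5
b,0,0,5,4,0,3
c,0,2,3,4,0,3
d,0,6,7,8,0,9
"""

with io.StringIO(SAMPLE) as f:       # stands in for open('test.csv', newline='')
    reader = csv.reader(f)
    header = next(reader)[1:]        # column names, without the empty first cell
    totals = [0] * len(header)       # one running total per data column
    for row in reader:               # single pass over the rows
        for i, value in enumerate(row[1:]):   # every data column of this row
            totals[i] += int(value)

print(totals)
```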