Reformatting a CSV with Python and grouping the data into buckets

Time: 2012-09-19 06:16:27

Tags: python csv

I know there must be dozens of "right" ways to do this, but I'd like to know what you consider the best or smartest approach.

My CSV looks like this:

Date,Num,name,Aging,Open Balance
07/16/2012,12-001270,8,1,"-8,934.75"
07/18/2012,12-2429,24,34,2.00
07/18/2012,12-2428,24,58,85.00
07/18/2012,12-2420,8,58,"4,381.90"

I need it to look more like this format:

name,num,date,0-30,31-60,61-90,91+,total
8,12-001270,7/16/2012,"-8,934.75",0,0,0,"-8,934.75"
8,12-2420,07/18/2012,0,"4,381.90",0,0,"4,381.90"
24,12-2428,07/18/2012,0,2,85,87

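The question is: is there a plug-and-play solution in Python that can bucket data this way?

I would take the Aging column, split it into the different ranges, and reformat the data as shown.

What is the most efficient way to do this?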

1 answer:

Answer 0: (score: 3)

Something to get you started...

Reading the file and accessing the values

import csv

with open('somefile.csv', newline='') as fin:
    csvin = csv.DictReader(fin)
    for row in csvin:
        # Each row is a dict keyed by the column names from the header line.
        print('Person {name} had a balance of {Open Balance}'.format(**row))
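
To see what `DictReader` hands back without needing a file on disk, here is a small sketch that feeds it two of the rows from the question through `io.StringIO` (the in-memory sample is my own setup, not part of the original data flow). Note that every value arrives as a string, and that the quoted `"-8,934.75"` stays in one field despite its comma.

```python
import csv
import io

# Two rows copied from the question, held in memory instead of a file.
sample = io.StringIO(
    'Date,Num,name,Aging,Open Balance\n'
    '07/16/2012,12-001270,8,1,"-8,934.75"\n'
    '07/18/2012,12-2429,24,34,2.00\n'
)

rows = list(csv.DictReader(sample))

for row in rows:
    # Values are strings; Aging and Open Balance still need converting.
    print(row['name'], row['Aging'], row['Open Balance'])
```

So `Aging` has to go through `int()` and `Open Balance` needs its thousands separator removed before any arithmetic.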

Converting your amounts into something usable

import re

s = "-8,934.75"
try:
    # Drop everything except digits, the minus sign, and the decimal point.
    amount = float(re.sub(r'[^-.0-9]', '', s))
except ValueError:
    pass # wasn't valid for some reason? do something sensible
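
If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: `parse_amount` is a name I made up, and returning `None` for unparseable input is one possible choice of "something sensible", not the only one.

```python
import re


def parse_amount(text):
    """Return the amount as a float, or None if it cannot be parsed.

    Hypothetical helper: strips everything except digits, '-' and '.'.
    """
    cleaned = re.sub(r'[^-.0-9]', '', text)
    try:
        return float(cleaned)
    except ValueError:
        # Covers empty strings and leftovers such as '-' or '1.2.3'.
        return None


for raw in ['-8,934.75', '2.00', '4,381.90', 'n/a']:
    print(repr(raw), '->', parse_amount(raw))
```

For money you may prefer `decimal.Decimal` over `float` to avoid rounding surprises; the cleaning step would stay the same.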

'Banding' the data

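from bisect import bisect_left

def age_band(age, upto=(30, 60, 90), desc=('0-30', '31-60', '61-90', '91+')):
    # Reject negative ages (and NaN, which fails every comparison).
    if not age >= 0:
        return '*invalid*'
    # bisect_left keeps the boundary ages 30, 60 and 90 in the lower band.
    return desc[bisect_left(upto, age)]

for age in [31, 99, 65, 12, -1]:
    print(age, age_band(age))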
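
Putting the three pieces together, a rough end-to-end sketch might look like the following. Several things here are my assumptions rather than anything from the question: the names `to_amount`, `reshape` and `convert`, the use of `io.StringIO` in place of real files, writing one output row per input row in the original order (no per-name summing), and `bisect_left` so that an age of exactly 30 lands in the 0-30 band.

```python
import csv
import io
import re
from bisect import bisect_left

# Sample input copied from the question; a real script would open a file.
SAMPLE = '''Date,Num,name,Aging,Open Balance
07/16/2012,12-001270,8,1,"-8,934.75"
07/18/2012,12-2429,24,34,2.00
07/18/2012,12-2428,24,58,85.00
07/18/2012,12-2420,8,58,"4,381.90"
'''

UPPER_BOUNDS = [30, 60, 90]
BANDS = ['0-30', '31-60', '61-90', '91+']
OUT_FIELDS = ['name', 'num', 'date'] + BANDS + ['total']


def to_amount(text):
    """Strip thousands separators and convert to float (hypothetical helper)."""
    return float(re.sub(r'[^-.0-9]', '', text))


def reshape(rows):
    """Yield one output dict per input row, balance placed in its aging band."""
    for row in rows:
        amount = format(to_amount(row['Open Balance']), ',.2f')
        band = BANDS[bisect_left(UPPER_BOUNDS, int(row['Aging']))]
        out = {'name': row['name'], 'num': row['Num'], 'date': row['Date']}
        out.update({b: 0 for b in BANDS})  # every band starts at zero
        out[band] = amount
        out['total'] = amount
        yield out


def convert(src, dst):
    """Read the input CSV from src and write the reshaped CSV to dst."""
    writer = csv.DictWriter(dst, fieldnames=OUT_FIELDS)
    writer.writeheader()
    writer.writerows(reshape(csv.DictReader(src)))


out = io.StringIO()
convert(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

The `csv` writer quotes the amounts that contain a comma on its own, so values such as `"-8,934.75"` come out in the same style as the input. If you want one row per name with the bands summed, you would accumulate the floats in a dict keyed by name before formatting them.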