Read a CSV file and apply a formula to certain rows

Time: 2018-03-06 01:27:16

Tags: python csv

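I asked a similar question last night, but my professor has since clarified how she wants the problem answered, and it threw me for a loop.

I have a CSV file with 3 columns. I save the rows as dictionaries, but I'm trying to find a way to read the year and title_field columns, find a specific title_field (Occupied Housing Units), match it with the earliest year (2008), and take the number next to it in the value column. Then I want to match the next year (2009) with the same title_field (Occupied Housing Units), find the difference between those two values, print the result, and do the same for 2009 and 2010, and so on: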

2008-2009  795
2009-2010  5091
etc.

The CSV looks like this:

year,title_field,value
2014,Total Housing Units,49109
2014,Vacant Housing Units,2814
2014,Occupied Housing Units,46295
2013,Total Housing Units,47888
2013,Vacant Housing Units,4215
2013,Occupied Housing Units,43673
2012,Total Housing Units,45121
2012,Vacant Housing Units,3013
2012,Occupied Housing Units,42108
2011,Total Housing Units,44917
2011,Vacant Housing Units,4213
2011,Occupied Housing Units,40704
2010,Total Housing Units,44642
2010,Vacant Housing Units,3635
2010,Occupied Housing Units,41007
2009,Total Housing Units,39499
2009,Vacant Housing Units,3583
2009,Occupied Housing Units,35916
2008,Total Housing Units,41194
2008,Vacant Housing Units,4483
2008,Occupied Housing Units,36711

My code so far is:

import csv
def process(year, field_name, value):
    print(year, field_name, value)

with open('denton_housing.csv', 'r', encoding='utf8',newline='') as f:
    reader = csv.DictReader(f, delimiter=',')
    housing_stats = []
    for row in reader:
        year = row["year"]
        field_name = row["title_field"]
        value = int(row["value"])
        denton_dict = {"year": year, "field_name": field_name, "value": value}
        housing_stats.append(denton_dict)
        process(year, field_name, value)

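Thanks! I'm new to programming, and I'm an older dude. I love how helpful the programming community is; it's like you all welcome everyone into the cult (in a good way?).

3 Answers:

Answer 0: (score: 1)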

You could do it like this:

  1. Create a list of dicts for the rows that contain the target title_field value.
  2. Sort it by the year value of each row.
  3. Process each pair of rows/years in the sorted list using the pairwise generator recipe from itertools.

Code implementing the above:

    import csv
    from itertools import tee

    # From https://docs.python.org/3/library/itertools.html#recipes
    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

    target_title_field = 'Occupied Housing Units'
    csv_filename = 'denton_housing.csv'

    with open(csv_filename, 'r', encoding='utf8', newline='') as f:
        housing_stats = []
        for row in csv.DictReader(f, delimiter=','):
            if row['title_field'] == target_title_field:
                year = int(row["year"])
                field_name = row["title_field"]
                value = int(row["value"])
                denton_dict = {"year": year, "field_name": field_name, "value": value}
                housing_stats.append(denton_dict)

    housing_stats.sort(key=lambda row: row['year'])

    for r1, r2 in pairwise(housing_stats):
        print('{}-{} {:5}'.format(r1['year'], r2['year'], abs(r2['value'] - r1['value'])))

Output:

    2008-2009   795
    2009-2010  5091
    2010-2011   303
    2011-2012  1404
    2012-2013  1565
    2013-2014  2622
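
On Python 3.10 and later, itertools.pairwise ships with the standard library, so the recipe does not have to be copied by hand. The following is a minimal sketch of the same filter, sort, and pair approach. It assumes a small in-memory sample (SAMPLE) in place of denton_housing.csv, and the helper name yearly_changes is made up for illustration:

```python
import csv
import io

try:
    from itertools import pairwise  # standard library from Python 3.10
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        # Fallback: the recipe from the itertools documentation
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

# Assumption: a few rows taken from the question's CSV, held in memory
SAMPLE = """year,title_field,value
2010,Occupied Housing Units,41007
2009,Occupied Housing Units,35916
2008,Occupied Housing Units,36711
2008,Vacant Housing Units,4483
"""

def yearly_changes(csv_file, title):
    """Return (year1, year2, abs_difference) for consecutive rows of one title_field."""
    rows = [(int(r["year"]), int(r["value"]))
            for r in csv.DictReader(csv_file)
            if r["title_field"] == title]
    rows.sort()  # tuples sort by year first
    return [(y1, y2, abs(v2 - v1)) for (y1, v1), (y2, v2) in pairwise(rows)]

changes = yearly_changes(io.StringIO(SAMPLE), "Occupied Housing Units")
for y1, y2, diff in changes:
    print(f"{y1}-{y2} {diff:5}")
```

To run it against the real file, pass an open file object (opened with newline='') instead of the io.StringIO wrapper.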
    

Answer 1: (score: 0)

I'd suggest using pandas for this. Then you can easily use groupby and aggregation.

Something like this:

df.groupby(df['year'].dt.year)['a'].agg(['value'])

Result:

2012   14   
2015    6
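
The snippet above appears to use column names that do not match the CSV in the question (there is no column 'a', and year is a plain integer rather than a datetime). The group-then-difference idea can also be sketched with the standard library alone. This is only an illustration: SAMPLE is a hand-picked subset of the question's rows, and changes_by_title is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Assumption: a subset of the question's CSV, held in memory for the example
SAMPLE = """year,title_field,value
2010,Total Housing Units,44642
2010,Occupied Housing Units,41007
2009,Total Housing Units,39499
2009,Occupied Housing Units,35916
2008,Total Housing Units,41194
2008,Occupied Housing Units,36711
"""

def changes_by_title(csv_file):
    """Group rows by title_field, then diff consecutive years within each group."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["title_field"]].append((int(row["year"]), int(row["value"])))

    result = {}
    for title, rows in groups.items():
        rows.sort()  # ascending by year
        result[title] = [(y1, y2, abs(v2 - v1))
                         for (y1, v1), (y2, v2) in zip(rows, rows[1:])]
    return result

by_title = changes_by_title(io.StringIO(SAMPLE))
for title, changes in by_title.items():
    for y1, y2, diff in changes:
        print(f"{title}: {y1}-{y2} {diff}")
```

Grouping first means every title_field is handled in one pass, rather than filtering the file once per category.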

Answer 2: (score: 0)

A simple approach is to use 3 lists (one per title_field) to hold the year and value fields; then you can process each list.

import csv

total = []
vacant = []
occupied = []

with open('denton_housing.csv', 'r', encoding='utf8',newline='') as f:
    spamreader = csv.reader(f, delimiter=',')
    for row in spamreader:
        if row[1] == 'Occupied Housing Units':
            # use whichever data structure you prefer; this example uses a tuple
            # convert to int so the years and values can be subtracted later
            mytuple = (int(row[0]), int(row[2]))
            occupied.append(mytuple)
        # do the same for total and vacant list, ignore if you don't need
        ...

# then you can process the list, for example, occupied
# I assume your csv file is sorted by year, so you may safely assume that each 
# year field of the data entry in the occupied list is sorted as well
for i in range(len(occupied)-1):
    # if your data contains every year (i.e. 2008-2014 with none missing),
    # the year field is not needed here, so you can just compute
    value_diff = abs(occupied[i][1] - occupied[i+1][1])

# if the entries are not sorted by year, or some years may be missing
occupied.sort(key=lambda x: x[0])    # sorts in ascending order of year
for i in range(len(occupied)-1):
    this_year = occupied[i][0]
    next_year = occupied[i+1][0]
    if next_year - this_year == 1:
        value_diff = abs(occupied[i][1] - occupied[i+1][1])
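
The missing-year check can also be written as a dictionary lookup, which avoids sorting and index arithmetic. This is a minimal sketch under assumptions: the values come from the question's Occupied Housing Units rows with 2010 deliberately left out to show the gap handling, and consecutive_year_changes is a hypothetical helper name.

```python
def consecutive_year_changes(values_by_year):
    """Map (year, year + 1) to the absolute difference, skipping gaps."""
    changes = {}
    for year in sorted(values_by_year):
        # Only compare when the following calendar year is actually present
        if year + 1 in values_by_year:
            changes[(year, year + 1)] = abs(values_by_year[year + 1] - values_by_year[year])
    return changes

# Assumption: 2010 is removed on purpose to illustrate a missing year
occupied_by_year = {2008: 36711, 2009: 35916, 2011: 40704, 2012: 42108}

gap_changes = consecutive_year_changes(occupied_by_year)
for (y1, y2), diff in gap_changes.items():
    print(f"{y1}-{y2} {diff}")
```

With this data no 2009-2010 or 2010-2011 pair is produced, because neither neighbour of the missing year has a partner to compare against.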