Using Python to add CSV file values from one column, but only from certain rows

Time: 2018-03-05 03:06:43

Tags: python csv

I have a housing CSV that looks like this:

year,title_field,value
2014,Total Housing Units,49109
2014,Vacant Housing Units,2814
2014,Occupied Housing Units,46295
2013,Total Housing Units,47888
2013,Vacant Housing Units,4215
2013,Occupied Housing Units,43673
2012,Total Housing Units,45121
2012,Vacant Housing Units,3013
2012,Occupied Housing Units,42108
2011,Total Housing Units,44917
2011,Vacant Housing Units,4213
2011,Occupied Housing Units,40704
2010,Total Housing Units,44642
2010,Vacant Housing Units,3635
2010,Occupied Housing Units,41007
2009,Total Housing Units,39499
2009,Vacant Housing Units,3583
2009,Occupied Housing Units,35916
2008,Total Housing Units,41194
2008,Vacant Housing Units,4483
2008,Occupied Housing Units,36711

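I need to use Python to take the number of occupied housing units in 2009 and subtract it from the number of occupied housing units in 2008 (and so on, up to 2014), then return the values in ascending order.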

I'm in a class where this hasn't been taught, yet it's expected of us, and I'm having trouble figuring out how to home in on specific "cells" with it.

This is all I have. It returns each row as a list, which is nice, but I'm lost from there.

import csv

with open('housing.csv', newline='') as csv_file:
    reader = csv.reader(csv_file)
    # each row comes back as a list: [year, title_field, value]
    for row in reader:
        print(row)

2 Answers:

Answer 0: (score: 0)

You can use pandas:

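import pandas as pd

df = pd.read_csv("housing.csv")
# keep only the occupied housing rows
df = df[df["title_field"] == "Occupied Housing Units"]
# diff() subtracts the previous row; sorting by year first makes each
# result "this year minus the year before" (column names match the CSV header)
df["Diff"] = df.sort_values("year")["value"].diff()
# sort by the difference (descending here; pass ascending=True for ascending order)
df = df.sort_values("Diff", ascending=False)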
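
A self-contained sketch of the same approach may help when trying it out. It assumes a trimmed three-year inline sample in place of housing.csv (the `SAMPLE` string and the `occupied` and `result` names are illustrative, not part of the original answer) and sorts ascending, as the question asks.

```python
import io

import pandas as pd

# Inline sample standing in for housing.csv (assumption: three years only).
SAMPLE = """year,title_field,value
2010,Total Housing Units,44642
2010,Occupied Housing Units,41007
2009,Total Housing Units,39499
2009,Occupied Housing Units,35916
2008,Total Housing Units,41194
2008,Occupied Housing Units,36711
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# Keep only the occupied rows and order them chronologically.
occupied = df[df["title_field"] == "Occupied Housing Units"].sort_values("year")

# diff() subtracts the previous row, so each value is "this year minus last year".
occupied = occupied.assign(Diff=occupied["value"].diff())

# Drop the first year (it has no earlier year to compare with) and sort ascending.
result = occupied.dropna(subset=["Diff"]).sort_values("Diff")
print(result[["year", "Diff"]].to_string(index=False))
```

With this sample, 2009 minus 2008 gives -795 and 2010 minus 2009 gives 5091, so 2009 is listed first. If the opposite sign is wanted (earlier year minus later year), negating the `Diff` column is one option.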

Answer 1: (score: 0)

You can use pandas to do this task:

df[df['title_field'] == 'Occupied Housing Units'].groupby(by= \
['year','title_field']).sum().diff(-1).sort_values('value', ascending=False)

                              value
year title_field
2008 Occupied Housing Units   795.0
2010 Occupied Housing Units   303.0
2011 Occupied Housing Units -1404.0
2012 Occupied Housing Units -1565.0
2013 Occupied Housing Units -2622.0
2009 Occupied Housing Units -5091.0
2014 Occupied Housing Units     NaN

If you want the values to be absolute:

df[df['title_field'] == 'Occupied Housing Units'].groupby(by= \
['year','title_field']).sum().diff(-1).abs().sort_values('value', ascending=False)

Changing the name of the first column is not that straightforward and, honestly, is another question in itself. As for how to tell it what to subtract, that is what diff(-1) does.
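
To see which row gets subtracted from which, here is a small sketch comparing `diff()` with `diff(-1)`. It uses three of the occupied values from the CSV in the question, placed in a Series indexed by year; the variable names are illustrative only.

```python
import pandas as pd

# Occupied housing units for three years, indexed by year
# (values copied from the CSV in the question).
occupied = pd.Series([36711, 35916, 41007], index=[2008, 2009, 2010], name="value")

# diff() uses periods=1: each row minus the row before it.
forward = occupied.diff()

# diff(-1): each row minus the row after it.
backward = occupied.diff(-1)

print(pd.DataFrame({"value": occupied, "diff()": forward, "diff(-1)": backward}))
```

With the years in ascending order, `diff(-1)` for 2008 is the 2008 value minus the 2009 value (795), which is the subtraction the question describes, and the last year has no following row, so it comes out as NaN. Plain `diff()` yields the same magnitudes with the opposite sign, shifted down one row.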