Python - 如何使用CSV文件中的循环找到同一列的两行之间的差异?

时间:2017-10-26 23:13:53

标签: python python-3.x csv

 

 Date   Revenue
9-Jan   $943,690.00 
9-Feb   $1,062,565.00 
9-Mar   $210,079.00 
9-Apr   -$735,286.00
9-May   $842,933.00 
9-Jun   $358,691.00 
9-Jul   $914,953.00 
9-Aug   $723,427.00 
9-Sep   -$837,468.00
9-Oct   -$146,929.00
9-Nov   $831,730.00 
9-Dec   $917,752.00 
10-Jan  $800,038.00 
10-Feb  $1,117,103.00 
10-Mar  $181,220.00 
10-Apr  $120,968.00 
10-May  $844,012.00 
10-Jun  $307,468.00 
10-Jul  $502,341.00 

# This is what I did so far...

# Dependencies
import csv

# Files to load (Remember to change these)
file_to_load = "raw_data/budget_data_2.csv"


totalrev = 0
count = 0

# Read the csv and convert it into a list of dictionaries
with open(file_to_load) as revenue_data:
    reader = csv.reader(revenue_data)

    next(reader)  



    for row in reader:

        count += 1
        revenue = float(row[1])     
        totalrev += revenue

    for i in range(1,revenue):
         revenue_change = (revenue[i+1] - revenue[i])

avg_rev_change = sum(revenue_change)/count

print("avg rev change: ", avg_rev_change)         

print ("budget_data_1.csv")
print ("---------------------------------")
print ("Total Months: ", count)
print ("Total Revenue:", totalrev)





我在CSV文件中有以上数据。我在找到收入变化方面遇到了问题,即第1行的收入 - 第0行,第2行 - 第1行等等......最后,我想要总收入变化的总和。我试过循环,但我猜有一些愚蠢的错误。请建议我代码,以便我可以比较我的错误。我是python和编码的新手。

2 个答案:

答案 0 :(得分:2)

目前还不清楚您是否可以使用第三方软件包,例如大熊猫,但大熊猫在这些类型的操作中表现很好。我建议你使用它的功能,而不是逐行迭代。

dfpandas.DataFrame个对象。使用pandas.read_csv将数据加载到DataFrame中。

>>> df
      Date        Revenue
0    9-Jan    $943,690.00
1    9-Feb  $1,062,565.00
2    9-Mar    $210,079.00
3    9-Apr   -$735,286.00
4    9-May    $842,933.00
5    9-Jun    $358,691.00
6    9-Jul    $914,953.00
7    9-Aug    $723,427.00
8    9-Sep   -$837,468.00
9    9-Oct   -$146,929.00
10   9-Nov    $831,730.00
11   9-Dec    $917,752.00
12  10-Jan    $800,038.00
13  10-Feb  $1,117,103.00
14  10-Mar    $181,220.00
15  10-Apr    $120,968.00
16  10-May    $844,012.00
17  10-Jun    $307,468.00
18  10-Jul    $502,341.00

# Remove the dollar sign and any other weird chars
>>> df['Revenue'] = [float(''.join(c for c in row if c in '.1234567890')) for row in df['Revenue']]

使用pandas.Series.shift将上个月的值与当前月份的值对齐,然后减去这两个值:

>>> df['Diff'] = df['Revenue'] - df['Revenue'].shift(1)
>>> df
      Date    Revenue      Diff
0    9-Jan   943690.0       NaN
1    9-Feb  1062565.0  118875.0
2    9-Mar   210079.0 -852486.0
3    9-Apr   735286.0  525207.0
4    9-May   842933.0  107647.0
5    9-Jun   358691.0 -484242.0
6    9-Jul   914953.0  556262.0
7    9-Aug   723427.0 -191526.0
8    9-Sep   837468.0  114041.0
9    9-Oct   146929.0 -690539.0
10   9-Nov   831730.0  684801.0
11   9-Dec   917752.0   86022.0
12  10-Jan   800038.0 -117714.0
13  10-Feb  1117103.0  317065.0
14  10-Mar   181220.0 -935883.0
15  10-Apr   120968.0  -60252.0
16  10-May   844012.0  723044.0
17  10-Jun   307468.0 -536544.0
18  10-Jul   502341.0  194873.0

答案 1 :(得分:0)

import csv

# Files to load (Remember to change these)
file_to_load = "raw_data/budget_data_2.csv"


# Read the csv and convert it into a list of dictionaries
with open(file_to_load) as revenue_data:
    reader = csv.reader(revenue_data)

    # use of next to skip first title row in csv file
    next(reader) 
    revenue = []
    date = []
    rev_change = []

    # in this loop I did sum of column 1 which is revenue in csv file and counted total months which is column 0 
    for row in reader:

        revenue.append(float(row[1]))
        date.append(row[0])

    print("Financial Analysis")
    print("-----------------------------------")
    print("Total Months:", len(date))
    print("Total Revenue: $", sum(revenue))


    #in this loop I did total of difference between all row of column "Revenue" and found total revnue change. Also found out max revenue change and min revenue change. 
    for i in range(1,len(revenue)):
        rev_change.append(revenue[i] - revenue[i-1])   
        avg_rev_change = sum(rev_change)/len(rev_change)

        max_rev_change = max(rev_change)

        min_rev_change = min(rev_change)

        max_rev_change_date = str(date[rev_change.index(max(rev_change))])
        min_rev_change_date = str(date[rev_change.index(min(rev_change))])


    print("Avereage Revenue Change: $", round(avg_rev_change))
    print("Greatest Increase in Revenue:", max_rev_change_date,"($", max_rev_change,")")
    print("Greatest Decrease in Revenue:", min_rev_change_date,"($", min_rev_change,")")


我得到的输出


Financial Analysis
-----------------------------------
Total Months: 86
Total Revenue: $ 36973911.0
Avereage Revenue Change: $ -5955
Greatest Increase in Revenue: Jun-2014 ($ 1645140.0 )
Greatest Decrease in Revenue: May-2014 ($ -1947745.0 )