Finding the min and max of five different categories in a CSV file after appending cells

Date: 2013-03-11 04:57:26

Tags: python csv max min

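Hi everyone, I have a question. I just learned about min and max.

I'm having a hard time finding the min and max of each of the five category columns.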

Here's what I have:

I moved 5 of the 26 columns of data from a csv file into a txt file.

For example, the appended cells from the .csv look like this:

state          car     motorcycle   van     airplane   bike
Maine          35.5    8.1          5.7     21.0%      33.2%
Michigan       47.9    9.1          5.5     20.40%     25.2%
Washington     52.5    1.2          4.6     3.50%      24.7%
Denver         21.8    20.5         5.3     2.90%      30.9%
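As an aside on the extraction step, here is a minimal sketch of pulling those five columns out of each row with the csv module instead of line.split(","). It assumes the column positions used in the question's code (0 for state, then 1, 5, 7, 11 and 13). csv.reader handles quoting and strips the line ending, whereas a plain split leaves the newline attached to the last cell. The 14-column sample line is made up for illustration and stands in for the real 26-column file.

```python
import csv
import io
from operator import itemgetter

# Column positions taken from the question's code: state, car, motorcycle, van, airplane, bike
pick_columns = itemgetter(0, 1, 5, 7, 11, 13)

# Hypothetical 14-column line standing in for a row of the real 26-column file;
# 'x' marks columns we do not care about
sample = io.StringIO("Maine,35.5,x,x,x,8.1,x,5.7,x,x,x,21.0%,x,33.2%\r\n")

# itemgetter with several indices returns a tuple of just those cells
rows = [pick_columns(row) for row in csv.reader(sample)]
print(rows)
```

With a real file, open it with newline="" and pass the file object to csv.reader in place of the StringIO.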

How can I get the min and max to look like this?

                 min                  max
car              Denver:      21.8    Washington:  52.5
motor            Washington:   1.2    Denver:      20.5
van              Washington:   4.6    Maine:        5.7
airplane         Denver:       2.90%  Maine:       21.0%
bike             Washington:  24.7%   Maine:       33.2%

This is what I have:

import csv
import string, re
import operator

output = []
data = []
csv_string = []
data_file = []

try:
    with open('data.csv', 'r') as csv_string:
         for line in csv_string:
             cells = line.split(",")
             output.append((cells[0], cells[1], cells[5], cells[7], cells[11], cells[13]))
                         for lines in output:


            #state = cells[0]

            zmin = cells[1]   #car = cells[1]
            ymin = cells[1]
            xmin = cells[5]   #motor = cells[5] 
            wmin = cells[5]
            vmin = cells[7]   #van = cells[7]
            zmax = cells[7]
            ymax = cells[11]  #airplane = cells[11]
            xmax = cells[11]
            wmax = cells[13]  #bike = cells[13] 
            vmax = cells[13]

        if cells[1] < xmin: 
            zmin = cells[1] 
        if cells[1] > xmax: 
            zmax = cells[1]


        if cells[5] < ymin: 
            ymin = cells[5]
        if cells[5] > ymax: 
            ymax = cells[5]

        if cells[7] < zmin:
            xmin = cells[7]
        if cells[7] > zmax: 
            xmax = cells[7] 

        if cells[11] < zmin:
            wmin = cells[11]
        if cells[11] > zmax: 
            wmax = cells[11]

        if cells[13] < zmin:
            vmin = cells[13]
        if cells[13] > zmax: 
            vmax = cells[13]

        outstring = ' '
    for item in output:
            for cell in item:
                outstring += "{0:<35}".format(cell) #Width/Distance of each row
            outstring += "\n"

    print(outstring)

    print('Min: ',zmin,ymin,xmin,wmin,vmin)

    print('Max: ',state,zmax,ymax,xmax,wmax,vmax) 


  finally:
          f.close()

try:    
    f_write = open('output.txt', 'w') #creates the file
    try:
        f_write.writelines(outstring)

  finally:
          f.close()

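I'm not sure what I'm doing wrong. I've been reading about min and max, but I don't understand how to apply them to a .csv file with 5 appended columns.

If anyone can offer some guidance, I'd appreciate your input.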

The program prints the wrong numbers:

 print('Min: ',zmin,ymin,xmin,wmin,vmin)
      47.9,  8.1, 5.5, 20.40%, 25.2% 
 print('Max: ',state,zmax,ymax,xmax,wmax,vmax) 
      21.8, 9.1, 4.6, 20.40%, 30.9% 
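Two things in the code above account for most of the wrong numbers. Every cell is still a string, so < and > compare character by character ('9.1' sorts after '20.5' because '9' comes after '2'). Several of the if statements also compare against one variable but assign another (for example, if cells[1] < xmin: followed by zmin = cells[1]). Below is a minimal sketch, not the asker's code, that converts each cell to a float (dropping any trailing %) and lets the built-in min() and max() return the whole row through key=. The rows are hard-coded from the sample table, and to_number is a helper name made up for this example.

```python
# Strings compare character by character, so '9' sorts after '2'
print('9.1' > '20.5')                # True
print(float('9.1') > float('20.5'))  # False


def to_number(cell):
    """Convert '35.5' or '21.0%' to a float; the % sign is simply dropped."""
    return float(cell.strip().rstrip('%'))


# (state, car, motorcycle, van, airplane, bike) as strings, copied from the sample table
rows = [
    ('Maine', '35.5', '8.1', '5.7', '21.0%', '33.2%'),
    ('Michigan', '47.9', '9.1', '5.5', '20.40%', '25.2%'),
    ('Washington', '52.5', '1.2', '4.6', '3.50%', '24.7%'),
    ('Denver', '21.8', '20.5', '5.3', '2.90%', '30.9%'),
]
categories = ['car', 'motorcycle', 'van', 'airplane', 'bike']

results = {}
for col, name in enumerate(categories, start=1):
    # key= makes min()/max() compare numeric values but return the whole row
    low = min(rows, key=lambda r: to_number(r[col]))
    high = max(rows, key=lambda r: to_number(r[col]))
    results[name] = (low[0], low[col], high[0], high[col])

for name, (min_state, min_val, max_state, max_val) in results.items():
    print(f'{name:<12}{min_state + ":":<12}{min_val:>6}    {max_state + ":":<12}{max_val:>6}')
```

Because key= only affects the comparison, the original strings (including the % sign) are kept for display.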

2 answers:

Answer 0 (score: 4)

With pandas, a library designed for exactly this kind of data manipulation, the task becomes much simpler:

import pandas as pd

# Strip the trailing '%' so the percentage columns parse as floats
c = lambda x: float(x.strip('%'))

f = 'data.txt'  # path to the whitespace-separated table from the question (file name assumed)
df = pd.read_csv(f, sep=r'\s+', converters={'bike': c, 'airplane': c})

vehicles = df.columns[1:]  # ['car', 'motorcycle', 'van', 'airplane', 'bike']

# idxmax()/idxmin() return the row label of each column's extreme value
max_v = zip(df['state'].loc[df[vehicles].idxmax().values],
            df[vehicles].max().values.astype(str))
min_v = zip(df['state'].loc[df[vehicles].idxmin().values],
            df[vehicles].min().values.astype(str))

max_i = [': '.join(tup) for tup in max_v]
min_i = [': '.join(tup) for tup in min_v]

print(pd.DataFrame({'max': max_i, 'min': min_i}, index=vehicles))

Output:

                         max               min
car         Washington: 52.5      Denver: 21.8
motorcycle      Denver: 20.5   Washington: 1.2
van               Maine: 5.7   Washington: 4.6
airplane         Maine: 21.0       Denver: 2.9
bike             Maine: 33.2  Washington: 24.7
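A slightly shorter variant of the same pandas idea, sketched here under the assumption that state names are unique: making state the index means idxmin() and idxmax() return state names directly, so no separate lookup or zip is needed. The table is inlined from the question to keep the example self-contained, and the summary column names are chosen for this illustration.

```python
import io

import pandas as pd

# The question's table, inlined so the example runs without a file
table = io.StringIO("""\
state          car      motorcycle   van        airplane       bike
Maine          35.5      8.1         5.7         21.0%         33.2%
Michigan       47.9      9.1         5.5         20.40%        25.2%
Washington     52.5      1.2         4.6          3.50%        24.7%
Denver         21.8      20.5        5.3          2.90%        30.9%
""")

# Strip the trailing '%' so the percentage columns become floats
pct = lambda x: float(x.strip('%'))
df = pd.read_csv(table, sep=r'\s+', converters={'airplane': pct, 'bike': pct})

# With state as the index, idxmin()/idxmax() return state names directly
vals = df.set_index('state')

summary = pd.DataFrame({
    'min_state': vals.idxmin(),
    'min': vals.min(),
    'max_state': vals.idxmax(),
    'max': vals.max(),
})
print(summary)
```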

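Answer 1 (score: 2)

Most of what you need can be done with Python's built-in csv module. Below is how to find the min and max values of a list of data fields (columns). The sample data.csv file shown contains only the fields of interest, but it could contain all 26 columns of data without affecting the code, which only processes the fields named in the FIELDS list.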

import csv

ID = 'state'
FIELDS = ['car', 'motorcycle', 'van', 'airplane', 'bike']
MIN_ID, MIN, MAX_ID, MAX = 0, 1, 2, 3  # indices of data in min_maxes records

with open('data.csv', newline='') as csv_file:
    csv_dict_reader = csv.DictReader(csv_file, delimiter=',')

    # initialize min and max values from first row of csv file;
    # each record is [min_id, min, max_id, max]
    row = next(csv_dict_reader)
    min_maxes = {field: [row[ID], float(row[field])]*2 for field in FIELDS}

    # update min and max values with data from remaining rows of csv file
    for row in csv_dict_reader:
        for row_id, value, min_max_rec in (
                (row[ID], float(row[field]), min_maxes[field]) for field in FIELDS):
            if value < min_max_rec[MIN]:
                min_max_rec[MIN_ID] = row_id
                min_max_rec[MIN]    = value
            if value > min_max_rec[MAX]:
                min_max_rec[MAX_ID] = row_id
                min_max_rec[MAX]    = value

print('                   min                  max')
for field in FIELDS:
    min_max_rec = min_maxes[field]
    print('{:10}    {:12}{:4.1f}      {:12}{:4.1f}'.format(field,
          min_max_rec[MIN_ID]+':', min_max_rec[MIN],
          min_max_rec[MAX_ID]+':', min_max_rec[MAX]))

Input (simplified data.csv file):

state,car,motorcycle,van,airplane,bike
Maine,35.5,8.1,5.7,21.0,33.2
Michigan,47.9,9.1,5.5,20.40,25.2
Washington,52.5,1.2,4.6,3.,24.7
Denver,21.8,20.5,5.3,2.90,30.9

Output:

                   min                  max
car           Denver:     21.8      Washington: 52.5
motorcycle    Washington:  1.2      Denver:     20.5
van           Washington:  4.6      Maine:       5.7
airplane      Denver:      2.9      Maine:      21.0
bike          Washington: 24.7      Maine:      33.2
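For small files, a more compact alternative to the running comparison above is to read all rows into a list and let the built-in min() and max() pick the extreme row for each field with key=. This is a sketch under that assumption, not a replacement for the single-pass loop on large files, since it keeps every row in memory and scans the list twice per field. The CSV contents are inlined from the simplified data.csv above, and the extremes dictionary is a name introduced only for this example.

```python
import csv
import io

FIELDS = ['car', 'motorcycle', 'van', 'airplane', 'bike']

# Same contents as the simplified data.csv above, inlined for a self-contained example
data = io.StringIO("""\
state,car,motorcycle,van,airplane,bike
Maine,35.5,8.1,5.7,21.0,33.2
Michigan,47.9,9.1,5.5,20.40,25.2
Washington,52.5,1.2,4.6,3.,24.7
Denver,21.8,20.5,5.3,2.90,30.9
""")

# Load every row once; fine for small files
rows = list(csv.DictReader(data))

extremes = {}
print('                   min                  max')
for field in FIELDS:
    # key= compares the numeric value but returns the whole row dict
    low = min(rows, key=lambda r: float(r[field]))
    high = max(rows, key=lambda r: float(r[field]))
    extremes[field] = (low['state'], high['state'])
    print('{:10}    {:12}{:4.1f}      {:12}{:4.1f}'.format(
        field, low['state'] + ':', float(low[field]),
        high['state'] + ':', float(high[field])))
```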