CSV table data processing question

Time: 2017-04-20 20:37:50

Tags: python excel

My table looks like this:

[image: screenshot of the table]

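I need to add a field for each county that holds the score with the largest percentage. For example, Anderson County's highest score is HAZ_7, at 99.03833. The first row gives the scores. The numbers in each row are the percentage for each score. I need the majority score for each county.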

Does anyone know how to do this in Excel or Python?

3 answers:

Answer 0: (score: 1)

Excel solution for the column name:

=INDEX(C$1:L$1,MATCH(MAX(C2:L2),C2:L2,0))

Excel solution for the value:

=MAX(C2:L2)

Answer 1: (score: 0)

I'm going to assume this is a pandas DataFrame named df. If so, the Python below adds a column named max to your DataFrame, containing the maximum value of each row.

df['max'] = df.loc[:,'%HAZ_1':].max(axis=1)
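
The question asks for the name of the highest-scoring field rather than its value, so `idxmax` may be closer to what is needed. Below is a minimal sketch, assuming the columns are County, FIPS, then %HAZ_1 through %HAZ_10. The column name `max_field` and the inline sample data are illustrative, not part of the original answer.

```python
import io

import pandas as pd

# Illustrative sample; in practice df would come from pd.read_csv on the real file.
csv_text = """County,FIPS,%HAZ_1,%HAZ_2,%HAZ_3,%HAZ_4,%HAZ_5,%HAZ_6,%HAZ_7,%HAZ_8,%HAZ_9,%HAZ_10
Anderson County,48001,0,0,0,0,0,0,99.03833,0.961668,0,0
Andrews County,48003,0,0,0,0,0,0,26.08,73.92,0,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Select only the score columns by label so FIPS and any added columns are excluded.
haz = df.loc[:, '%HAZ_1':'%HAZ_10']

df['max'] = haz.max(axis=1)           # largest percentage in each row
df['max_field'] = haz.idxmax(axis=1)  # label of the column holding that percentage

print(df[['County', 'max', 'max_field']])
```

If a row has a tie, `idxmax` reports only the first column that holds the maximum.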

Answer 2: (score: 0)

Here's how to do it in Python.

import csv

filename = 'county_data.csv'
output_filename = 'county_data2.csv'

def maxelements(names, seq):
    """ Return corresponding names of the position(s) of the largest element in sequence. """
    max_value = max(seq)
    return [names[i] for i, v in enumerate(seq) if v == max_value]

with open(filename, 'r', newline='') as infile, open(output_filename, 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    fieldnames = next(reader)  # assume first row contains field names
    writer.writerow(fieldnames + ['Max'])  # plus name of new field
    haz_fields = fieldnames[2:]
    for row in reader:
        row = row[:2] + [float(elem) for elem in row[2:]]  # convert haz fields to numbers
        maxfields = maxelements(haz_fields, row[2:])
        writer.writerow(row + maxfields)

Here's a small sample input CSV file:

County,FIPS,%HAZ_1,%HAZ_2,%HAZ_3,%HAZ_4,%HAZ_5,%HAZ_6,%HAZ_7,%HAZ_8,%HAZ_9,%HAZ_10
Anderson County,48001,0,0,0,0,0,0,99.03833,0.961668,0,0
Andrews County,48003,0,0,0,0,0,0,26.08,73.92,0,0
Angelina County,48005,0,0,0,0,0,62.41924,37.58076,0,0,0
Aransas County,48007,0,0,100,0,0,0,0,0,0,0

Here's what gets written to the output file:

County,FIPS,%HAZ_1,%HAZ_2,%HAZ_3,%HAZ_4,%HAZ_5,%HAZ_6,%HAZ_7,%HAZ_8,%HAZ_9,%HAZ_10,Max
Anderson County,48001,0.0,0.0,0.0,0.0,0.0,0.0,99.03833,0.961668,0.0,0.0,%HAZ_7
Andrews County,48003,0.0,0.0,0.0,0.0,0.0,0.0,26.08,73.92,0.0,0.0,%HAZ_8
Angelina County,48005,0.0,0.0,0.0,0.0,0.0,62.41924,37.58076,0.0,0.0,0.0,%HAZ_6
Aransas County,48007,0.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,%HAZ_3

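Note: The maxelements() function returns a list because two or more %HAZ_# fields may contain the same maximum value (although that doesn't happen in the sample input). The code doesn't necessarily handle that case properly, mainly because you haven't described what you want to happen in that situation.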

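If that's not a concern, you can use the following version of it, which is basically a one-liner and returns only the name of the first field that holds the maximum value: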

def maxelements(names, seq):
    """ Return a one-item list with the name at the first position of the largest element in sequence. """
    return [names[seq.index(max(seq))]]
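
To see how the two versions differ when a row has a tie, here is a small check. The tied row is made up for illustration and does not come from the sample file, and `maxelements_all` and `maxelements_first` are hypothetical names for the two versions above. Joining tied names into one string is only one possible way to keep a single Max column; the question does not specify what should happen.

```python
def maxelements_all(names, seq):
    """Return the names at every position holding the largest element."""
    max_value = max(seq)
    return [names[i] for i, v in enumerate(seq) if v == max_value]


def maxelements_first(names, seq):
    """Return a one-item list with the name at the first position of the largest element."""
    return [names[seq.index(max(seq))]]


names = ['%HAZ_1', '%HAZ_2', '%HAZ_3']
tied_row = [50.0, 0.0, 50.0]  # made-up row with two equal maxima

print(maxelements_all(names, tied_row))    # ['%HAZ_1', '%HAZ_3']
print(maxelements_first(names, tied_row))  # ['%HAZ_1']

# One possible convention (an assumption): join ties so each row gains exactly one field.
max_field = ';'.join(maxelements_all(names, tied_row))
print(max_field)  # %HAZ_1;%HAZ_3
```

With the first version, a tied row would gain two extra fields in the output file, so the rows would no longer line up with the single Max header.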