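Finding the min and max in a CSV file in Python

Time: 2014-10-07 13:58:13

Tags: python sorting csv

I am trying to find the minimum, maximum, and average of "Measured_Power" for every possible combination of rate and frequency. I have many rates and frequencies (10 rates, 10 frequencies). My CSV file looks like this: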

Channel, Rate, Length, Frequency, Expected_Power, Measured_Power, Expected_Eq, Measured_Eq, 
A, 27, 1000, 100, 20, 20.16, <-23.0, -27.33,
A, 6, 1000, 100, 20, 20.12, <-23.0, -25.96,
A, 3, 1000, 100, 20, 20.05, <-23.0, -26.34,
A, 27, 1000, 101, 20, 20.11, <-23.0, -24.88,
A, 6, 1000, 101, 20, 20.26, <-23.0, -25.55,
A, 3, 1000, 101, 20, 20.08, <-23.0, -25.42,
B, 27, 1000, 100, 20, 20.5, <-23.0, -26.98,
B, 6, 1000, 100, 20, 20.21, <-23.0, -24.61,
B, 3, 1000, 100, 20, 20.17, <-23.0, -23.54,
...

I tried:

import numpy

file = r'C:\data.csv'
# skip_header=1 skips the header line (it replaces the deprecated skiprows argument)
c = numpy.genfromtxt(file, dtype='float', delimiter=',', skip_header=1, skip_footer=0, usecols=5, usemask=True)
print(c.max())
print(c.min())

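I can find the overall maximum and minimum, but how do I break them down by a specific channel, rate, and frequency? Any help would be appreciated. Expected output for Measured_Power:

Channel, Rate, Max, Min, Average,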
A, 3, .., .., ..,
A, 6, .., .., ..,
., ., .., .., ..,
., ., .., .., ..,
., ., .., .., ..,
A, 27,.., .., .., 

B, 3, .., .., ..,
B, 6, .., .., ..,
., ., .., .., ..,
., ., .., .., ..,
., ., .., .., ..,
B, 27,.., .., .., 

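1 answer:

Answer 0 (score: 1)

I hope I understood what you want. You want the minimum, maximum, and average of Measured_Power for each possible combination of Rate and Frequency, right?

Well, you can do that quickly with pandas: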

import pandas as pd

data = pd.read_csv('data_file.csv')
grouped_measured_power = data.groupby([' Rate', ' Frequency'])[' Measured_Power']
min_measured_power_by_rate_and_freq = grouped_measured_power.min()
max_measured_power_by_rate_and_freq = grouped_measured_power.max()
average_measured_power_by_rate_and_freq = grouped_measured_power.mean()

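That's it! Note that I put a space in front of each column name because the header row of the CSV file has a space after every comma, but you may prefer to clean up the data file instead.

For the record, here is the output for your example: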
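If you would rather not edit the file, one option is to let pandas strip the spaces while reading. The sketch below is only an illustration: `SAMPLE_CSV` is a hypothetical in-memory stand-in for the data file, holding the nine rows shown in the question, and `skipinitialspace=True` tells `read_csv` to drop the whitespace that follows each delimiter. The trailing comma on every line produces one empty extra column, which is dropped here.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: the nine sample rows from the question.
SAMPLE_CSV = """Channel, Rate, Length, Frequency, Expected_Power, Measured_Power, Expected_Eq, Measured_Eq,
A, 27, 1000, 100, 20, 20.16, <-23.0, -27.33,
A, 6, 1000, 100, 20, 20.12, <-23.0, -25.96,
A, 3, 1000, 100, 20, 20.05, <-23.0, -26.34,
A, 27, 1000, 101, 20, 20.11, <-23.0, -24.88,
A, 6, 1000, 101, 20, 20.26, <-23.0, -25.55,
A, 3, 1000, 101, 20, 20.08, <-23.0, -25.42,
B, 27, 1000, 100, 20, 20.5, <-23.0, -26.98,
B, 6, 1000, 100, 20, 20.21, <-23.0, -24.61,
B, 3, 1000, 100, 20, 20.17, <-23.0, -23.54,
"""

# skipinitialspace=True removes the space after each comma, header included.
data = pd.read_csv(io.StringIO(SAMPLE_CSV), skipinitialspace=True)

# The trailing comma on each line yields an all-empty last column; drop it.
data = data.dropna(axis=1, how='all')

print(list(data.columns))
```

With the names cleaned this way, the group-by call can use 'Rate', 'Frequency', and 'Measured_Power' without the leading space.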

> min_measured_power_by_rate_and_freq
 Rate   Frequency
3      100           20.05
       101           20.08
6      100           20.12
       101           20.26
27     100           20.16
       101           20.11
Name:  Measured_Power, dtype: float64

> max_measured_power_by_rate_and_freq
 Rate   Frequency
3      100           20.05
       101           20.08
6      100           20.21
       101           20.26
27     100           20.50
       101           20.11
Name:  Measured_Power, dtype: float64

> average_measured_power_by_rate_and_freq
 Rate   Frequency
3      100           20.050
       101           20.080
6      100           20.165
       101           20.260
27     100           20.330
       101           20.110
Name:  Measured_Power, dtype: float64

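The result is a multi-indexed structure... you may also want to unstack it.

Edit

I remembered that you can actually do better by applying several aggregation functions at once, so you can do this: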
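As a rough sketch of what unstacking looks like, the snippet below pivots the Frequency level of the index into columns, so each rate becomes one row. It assumes the nine sample rows from the question (held in the hypothetical `SAMPLE_CSV` string) and reads them with `skipinitialspace=True`, so the column names carry no leading space here.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: the nine sample rows from the question.
SAMPLE_CSV = """Channel, Rate, Length, Frequency, Expected_Power, Measured_Power, Expected_Eq, Measured_Eq,
A, 27, 1000, 100, 20, 20.16, <-23.0, -27.33,
A, 6, 1000, 100, 20, 20.12, <-23.0, -25.96,
A, 3, 1000, 100, 20, 20.05, <-23.0, -26.34,
A, 27, 1000, 101, 20, 20.11, <-23.0, -24.88,
A, 6, 1000, 101, 20, 20.26, <-23.0, -25.55,
A, 3, 1000, 101, 20, 20.08, <-23.0, -25.42,
B, 27, 1000, 100, 20, 20.5, <-23.0, -26.98,
B, 6, 1000, 100, 20, 20.21, <-23.0, -24.61,
B, 3, 1000, 100, 20, 20.17, <-23.0, -23.54,
"""

data = pd.read_csv(io.StringIO(SAMPLE_CSV), skipinitialspace=True)

# Series indexed by (Rate, Frequency), as in the answer.
max_by_rate_and_freq = data.groupby(['Rate', 'Frequency'])['Measured_Power'].max()

# unstack moves the Frequency level into columns: one row per rate,
# one column per frequency.
max_table = max_by_rate_and_freq.unstack('Frequency')
print(max_table)
```

The same call works on the min and mean series; which layout reads better is mostly a matter of taste.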

import pandas as pd

data = pd.read_csv('data_file.csv')
grouped_measured_power = data.groupby([' Rate', ' Frequency'])[' Measured_Power']
# Named aggregation (pandas 0.25 or later): each keyword is an output column.
# Passing a dict of new names to a grouped Series is rejected by pandas 1.0+.
result = grouped_measured_power.agg(average='mean', max='max', min='min')

and you get everything together directly:

> result
                  average    max    min
 Rate  Frequency                       
3     100          20.050  20.05  20.05
      101          20.080  20.08  20.08
6     100          20.165  20.21  20.12
      101          20.260  20.26  20.26
27    100          20.330  20.50  20.16
      101          20.110  20.11  20.11
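The table requested in the question is keyed by channel and rate rather than by rate and frequency. A minimal sketch of that variant follows, assuming the nine sample rows from the question (the hypothetical `SAMPLE_CSV` string) and a pandas version that supports named aggregation (0.25 or later). The output column names Max, Min, and Average are taken from the question's expected layout.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: the nine sample rows from the question.
SAMPLE_CSV = """Channel, Rate, Length, Frequency, Expected_Power, Measured_Power, Expected_Eq, Measured_Eq,
A, 27, 1000, 100, 20, 20.16, <-23.0, -27.33,
A, 6, 1000, 100, 20, 20.12, <-23.0, -25.96,
A, 3, 1000, 100, 20, 20.05, <-23.0, -26.34,
A, 27, 1000, 101, 20, 20.11, <-23.0, -24.88,
A, 6, 1000, 101, 20, 20.26, <-23.0, -25.55,
A, 3, 1000, 101, 20, 20.08, <-23.0, -25.42,
B, 27, 1000, 100, 20, 20.5, <-23.0, -26.98,
B, 6, 1000, 100, 20, 20.21, <-23.0, -24.61,
B, 3, 1000, 100, 20, 20.17, <-23.0, -23.54,
"""

# skipinitialspace=True strips the space after each comma, so the column
# names can be used without a leading space.
data = pd.read_csv(io.StringIO(SAMPLE_CSV), skipinitialspace=True)

# One row per (Channel, Rate) pair, aggregated over all frequencies.
summary = (
    data.groupby(['Channel', 'Rate'])['Measured_Power']
        .agg(Max='max', Min='min', Average='mean')
        .reset_index()  # turn the group keys back into ordinary columns
)

print(summary.to_string(index=False))
```

Adding 'Frequency' to the list passed to `groupby` would give one row per channel, rate, and frequency instead.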