Average of binned values

Time: 2019-07-13 22:31:54

Tags: python pandas binning

I have two separate dataframes and would like to relate them to each other:

Time  temperature   |   Time  ratio
0        32         |    0        0.02
1        35         |    1        0.1
2        30         |    2        0.25
3        31         |    3        0.17
4        34         |    4        0.22
5        34         |    5        0.07

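I want to bin the data in steps of 0.05 of the ratio, using time as the index, and within each bin average all the temperature values that fall into that bin.

So I would end up with one average value per 0.05 step. Can anyone help? Thanks!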

****EDIT: what the data looks like**** (df1 on the left, df2 on the right)

Time     device-1    device-2...   |   Time    device-1    device-2...
0        32            34          |    0        0.02       0.01
1        35            31          |    1        0.1        0.23
2        30            30          |    2        0.25       0.15
3        31            32          |    3        0.17       0.21
4        34            35          |    4        0.22       0.13
5        34            31          |    5        0.07       0.06

1 answer:

Answer 0: (score: 1)

This can be done with the pandas library:

import pandas as pd
import numpy as np

temp = [32,35,30,31,34,34]
ratio = [0.02,0.1,0.25,0.17,0.22,0.07]
times = range(6)

# Create your dataframe
df = pd.DataFrame({'Time': times, 'Temperature': temp, 'Ratio': ratio})

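# Bins: np.arange excludes the stop value, so the edges are 0, 0.05, 0.1, 0.15, 0.2
# and ratios above 0.2 (here 0.22 and 0.25) fall outside every bin
bins = pd.cut(df.Ratio,np.arange(0,0.25,0.05))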

# get the mean temperature of each group and the list of each time
df.groupby(bins).agg({"Temperature": "mean", "Time": list})

Output:

             Temperature    Time
Ratio
(0.0, 0.05]         32.0     [0]
(0.05, 0.1]         34.5  [1, 5]
(0.1, 0.15]          NaN      []
(0.15, 0.2]         31.0     [3]

You can drop the empty bins with .dropna(), like this:

df.groupby(bins).agg({"Temperature": "mean", "Time": list}).dropna()

             Temperature    Time
Ratio
(0.0, 0.05]         32.0     [0]
(0.05, 0.1]         34.5  [1, 5]
(0.15, 0.2]         31.0     [3]
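
Times 2 and 4 do not appear in the output above because their ratios (0.25 and 0.22) lie beyond the last bin edge of 0.2. Here is a minimal sketch of one way to cover the whole range. It assumes the bins start at 0 and that 0.25 is the largest ratio, both read off the sample data. The names `edges`, `binned` and `result` are only illustrative, and `observed=True` is used here in place of `.dropna()` to skip empty bins:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Time": range(6),
    "Temperature": [32, 35, 30, 31, 34, 34],
    "Ratio": [0.02, 0.1, 0.25, 0.17, 0.22, 0.07],
})

# Assumption: bins are 0.05 wide, start at 0 and must reach the largest
# ratio in the sample (0.25). np.linspace includes the end point, so the
# six edges 0, 0.05, ..., 0.25 define five bins.
edges = np.linspace(0, 0.25, 6)

# Intervals are right-inclusive by default, so 0.25 lands in (0.2, 0.25]
binned = pd.cut(df["Ratio"], edges)

# observed=True keeps only the bins that actually contain data
result = df.groupby(binned, observed=True)["Temperature"].mean()
print(result)
```

With the sample data this gives four non-empty bins; the last one, (0.2, 0.25], averages the temperatures at times 2 and 4.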

Edit: for multiple machines, here is one solution:

import pandas as pd
import numpy as np

n_machines = 3
# Generate random data for temperature and ratios
temperature_df = pd.DataFrame( {'Machine_{}'.format(i): 
                                 pd.Series(np.random.randint(30,40,10)) 
                               for i in range(n_machines)} )

ratio_df = pd.DataFrame( {'Machine_{}'.format(i): 
                           pd.Series(np.random.uniform(0.01,0.5,10)) 
                          for i in range(n_machines)} )

# If ratio is between 0 and 1, we get the bins spaced by .05
def get_bins(s):
    return pd.cut(s,np.arange(0,1,0.05))

# Get bin assignments for each machine
bins = ratio_df.apply(get_bins,axis=1)

# Get the mean of each group for each machine
df = temperature_df.apply(lambda x: x.groupby(bins[x.name]).agg("mean"))

Then, if you want to display the results, you can use the seaborn package:

import matplotlib.pyplot as plt
import seaborn as sns

df_reshaped = df.reset_index().melt(id_vars='index')
df_reshaped.columns = [ 'Ratio bin','Machine','Mean temperature' ]

sns.barplot(data=df_reshaped,x="Ratio bin",y="Mean temperature",hue="Machine")
plt.show()
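
As an alternative to the per-column apply in the edit above, the two frames from the question can be reshaped to long format, merged on time and device, and binned in one pass. This is only a sketch: the data is copied from the question's edit, the column names `Device`, `Temperature`, `Ratio` and `Ratio bin` are my own choice, and the bin edges assume ratios between 0 and 0.25.

```python
import numpy as np
import pandas as pd

# Sample data from the question's edit (df1 = temperatures, df2 = ratios)
df1 = pd.DataFrame({
    "Time": range(6),
    "device-1": [32, 35, 30, 31, 34, 34],
    "device-2": [34, 31, 30, 32, 35, 31],
})
df2 = pd.DataFrame({
    "Time": range(6),
    "device-1": [0.02, 0.1, 0.25, 0.17, 0.22, 0.07],
    "device-2": [0.01, 0.23, 0.15, 0.21, 0.13, 0.06],
})

# One row per (Time, Device) in each frame, then pair temperature with ratio
temp_long = df1.melt(id_vars="Time", var_name="Device", value_name="Temperature")
ratio_long = df2.melt(id_vars="Time", var_name="Device", value_name="Ratio")
merged = temp_long.merge(ratio_long, on=["Time", "Device"])

# Assumption: 0.05-wide bins from 0 up to 0.25 (the largest ratio in the sample)
edges = np.linspace(0, 0.25, 6)
merged["Ratio bin"] = pd.cut(merged["Ratio"], edges)

# Mean temperature per bin and device, with one column per device
mean_temp = (
    merged.groupby(["Ratio bin", "Device"], observed=True)["Temperature"]
    .mean()
    .unstack("Device")
)
print(mean_temp)
```

A bin that holds data for one device but not the other shows NaN in the empty cell, which `.dropna()` or `.fillna()` can handle depending on what you need.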
