I have two separate DataFrames and want to relate one to the other:
Time temperature | Time ratio
0 32 | 0 0.02
1 35 | 1 0.1
2 30 | 2 0.25
3 31 | 3 0.17
4 34 | 4 0.22
5 34 | 5 0.07
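I want to bin the data in steps of 0.05 of the ratio (with Time as the index) and, within each bin, average all the temperature values that fall into that bin.
That way I get one mean temperature for every 0.05 interval. Can anyone help? Thanks!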
Edit: what the data actually looks like (df1 on the left, df2 on the right)
Time device-1 device-2... | Time device-1 device-2...
0 32 34 | 0 0.02 0.01
1 35 31 | 1 0.1 0.23
2 30 30 | 2 0.25 0.15
3 31 32 | 3 0.17 0.21
4 34 35 | 4 0.22 0.13
5 34 31 | 5 0.07 0.06
Answer (score: 1)
This can be done with the pandas library:
import pandas as pd
import numpy as np
temp = [32,35,30,31,34,34]
ratio = [0.02,0.1,0.25,0.17,0.22,0.07]
times = range(6)
# Create your dataframe
df = pd.DataFrame({'Time': times, 'Temperature': temp, 'Ratio': ratio})
# Assign each ratio to a bin of width 0.05; edges 0.00, 0.05, ..., 0.25 cover the largest ratio (0.25)
bins = pd.cut(df.Ratio, np.arange(0, 30, 5) / 100)
# Mean temperature of each bin and the list of Time values that fall into it
df.groupby(bins, observed=False).agg({"Temperature": "mean", "Time": list})
Output:
             Temperature    Time
Ratio
(0.0, 0.05]         32.0     [0]
(0.05, 0.1]         34.5  [1, 5]
(0.1, 0.15]          NaN      []
(0.15, 0.2]         31.0     [3]
(0.2, 0.25]         32.0  [2, 4]
You can drop the empty bins with .dropna(), like this:
df.groupby(bins, observed=False).agg({"Temperature": "mean", "Time": list}).dropna()
             Temperature    Time
Ratio
(0.0, 0.05]         32.0     [0]
(0.05, 0.1]         34.5  [1, 5]
(0.15, 0.2]         31.0     [3]
(0.2, 0.25]         32.0  [2, 4]
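The answer builds a single DataFrame by hand. If the data really arrives as two separate DataFrames that share a Time column, as in the question, one option is to merge them on Time before binning. This is a minimal sketch, assuming the column names `temperature` and `ratio` from the question's first table; the names `merged`, `ratio_bin`, `mean_temperature` and `times` are illustrative only. It passes `observed=True`, which keeps only the bins that contain rows, so the `.dropna()` step is not needed.

```python
import numpy as np
import pandas as pd

# Two separate frames, as in the question (column names assumed from its first table)
df1 = pd.DataFrame({"Time": range(6), "temperature": [32, 35, 30, 31, 34, 34]})
df2 = pd.DataFrame({"Time": range(6), "ratio": [0.02, 0.1, 0.25, 0.17, 0.22, 0.07]})

# Align temperature and ratio rows on the shared Time column
merged = df1.merge(df2, on="Time")

# Bin edges 0.00, 0.05, ..., 0.25 (integer arange avoids float-step rounding)
edges = np.arange(0, 30, 5) / 100
merged["ratio_bin"] = pd.cut(merged["ratio"], edges)

# Mean temperature per bin; observed=True drops bins that contain no rows
result = merged.groupby("ratio_bin", observed=True).agg(
    mean_temperature=("temperature", "mean"),
    times=("Time", list),
)
print(result)
```

With the question's sample data this gives means of 32.0, 34.5, 31.0 and 32.0 for the four non-empty bins.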
Edit: for several machines, here is one solution:
import pandas as pd
import numpy as np
n_machines = 3
# Generate random data for temperature and ratios
temperature_df = pd.DataFrame({'Machine_{}'.format(i):
                               pd.Series(np.random.randint(30, 40, 10))
                               for i in range(n_machines)})
ratio_df = pd.DataFrame({'Machine_{}'.format(i):
                         pd.Series(np.random.uniform(0.01, 0.5, 10))
                         for i in range(n_machines)})
# Ratios lie between 0 and 1, so use bins of width 0.05: edges 0.00, 0.05, ..., 1.00
edges = np.arange(0, 105, 5) / 100
def get_bins(s):
    return pd.cut(s, edges)
# Get bin assignments for each machine (applied column by column)
bins = ratio_df.apply(get_bins)
# Mean temperature per ratio bin for each machine; drop bins that are empty for every machine
df = temperature_df.apply(
    lambda x: x.groupby(bins[x.name], observed=False).mean()
).dropna(how="all")
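One detail worth checking for the "ratio between 0 and 1" assumption: the bins produced by `pd.cut` are closed on the right by default, so a ratio equal to an edge (such as 0.1) falls into the lower bin (0.05, 0.1], and a ratio of exactly 0 is not assigned to any bin unless `include_lowest=True` is passed. A small check, using made-up ratio values for illustration:

```python
import numpy as np
import pandas as pd

# Edges 0.00, 0.05, ..., 1.00 (integer arange avoids float-step rounding)
edges = np.arange(0, 105, 5) / 100

# Illustrative ratios, including values that sit exactly on bin edges
ratios = pd.Series([0.0, 0.05, 0.1, 0.37, 1.0])

# Default right-closed bins: 0.05 -> (0.0, 0.05], 0.1 -> (0.05, 0.1], but 0.0 -> NaN
default_bins = pd.cut(ratios, edges)

# include_lowest=True closes the first bin on the left, so 0.0 is kept
inclusive_bins = pd.cut(ratios, edges, include_lowest=True)

print(default_bins.isna().tolist())    # [True, False, False, False, False]
print(inclusive_bins.isna().tolist())  # [False, False, False, False, False]
```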
Then, if you want to plot the results, you can use the seaborn package:
import matplotlib.pyplot as plt
import seaborn as sns
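# Long format: one row per (ratio bin, machine) pair
df_reshaped = (df.rename_axis('Ratio bin')
                 .reset_index()
                 .melt(id_vars='Ratio bin', var_name='Machine', value_name='Mean temperature'))
# Use string labels so that only the non-empty bins appear on the x-axis
df_reshaped['Ratio bin'] = df_reshaped['Ratio bin'].astype(str)
sns.barplot(data=df_reshaped, x='Ratio bin', y='Mean temperature', hue='Machine')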
plt.show()
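An alternative for the multi-machine case is to reshape both wide frames into one long table and group by (ratio bin, machine) in a single `groupby`. This is a sketch under the assumption that both frames share the same row index (Time) and the same machine column names; it regenerates random data with a seeded `np.random.default_rng` (an illustrative choice) so it runs on its own, and `to_long`, `long_df` and `wide` are hypothetical names.

```python
import numpy as np
import pandas as pd

n_machines = 3
names = ["Machine_{}".format(i) for i in range(n_machines)]
rng = np.random.default_rng(0)  # seeded generator, for a reproducible sketch

# Wide frames as in the answer: rows are time steps, columns are machines
temperature_df = pd.DataFrame(rng.integers(30, 40, size=(10, n_machines)), columns=names)
ratio_df = pd.DataFrame(rng.uniform(0.01, 0.5, size=(10, n_machines)), columns=names)

edges = np.arange(0, 105, 5) / 100  # 0.00, 0.05, ..., 1.00

def to_long(wide_df, value_name):
    """Turn a (time x machine) frame into rows of (Time, Machine, value)."""
    return (wide_df.rename_axis("Time")
                   .reset_index()
                   .melt(id_vars="Time", var_name="Machine", value_name=value_name))

# One row per (time step, machine) with both measurements side by side
long_df = to_long(temperature_df, "Temperature").merge(
    to_long(ratio_df, "Ratio"), on=["Time", "Machine"]
)
long_df["Ratio bin"] = pd.cut(long_df["Ratio"], edges)

# Mean temperature for every observed (ratio bin, machine) pair, then back to wide
means = long_df.groupby(["Ratio bin", "Machine"], observed=True)["Temperature"].mean()
wide = means.unstack("Machine")
print(wide)
```

`wide` has one row per ratio bin that occurs for at least one machine and one column per machine; a machine with no readings in a given bin gets NaN there.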