In Pandas, how do I bin a DataFrame by two columns and replace the other columns with their mean within each bin?

Asked: 2019-11-21 15:39:37

Tags: pandas dataframe cut resampling

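I have projected the standard iris dataset into two dimensions with UMAP and added the two UMAP dimensions, which serve as the x and y positions in the 2D plot, to the DataFrame as columns: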

import numpy as np
import math
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris
import umap # pip install umap-learn

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['species'] = pd.Series(iris.target).map(dict(zip(range(3), iris.target_names)))

_umap = umap.UMAP().fit_transform(iris.data)
iris_df['UMAP_x'] = _umap[:,0]
iris_df['UMAP_y'] = _umap[:,1]
iris_df.head()

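I want to bin both the UMAP_x and UMAP_y columns into 25 bins each, and then replace the other columns in the DataFrame with the mean of each column within each bin. How can I do this? It feels like cut or resampling might be the answer, but I'm not sure how to implement it.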

1 Answer:

Answer 0 (score: 1)

You can use cut to define the bins, then use groupby together with transform to compute the mean for each bin.

import numpy as np
import math
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris
import umap

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['species'] = pd.Series(iris.target).map(dict(zip(range(3), iris.target_names)))

_umap = umap.UMAP().fit_transform(iris.data)
iris_df['UMAP_x'] = _umap[:,0]
iris_df['UMAP_y'] = _umap[:,1]

# Define bins for UMAP_x and UMAP_y params
iris_df['UMAP_x_bin'] = pd.cut(iris_df['UMAP_x'], bins=25)
iris_df['UMAP_y_bin'] = pd.cut(iris_df['UMAP_y'], bins=25)

# Calculate mean value for each bin
iris_df['UMAP_x_mean'] = iris_df.groupby('UMAP_x_bin')['UMAP_x'].transform('mean')
iris_df['UMAP_y_mean'] = iris_df.groupby('UMAP_y_bin')['UMAP_y'].transform('mean')

iris_df.head()
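
Note that the code above averages UMAP_x and UMAP_y within their own one-dimensional bins. If the goal is to replace the remaining columns with their mean inside each two-dimensional (x bin, y bin) cell, one possible extension is to group by both bin columns at once. The sketch below is only an illustration under stated assumptions: it uses a small synthetic DataFrame with random values in place of the real iris and UMAP output so that it runs without umap-learn, and the names df, bin_cols, value_cols, binned, and cell_means are hypothetical, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for iris_df (assumption: random values, not real UMAP output)
rng = np.random.default_rng(0)
n = 150
df = pd.DataFrame({
    "sepal length (cm)": rng.normal(5.8, 0.8, n),
    "petal length (cm)": rng.normal(3.8, 1.7, n),
    "species": rng.choice(["setosa", "versicolor", "virginica"], n),
    "UMAP_x": rng.uniform(-5, 15, n),
    "UMAP_y": rng.uniform(-5, 10, n),
})

# 25 equal-width bins along each axis, as in the answer above
df["UMAP_x_bin"] = pd.cut(df["UMAP_x"], bins=25)
df["UMAP_y_bin"] = pd.cut(df["UMAP_y"], bins=25)

bin_cols = ["UMAP_x_bin", "UMAP_y_bin"]
value_cols = ["sepal length (cm)", "petal length (cm)"]  # numeric columns to average

# Option 1: keep one row per sample, overwriting each value with the
# mean of its 2D cell. observed=True skips empty bin combinations.
binned = df.copy()
binned[value_cols] = df.groupby(bin_cols, observed=True)[value_cols].transform("mean")

# Option 2: collapse to one row per occupied 2D cell.
cell_means = df.groupby(bin_cols, observed=True)[value_cols].mean().reset_index()

print(binned.head())
print(cell_means.head())
```

Option 1 keeps the original row count, which is convenient if the binned values are plotted against the original points. Option 2 returns at most 25 × 25 rows, one per non-empty cell. Non-numeric columns such as species have no mean, so they would need a different aggregation (for example the most frequent value), which is left out here.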