Sorry about the title. Please edit it so it makes sense.
There is a lot of code below, but don't worry: it is really just a very small example.
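What I want to do is group the data by label and apply my function, which checks whether the coordinates for a given label are inside or outside an ellipse. The function returns a True/False array with the same length as the data. If a point lies outside its ellipse, I want to change its label to -1.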
Using `apply` and `transform` as far as I could, I get:

```
label
1    [True, True, False, True, False, False, True, ...
2    [False, False, True, True, False, False, True,...
dtype: object
```
But how do I map this back onto the original DataFrame and set the label to -1 for every `False`?
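One way to map the per-group results back, sketched here rather than taken from the thread, is to skip `apply` and loop over the groups. Each group's True/False result is written into a boolean Series at that group's own index labels, and `label` is then set to -1 wherever that Series is False. To keep the block self-contained, `inside_mask` is a hypothetical stand-in for `_plot_ellipse(..., return_ax = False)`. It tests the same covariance ellipse, but it does so via the Mahalanobis distance instead of rotated coordinates. With the question's own function, `_plot_ellipse(grp["x"], grp["y"], n_std = 1)` could be called in its place.

```python
import numpy as np
import pandas as pd


def inside_mask(x, y, n_std):
    """Hypothetical stand-in for _plot_ellipse(..., return_ax=False).

    Returns a boolean array that is True where a point lies inside the
    n_std covariance ellipse of (x, y). It uses the Mahalanobis distance.
    """
    points = np.column_stack([x, y])
    diff = points - points.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(points, rowvar=False))
    # Squared Mahalanobis distance of every point from the centre
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return d2 <= n_std**2


rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(0, 1, 100),
    "y": rng.normal(0, 1, 100),
    "label": [1] * 50 + [2] * 50,
})

# One boolean per row. Each group's result is written at that group's
# index labels, so row order and non-contiguous groups don't matter.
inside = pd.Series(True, index=df.index)
for _, grp in df.groupby("label"):
    inside.loc[grp.index] = inside_mask(grp["x"], grp["y"], n_std=1)

# Points outside their group's ellipse get label -1
df.loc[~inside, "label"] = -1
print(df["label"].value_counts())
```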
The commented-out part at the bottom shows how it works when there are no labels.
```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import patches
import pandas as pd


def _plot_ellipse(xdata, ydata, n_std, ax = None, return_ax = False, **kwargs):
    """
    Parameters
    ----------
    xdata : array-like
    ydata : array-like
    n_std : scalar
        Number of sigmas (e.g. 2; in 2D this covers about 86% of Gaussian data, not 95%)
    ax : ax to plot on
    return_ax : bool
        If True, draw the ellipse and also return the axis
    **kwargs
        Passed to matplotlib.patches.Ellipse. Color, alpha, etc..

    Returns
    -------
    in_out, or (ax, in_out) if return_ax is True, where in_out is a list
    of True/False for points inside/outside the ellipse

    Example
    -------
    x = np.random.randn(100)
    y = 0.1 * x + np.random.randn(100)
    fig, ax = plt.subplots()
    ax, in_out = _plot_ellipse(x, y, n_std = 2, ax = ax, alpha = 0.5, return_ax = True)
    ax.scatter(x, y, c = in_out)
    plt.show()
    """
    def _eigsorted(cov):
        # Eigen-decomposition of the symmetric covariance matrix,
        # sorted so that the largest eigenvalue comes first
        vals, vecs = np.linalg.eigh(cov)
        order = vals.argsort()[::-1]
        return vals[order], vecs[:, order]

    points = np.stack([xdata, ydata], axis = 1) # Combine points to 2-column matrix
    center = points.mean(axis = 0) # Calculate mean for every column (x,y)

    # Calculate covariance matrix for coordinates (how correlated they are)
    cov = np.cov(points, rowvar = False) # rowvar = False because there are 2 variables, not nrows variables

    vals, vecs = _eigsorted(cov)
    # Orientation of the major axis (first eigenvector), in degrees
    angle = np.degrees(np.arctan2(*vecs[:,0][::-1]))
    width, height = 2 * n_std * np.sqrt(vals)

    in_out = _is_in_ellipse(xdata = xdata, ydata = ydata, center = center, width = width, height = height, angle = angle)

    if return_ax:
        ellip = patches.Ellipse(xy = center, width = width, height = height, angle = angle, **kwargs)
        if ax is None:
            ax = plt.gca()
        ax.add_artist(ellip)
        return ax, in_out
    else:
        return in_out


def _is_in_ellipse(xdata, ydata, center, width, height, angle):
    """
    Determines whether points are in ellipse, given the parameters of the ellipse

    Parameters
    ----------
    xdata : array-like
    ydata : array-like
    center : array-like, tuple
        center of the ellipse as (x,y)
    width : scalar
    height : scalar
    angle : scalar
        angle in degrees

    Returns
    -------
    List of True/False, depending on points being inside/outside of the ellipse
    """
    cos_angle = np.cos(np.radians(180-angle))
    sin_angle = np.sin(np.radians(180-angle))

    # Move the ellipse center to the origin, then rotate the points
    # into the ellipse's own axes
    xc = np.asarray(xdata) - center[0]
    yc = np.asarray(ydata) - center[1]
    xct = xc * cos_angle - yc * sin_angle
    yct = xc * sin_angle + yc * cos_angle

    # A value <= 1 means the point lies on or inside the ellipse
    rad_cc = (xct**2/(width/2)**2) + (yct**2/(height/2)**2)
    return (rad_cc <= 1.).tolist()


# For a single label
# x = np.random.normal(0, 1, 100)
# y = np.random.normal(0, 1, 100)
# labels = [1] * len(x)
#
# df = pd.DataFrame({"x" : x, "y" : y, "label" : labels})
#
# ax, in_out = _plot_ellipse(df.x, df.y, 2, return_ax = True, alpha = 0.5)
# ax.scatter(df.x, df.y, c = in_out)
# plt.show()

# For multiple labels
x = np.random.normal(0, 1, 100)
y = np.random.normal(0, 1, 100)
labels1 = [1] * 50
labels2 = [2] * 50
labels = labels1 + labels2
df = pd.DataFrame({"x" : x, "y" : y, "label" : labels})

df = df.groupby("label").apply(lambda group: _plot_ellipse(xdata = group["x"], ydata = group["y"], n_std = 1, return_ax = False))
print(df)
```
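For reference, here is the inside/outside test that `_plot_ellipse` and `_is_in_ellipse` carry out, written out explicitly. In the formulas, $\boldsymbol{\mu}$ is `center`, $\Sigma$ is `cov`, $\lambda_1 \ge \lambda_2$ are the sorted eigenvalues `vals`, $n$ is `n_std`, and $\theta$ is `angle`, the direction of the first eigenvector:

$$
\begin{aligned}
w &= 2n\sqrt{\lambda_1}, \qquad h = 2n\sqrt{\lambda_2} \\
x' &= (x-\mu_x)\cos\theta + (y-\mu_y)\sin\theta \\
y' &= -(x-\mu_x)\sin\theta + (y-\mu_y)\cos\theta \\
\text{inside} &\iff \frac{x'^2}{(w/2)^2} + \frac{y'^2}{(h/2)^2} \le 1
\iff (\mathbf{p}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{p}-\boldsymbol{\mu}) \le n^2
\end{aligned}
$$

The code rotates by $180^\circ - \theta$ instead of $-\theta$. This flips the signs of both $x'$ and $y'$, but it leaves their squares, and therefore the test, unchanged. The last form shows that the region consists of the points within Mahalanobis distance $n$ of the mean. For 2D Gaussian data, that region holds roughly a fraction $1 - e^{-n^2/2}$ of the points: about 39% for $n = 1$ and 86% for $n = 2$.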
Answer
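So, here is one way that works. If I were doing this myself I might rethink it a bit, but you'll get the idea and can take it from there. For simplicity, I've commented out your `return_ax` logic.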
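You don't need the `lambda` in your `groupby.apply`, because you have already defined `_plot_ellipse` as a function. You can pass `apply` a Python callable together with keyword arguments, and those keyword arguments are passed on to your callable.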
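Here is a minimal sketch of that forwarding on a made-up frame; `g`, `v` and `scaled_sum` are hypothetical names chosen only for illustration. The keyword argument given to `apply` after the callable reaches the function, so no `lambda` is needed. Selecting `[["v"]]` keeps the grouping column out of each group. This also avoids the deprecation warning that pandas 2.2 and later emit when `apply` operates on the grouping columns.

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b"], "v": [1, 2, 5]})


def scaled_sum(grp, scale):
    # grp is the sub-DataFrame for one group; `scale` arrives via apply's kwargs
    return grp["v"].sum() * scale


# Pass the function itself, followed by its extra keyword arguments
result = df.groupby("g")[["v"]].apply(scaled_sum, scale=10)
print(result)
```

Because `scaled_sum` returns a scalar for each group, `apply` returns a Series indexed by the group keys.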
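The line looks like this:

```python
df = df.groupby("label").apply(_plot_ellipse, n_std = 1, return_ax = False)
```

Inside your function, the first argument pandas passes is the group, so you don't need to pass the `x` and `y` columns as separate function arguments. Also, to get a `DataFrame` back from `apply`, your function has to return a `DataFrame`. In this case, you modify the group and then return it. The group that pandas passes has an attribute called `name` (the group name), which in your case is just the label. I changed the first lines of the function to this, so the rest of the code can stay the same: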
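```python
# grp is the group DataFrame that apply passes in
xdata = grp.x
ydata = grp.y
label = grp.name
```

Then I modified the `_is_in_ellipse` code so that it receives the label and either keeps it or changes it to -1. Finally, I reassign `grp.label` to the result.

Your full modified example is below.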
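Here is a sketch of that full example, reconstructed from the description above rather than copied from the answer. Plotting is dropped, since the answer commented out the `return_ax` logic anyway. The two-argument `_plot_ellipse(grp, n_std)` and the extra `label` parameter of `_is_in_ellipse` are assumed signatures. Only the `x` and `y` columns are handed to `apply`, and `group_keys = False` keeps the original row index on the result, so no group-key level is added.

```python
import numpy as np
import pandas as pd


def _is_in_ellipse(xdata, ydata, center, width, height, angle, label):
    """Return `label` for points inside the ellipse and -1 for points outside."""
    cos_angle = np.cos(np.radians(180 - angle))
    sin_angle = np.sin(np.radians(180 - angle))
    # Move the ellipse centre to the origin and rotate into its axes
    xc = np.asarray(xdata) - center[0]
    yc = np.asarray(ydata) - center[1]
    xct = xc * cos_angle - yc * sin_angle
    yct = xc * sin_angle + yc * cos_angle
    rad_cc = xct**2 / (width / 2) ** 2 + yct**2 / (height / 2) ** 2
    # Keep the group's label inside the ellipse, use -1 outside
    return np.where(rad_cc <= 1.0, label, -1)


def _plot_ellipse(grp, n_std):
    """apply() callback: grp is the sub-DataFrame pandas passes for one label."""
    xdata = grp.x
    ydata = grp.y
    label = grp.name  # pandas sets .name to the group key, i.e. the label

    points = np.stack([xdata, ydata], axis=1)
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    order = vals.argsort()[::-1]  # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    angle = np.degrees(np.arctan2(*vecs[:, 0][::-1]))
    width, height = 2 * n_std * np.sqrt(vals)

    grp = grp.copy()  # work on a copy rather than the frame pandas passed in
    grp["label"] = _is_in_ellipse(xdata, ydata, center, width, height, angle, label)
    return grp


rng = np.random.default_rng(0)
x = rng.normal(0, 1, 100)
y = rng.normal(0, 1, 100)
x[0], y[0] = 100.0, 100.0  # an obvious outlier in label 1
df = pd.DataFrame({"x": x, "y": y, "label": [1] * 50 + [2] * 50})

out = (
    df.groupby("label", group_keys=False)[["x", "y"]]
    .apply(_plot_ellipse, n_std=1)
    .sort_index()
)
print(out.head())
```

Returning the whole modified group, instead of a bare list, is what lets `apply` stitch the groups back into one DataFrame with the original rows.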