Question

I am trying to bound every value in a dataframe between 0.01 and 0.99.

I have successfully normalized the data to between 0 and 1 using .apply(lambda x: (x - x.min()) / (x.max() - x.min())), as follows:

import pandas as pd

df = pd.DataFrame({'one' : ['AAL', 'AAL', 'AAPL', 'AAPL'], 'two' : [1, 1, 5, 5], 'three' : [4,4,2,2]})

df[['two', 'three']].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

df

Now I want to bound all the values between 0.01 and 0.99.

This is what I have tried:

def bound_x(x):
    if x == 1:
        return x - 0.01
    elif x < 0.99:
        return x + 0.01

df[['two', 'three']].apply(bound_x)

df

But I get the following error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index two')

Answer 1

There is a clip method for that:

import pandas as pd
df = pd.DataFrame({'one' : ['AAL', 'AAL', 'AAPL', 'AAPL'], 'two' : [1, 1, 5, 5], 'three' : [4,4,2,2]})    
df = df[['two', 'three']].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
df = df.clip(lower=0.01, upper=0.99)

yields

    two  three
0  0.01   0.99
1  0.01   0.99
2  0.99   0.01
3  0.99   0.01
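Note that clip only moves values lying outside [0.01, 0.99]; interior values are left where they are. If the goal is instead to rescale the whole column linearly so that the minimum lands on 0.01 and the maximum on 0.99, a minimal sketch could look like the following. The helper name `rescale` and the sample values are assumptions made for illustration, and a constant column would divide by zero and produce NaN.

```python
import pandas as pd

def rescale(col, lower=0.01, upper=0.99):
    """Hypothetical helper: map col linearly so min -> lower and max -> upper."""
    unit = (col - col.min()) / (col.max() - col.min())  # min-max scaling to 0..1
    return unit * (upper - lower) + lower               # stretch into lower..upper

s = pd.Series([1, 2, 5])  # illustrative data with an interior value

unit = (s - s.min()) / (s.max() - s.min())
clipped = unit.clip(lower=0.01, upper=0.99)  # interior value 0.25 is unchanged
rescaled = rescale(s)                        # interior value is shifted slightly

print(clipped.tolist())
print(rescaled.tolist())
```

With the data in the question (only two distinct values per column) both approaches give the same 0.01 and 0.99 results; they differ only for values strictly between the minimum and maximum.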

The problem with

df[['two', 'three']].apply(bound_x)

is that bound_x gets passed a Series like df['two'], and then if x == 1 requires x == 1 to be evaluated in a boolean context. x == 1 is a boolean Series like

In [44]: df['two'] == 1
Out[44]: 
0    False
1    False
2     True
3     True
Name: two, dtype: bool

Python tries to reduce this Series to a single boolean value, True or False. Pandas follows the NumPy convention of raising an error when you try to convert a Series (or array) to a bool.
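To make the failure concrete, here is a small sketch that reproduces the error and shows one way to keep a scalar-style function: call it once per element with Series.map instead of once per column. The helper `bound_scalar` is a hypothetical name, not part of the question, and it clamps rather than adding or subtracting 0.01 (the original bound_x would also return None for values between 0.99 and 1, since neither branch matches).

```python
import pandas as pd

s = pd.Series([0.0, 0.0, 1.0, 1.0], name="two")

# Using a whole Series in an `if` asks for a single True/False, which raises.
try:
    if s == 1:
        pass
    raised = False
except ValueError:
    raised = True

def bound_scalar(x, lower=0.01, upper=0.99):
    """Hypothetical scalar helper: clamp one number into [lower, upper]."""
    return min(max(x, lower), upper)

# Series.map calls the function once per element, so x is a plain number here.
bounded = s.map(bound_scalar)

print(raised)
print(bounded.tolist())
```

For large frames the built-in clip shown above is usually preferable, since it is vectorized and avoids a Python-level call per element.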

Answer 2

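I had a similar problem where I wanted custom normalization, because the usual percentile or z-score of a datum was not adequate. Sometimes I knew what the feasible maximum and minimum of the population were, and therefore wanted to define them independently of my sample, or use a different midpoint, or whatever! So I built a custom function (with extra steps in the code here to make it as readable as possible):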

from numpy import mean, median  # needed for center='avg' and center='median'

def NormData(s,low='min',center='mid',hi='max',insideout=False,shrinkfactor=0.):
    if low=='min':
        low=min(s)
    elif low=='abs':
        low=max(abs(min(s)),abs(max(s)))*-1.#sign(min(s))
    if hi=='max':
        hi=max(s)
    elif hi=='abs':
        hi=max(abs(min(s)),abs(max(s)))*1.#sign(max(s))

    if center=='mid':
        center=(max(s)+min(s))/2
    elif center=='avg':
        center=mean(s)
    elif center=='median':
        center=median(s)

    s2=[x-center for x in s]
    hi=hi-center
    low=low-center
    center=0.

    r=[]

    for x in s2:
        if x<low:
            r.append(0.)
        elif x>hi:
            r.append(1.)
        else:
            if x>=center:
                r.append((x-center)/(hi-center)*0.5+0.5)
            else:
                r.append((x-low)/(center-low)*0.5+0.)

    if insideout==True:
        ir=[(1.-abs(z-0.5)*2.) for z in r]
        r=ir

    rr =[x-(x-0.5)*shrinkfactor for x in r]    
    return rr

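This takes in a pandas Series, or even just a list, and normalizes it to your specified low, center, and high points. There is also a shrink factor, which lets you pull the data in from the endpoints 0 and 1 (I had to do this when combining colormaps in matplotlib: Single pcolormesh with more than one colormap using Matplotlib). You can probably see how the code works, but basically, say you have the values [-5, 2, 10] in a sample but want to normalize based on a range of -7 to 7 (so anything above 7, our "10", is effectively treated as a 7) with a midpoint of 1, and shrink it to fit a 256-entry RGB colormap: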

#In[1]
NormData([-5,2,10],low=-7,center=1,hi=7,shrinkfactor=2./256)
#Out[1]
[0.1279296875, 0.5826822916666667, 0.99609375]

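It can also turn your data inside out. This may seem odd, but I found it useful for heatmaps. Say you want a darker color for values closer to 0 rather than for the high/low extremes. You could build the heatmap from normalized data with insideout=True: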

#In[2]
NormData([-5,2,10],low=-7,center=1,hi=7,insideout=True,shrinkfactor=2./256)
#Out[2]
[0.251953125, 0.8307291666666666, 0.00390625]

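So now "2", which is closest to the center (defined as "1"), is the highest value.

Anyway, I thought my problem was very similar to yours, and this function could be useful to you.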
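For larger inputs, the same piecewise mapping (low to 0, center to 0.5, hi to 1, followed by the shrink step) can be sketched with NumPy array operations instead of a Python loop. This is an assumption-laden illustration rather than the function above: `norm_piecewise` is a hypothetical name, it only accepts explicit numeric bounds with low < center < hi (no 'min', 'abs', 'avg' options), and it omits the insideout option.

```python
import numpy as np

def norm_piecewise(values, low, center, hi, shrinkfactor=0.0):
    """Hypothetical vectorized sketch: low -> 0, center -> 0.5, hi -> 1."""
    # Values outside [low, hi] are treated as the nearest bound.
    x = np.clip(np.asarray(values, dtype=float), low, hi)
    upper = 0.5 + 0.5 * (x - center) / (hi - center)  # branch for x >= center
    lower = 0.5 * (x - low) / (center - low)          # branch for x < center
    r = np.where(x >= center, upper, lower)
    # Pull every value toward 0.5 by the given fraction.
    return r - (r - 0.5) * shrinkfactor

result = norm_piecewise([-5, 2, 10], low=-7, center=1, hi=7, shrinkfactor=2.0 / 256)
print(result)
```

On the sample call this should agree with the first NormData example above up to floating-point rounding.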

Python Pandas Dataframe: normalize data between 0.01 and 0.99?
