Create a Pandas DataFrame column with weights if the value in one column is between the values in two other columns

Asked: 2017-03-08 16:38:26

Tags: python pandas lambda

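I'm having trouble adding a weight (int) to a new Pandas DataFrame column when the value in one column lies between the values in two other columns. I am able to create a column with True/False values (or 0/1 values if I use astype).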

import pandas as pd

df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6], 'c': [3,6,4]})
df

   a  b  c
0  1  4  3
1  2  5  6
2  3  6  4
  

This works:

df['between_bool'] = df['c'].between(df['a'], df['b'])
df

   a  b  c between_bool
0  1  4  3         True     # 3 is between 1 and 4
1  2  5  6        False     # 6 is NOT between 2 and 5
2  3  6  4         True     # 4 is between 3 and 6
  

However, this does not work:

df['between_int'] = df['c'].apply(lambda x: 2 if df['c'].between(df['a'], df['b']) else 0)
  

The code above produces the following error:

Traceback (most recent call last):
  File "C:\Python36\envs\PortfolioManager\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-14-0aa1e7cfd5c2>", line 1, in <module>
    df['between_int'] = df['c'].apply(lambda x: 2 if df['c'].between(df['a'], df['b']) else 0)
  File "C:\Python36\envs\PortfolioManager\lib\site-packages\pandas\core\series.py", line 2294, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\src\inference.pyx", line 1207, in pandas.lib.map_infer (pandas\lib.c:66124)
  File "<ipython-input-14-0aa1e7cfd5c2>", line 1, in <lambda>
  

The desired output is:

   a  b  c between_int
0  1  4  3           2      # 3 is between 1 and 4
1  2  5  6           0      # 6 is NOT between 2 and 5
2  3  6  4           2      # 4 is between 3 and 6

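Any ideas?

2 answers:

Answer 0 (score: 1)

I hope I understand you correctly, but if you just want to add a fixed weight of 2 when this condition holds, either of the following approaches works: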

import numpy as np
df['between_int'] = np.where(df['c'].between(df['a'], df['b']), 2, 0)

Or, if you don't want to import numpy, you can do the following:

df['between_int'] = 0
df.loc[df['c'].between(df['a'], df['b']), 'between_int'] = 2
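Since the question mentions that astype already yields 0/1 values, a third variant is to scale that mask directly. The sketch below reuses the question's sample data and compares it with the np.where version; the name WEIGHT is only an illustrative constant, not something from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [3, 6, 4]})

# Boolean mask: True where c lies in [a, b] (between is inclusive by default).
mask = df['c'].between(df['a'], df['b'])

WEIGHT = 2  # illustrative name; the value 2 is the weight from the question

# Variant 1: pick WEIGHT or 0 element-wise.
via_where = np.where(mask, WEIGHT, 0)

# Variant 2: turn True/False into 1/0 and multiply by the weight.
via_astype = mask.astype(int) * WEIGHT

df['between_int'] = via_astype
print(df)
```

Both variants stay vectorized, so neither needs a Python-level loop over rows.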

Hope this helps!

Answer 1 (score: 1)


I think what you originally wanted to do with apply is:

df['between_int'] = df.apply(lambda x: 2 if x['c'] in range(x['a'], x['b']) else 0, axis=1)

Note the differences from your version:

  1. apply is called on the DataFrame df instead of the Series df['c']
  2. The value to check is x['c'] instead of df['c'], because your lambda is a function of x
  3. Because I changed df['c'] to x['c'], I can no longer use between, so I use in range instead
  4. Both boundaries are referenced as x['a'] and x['b'], for the same reason as point 2
  5. Finally, don't forget axis=1, now that apply runs on a DataFrame

In any case, Swebbo's solution works perfectly!
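Two details may be worth spelling out here. First, the original lambda fails because `if` is handed a whole boolean Series, which pandas refuses to reduce to a single True/False. Second, `in range(x['a'], x['b'])` excludes the upper bound and only works for integers, whereas between includes both ends by default. A minimal sketch of a row-wise version with a chained comparison follows, assuming inclusive bounds are wanted; the helper name weight_for_row is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [3, 6, 4]})

# Using a whole Series as an `if` condition raises ValueError,
# because its truth value is ambiguous.
try:
    bool(df['c'].between(df['a'], df['b']))
    raised = False
except ValueError:
    raised = True

# range() stops before its upper bound, unlike between().
upper_bound_in_range = 4 in range(1, 4)  # evaluates to False


def weight_for_row(row, weight=2):
    """Return `weight` if row['c'] lies in [row['a'], row['b']], else 0."""
    return weight if row['a'] <= row['c'] <= row['b'] else 0


# axis=1 hands each row to the function as a Series indexed by column name.
df['between_int'] = df.apply(weight_for_row, axis=1)
print(df)
```

The chained comparison also works for float columns, which the range-based test does not.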