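Subtracting two list-valued columns in pandas to create a cumulative column

Time: 2016-10-17 03:42:29

Tags: python pandas

The data frame has a column, set x, that holds the universal set, and a subset column that holds some of its subsets. I want to select subsets in order of highest ratio until the whole of set x is covered.

Uncovered = set x - subset

This is what my data frame looks like in pandas: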

   ratio                  set x        subset        uncovered
2   2.00  [1, 3, 6, 8, 9, 0, 7]  [8, 3, 6, 1]        [0, 9, 7]
0   1.50  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 6]     [0, 8, 9, 7]
1   1.00  [1, 3, 6, 8, 9, 0, 7]        [9, 0]  [8, 1, 3, 6, 7]
3   0.75  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 7]     [0, 8, 6, 9]

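I want to create another column that subtracts from set x cumulatively, carrying the uncovered elements forward from row to row, until I get an empty list.

I tried the following code: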

p['tt']=list(p['set x']-p['subset'])

Error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Applications/anaconda/lib/python3.5/site-packages/pandas/core/ops.py in na_op(x, y)
    581             result = expressions.evaluate(op, str_rep, x, y,
--> 582                                           raise_on_error=True, **eval_kwargs)
    583         except TypeError:

/Applications/anaconda/lib/python3.5/site-packages/pandas/computation/expressions.py in evaluate(op, op_str, a, b, raise_on_error, use_numexpr, **eval_kwargs)
    208         return _evaluate(op, op_str, a, b, raise_on_error=raise_on_error,
--> 209                          **eval_kwargs)
    210     return _evaluate_standard(op, op_str, a, b, raise_on_error=raise_on_error)

/Applications/anaconda/lib/python3.5/site-packages/pandas/computation/expressions.py in _evaluate_numexpr(op, op_str, a, b, raise_on_error, truediv, reversed, **eval_kwargs)
    119     if result is None:
--> 120         result = _evaluate_standard(op, op_str, a, b, raise_on_error)
    121

/Applications/anaconda/lib/python3.5/site-packages/pandas/computation/expressions.py in _evaluate_standard(op, op_str, a, b, raise_on_error, **eval_kwargs)
     61         _store_test_result(False)
---> 62     return op(a, b)
     63

TypeError: unsupported operand type(s) for -: 'list' and 'list'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
in <module>()
----> 1 p['tt'] = list(p['set x'] - p['subset'])

/Applications/anaconda/lib/python3.5/site-packages/pandas/core/ops.py in wrapper(left, right, name, na_op)
    639             rvalues = algos.take_1d(rvalues, ridx)
    640
--> 641         arr = na_op(lvalues, rvalues)
    642
    643         return left._constructor(wrap_results(arr), index=index,

/Applications/anaconda/lib/python3.5/site-packages/pandas/core/ops.py in na_op(x, y)
    586                 result = np.empty(x.size, dtype=dtype)
    587                 mask = notnull(x) & notnull(y)
--> 588                 result[mask] = op(x[mask], _values_from_object(y[mask]))
    589             elif isinstance(x, np.ndarray):
    590                 result = np.empty(len(x), dtype=x.dtype)

TypeError: unsupported operand type(s) for -: 'list' and 'list'
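
The last line of the traceback points at plain Python rather than at pandas: the `-` operator is not defined for two lists, and pandas simply applies it element by element to the list objects stored in each cell. A minimal sketch that reproduces this outside pandas, using the values from the first row of the table (the variable names here are only for illustration):

```python
set_x = [1, 3, 6, 8, 9, 0, 7]
subset = [8, 3, 6, 1]

# Subtracting one list from another raises the same TypeError
# that appears at the bottom of the traceback.
try:
    set_x - subset
except TypeError as exc:
    message = str(exc)

# Sets do support "-" (set difference), so converting first works.
uncovered = set(set_x) - set(subset)

print(message)            # unsupported operand type(s) for -: 'list' and 'list'
print(sorted(uncovered))  # [0, 7, 9]
```

Sets are unordered, so the result is sorted here only to make the printed output predictable.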

1 answer:

Answer 0 (score: 0)

This should work for you:

import pandas as pd

#    ratio                  set x        subset        uncovered
# 2   2.00  [1, 3, 6, 8, 9, 0, 7]  [8, 3, 6, 1]        [0, 9, 7]
# 0   1.50  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 6]     [0, 8, 9, 7]
# 1   1.00  [1, 3, 6, 8, 9, 0, 7]        [9, 0]  [8, 1, 3, 6, 7]
# 3   0.75  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 7]     [0, 8, 6, 9]

p = pd.DataFrame(
    [
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [1, 3, 6]},
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [9, 0]},
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [8, 3, 6, 1]},
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [1, 3, 7]},
    ])


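def set_operation(x):
    # Lists do not support "-", so convert both cells to sets first.
    # Set difference does not preserve the original element order.
    return list(set(x['set x']) - set(x['subset']))

# axis=1 passes each row to set_operation
p['tt'] = p.apply(set_operation, axis=1)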

The result is:

                   set x        subset               tt
0  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 6]     [0, 8, 9, 7]
1  [1, 3, 6, 8, 9, 0, 7]        [9, 0]  [8, 1, 3, 6, 7]
2  [1, 3, 6, 8, 9, 0, 7]  [8, 3, 6, 1]        [0, 9, 7]
3  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 7]     [0, 8, 9, 6]
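
The `tt` column above is a per-row difference. The question also asks for a running version, where each row removes its subset from whatever the previous rows left uncovered. One possible sketch of that, assuming the rows are processed in descending `ratio` order as in the question's table; the helper `cumulative_uncovered` and the column name `cum_uncovered` are made up for this example:

```python
import pandas as pd

universe = [1, 3, 6, 8, 9, 0, 7]
p = pd.DataFrame(
    [
        {'ratio': 1.50, 'set x': universe, 'subset': [1, 3, 6]},
        {'ratio': 1.00, 'set x': universe, 'subset': [9, 0]},
        {'ratio': 2.00, 'set x': universe, 'subset': [8, 3, 6, 1]},
        {'ratio': 0.75, 'set x': universe, 'subset': [1, 3, 7]},
    ])


def cumulative_uncovered(df):
    # Start from the full set and remove each row's subset in turn,
    # recording what is still uncovered after every row.
    remaining = set(df['set x'].iloc[0])
    out = []
    for subset in df['subset']:
        remaining = remaining - set(subset)
        out.append(sorted(remaining))
    # Wrapping in a Series keeps each list as a single cell value.
    return pd.Series(out, index=df.index)


# Highest ratio first, as in the question.
p = p.sort_values('ratio', ascending=False)
p['cum_uncovered'] = cumulative_uncovered(p)

print(p['cum_uncovered'].tolist())
# [[0, 7, 9], [0, 7, 9], [7], []]
```

With this data the running set only becomes empty at the last row. If it emptied earlier, the remaining rows would simply show `[]`, and they could be filtered out afterwards if only the covering subsets are wanted.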