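My DataFrame has a column set x, which holds the universal set, and a column subset, which holds some of its subsets. I want to pick the subsets with the highest ratio, one after another, until the whole of set x is covered.
uncovered = set x - subset
This is what my DataFrame looks like in pandas: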
   ratio                  set x        subset        uncovered
2   2.00  [1, 3, 6, 8, 9, 0, 7]  [8, 3, 6, 1]        [0, 9, 7]
0   1.50  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 6]     [0, 8, 9, 7]
1   1.00  [1, 3, 6, 8, 9, 0, 7]        [9, 0]  [8, 1, 3, 6, 7]
3   0.75  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 7]     [0, 8, 6, 9]
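I want to create another column that subtracts the subsets from set x cumulatively, row by row, until the uncovered list is empty.
I tried the following code: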
p['tt']=list(p['set x']-p['subset'])
Error message:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Applications/anaconda/lib/python3.5/site-packages/pandas/core/ops.py in na_op(x, y)
    581             result = expressions.evaluate(op, str_rep, x, y,
--> 582                                           raise_on_error=True, **eval_kwargs)
    583         except TypeError:

/Applications/anaconda/lib/python3.5/site-packages/pandas/computation/expressions.py in evaluate(op, op_str, a, b, raise_on_error, use_numexpr, **eval_kwargs)
    208         return _evaluate(op, op_str, a, b, raise_on_error=raise_on_error,
--> 209                          **eval_kwargs)
    210     return _evaluate_standard(op, op_str, a, b, raise_on_error=raise_on_error)

/Applications/anaconda/lib/python3.5/site-packages/pandas/computation/expressions.py in _evaluate_numexpr(op, op_str, a, b, raise_on_error, truediv, reversed, **eval_kwargs)
    119     if result is None:
--> 120         result = _evaluate_standard(op, op_str, a, b, raise_on_error)
    121

/Applications/anaconda/lib/python3.5/site-packages/pandas/computation/expressions.py in _evaluate_standard(op, op_str, a, b, raise_on_error, **eval_kwargs)
     61         _store_test_result(False)
---> 62     return op(a, b)
     63

TypeError: unsupported operand type(s) for -: 'list' and 'list'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
in <module>()
----> 1 p['tt'] = list(p['set x'] - p['subset'])

/Applications/anaconda/lib/python3.5/site-packages/pandas/core/ops.py in wrapper(left, right, name, na_op)
    639             rvalues = algos.take_1d(rvalues, ridx)
    640
--> 641         arr = na_op(lvalues, rvalues)
    642
    643         return left._constructor(wrap_results(arr), index=index,

/Applications/anaconda/lib/python3.5/site-packages/pandas/core/ops.py in na_op(x, y)
    586                 result = np.empty(x.size, dtype=dtype)
    587                 mask = notnull(x) & notnull(y)
--> 588                 result[mask] = op(x[mask], _values_from_object(y[mask]))
    589             elif isinstance(x, np.ndarray):
    590                 result = np.empty(len(x), dtype=x.dtype)

TypeError: unsupported operand type(s) for -: 'list' and 'list'
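The last line of the traceback above points at the cause: the `-` ends up being applied to the individual cells, and each cell here is a plain Python list, which does not define subtraction. A minimal sketch that reproduces this without pandas, using the first row of the table (the variable names are illustrative, not from the original code):

```python
# Illustrative values taken from the first row of the table above.
set_x = [1, 3, 6, 8, 9, 0, 7]
subset = [8, 3, 6, 1]

try:
    set_x - subset  # lists do not implement the - operator
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Converting both lists to sets makes the difference well defined.
uncovered = set(set_x) - set(subset)

print(error_message)
print(sorted(uncovered))
```

Sets are unordered, so `sorted` is used here only to get a stable order for printing.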
Answer 0 (score: 0)
This should work for you:
import pandas as pd

#    ratio                  set x        subset        uncovered
# 2   2.00  [1, 3, 6, 8, 9, 0, 7]  [8, 3, 6, 1]        [0, 9, 7]
# 0   1.50  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 6]     [0, 8, 9, 7]
# 1   1.00  [1, 3, 6, 8, 9, 0, 7]        [9, 0]  [8, 1, 3, 6, 7]
# 3   0.75  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 7]     [0, 8, 6, 9]
p = pd.DataFrame(
    [
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [1, 3, 6]},
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [9, 0]},
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [8, 3, 6, 1]},
        {'set x': [1, 3, 6, 8, 9, 0, 7], 'subset': [1, 3, 7]},
    ])


def set_operation(x):
    # Convert both lists to sets so that - means set difference.
    return list(set(x['set x']) - set(x['subset']))


# axis=1 passes each row to set_operation.
p['tt'] = p.apply(set_operation, axis=1)
The result is:
                   set x        subset               tt
0  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 6]     [0, 8, 9, 7]
1  [1, 3, 6, 8, 9, 0, 7]        [9, 0]  [8, 1, 3, 6, 7]
2  [1, 3, 6, 8, 9, 0, 7]  [8, 3, 6, 1]        [0, 9, 7]
3  [1, 3, 6, 8, 9, 0, 7]     [1, 3, 7]     [0, 8, 9, 6]
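Note that tt is computed for each row on its own, whereas the question also asks for a cumulative version. A rough sketch of that part follows, in plain Python so it can be checked without pandas. The helper name `cumulative_uncovered` and the variable names are made up for illustration, and the subsets are assumed to be already sorted by descending ratio, as in the question's table:

```python
def cumulative_uncovered(universe, subsets):
    """For each subset, in order, return the elements of `universe`
    still uncovered after taking that subset and all earlier ones."""
    remaining = set(universe)
    result = []
    for subset in subsets:
        remaining = remaining - set(subset)
        # sorted() gives a stable, readable order; sets are unordered.
        result.append(sorted(remaining))
    return result


# Subsets listed in descending ratio order, as in the question's table.
set_x = [1, 3, 6, 8, 9, 0, 7]
subsets_by_ratio = [[8, 3, 6, 1], [1, 3, 6], [9, 0], [1, 3, 7]]

steps = cumulative_uncovered(set_x, subsets_by_ratio)
for subset, left in zip(subsets_by_ratio, steps):
    print(subset, "->", left)
```

The returned list has one entry per row, so it could be assigned as a new column once p is sorted by ratio in descending order (this assumes the ratio column from the question is present). The sketch walks the rows in a fixed order and does not recompute ratios after each pick, so a row such as [1, 3, 6] that covers nothing new simply leaves the uncovered list unchanged.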