I am trying to create a new column in a pandas DataFrame using a function that takes two columns as arguments:
def ipf_cat(var, con):
    if var == "Idiopathic pulmonary fibrosis":
        if con in range(95,100):
            result = 4
        if con in range(70,95):
            result = 3
        if con in range(50,70):
            result = 2
        if con in range(0,50):
            result = 1
    return result
Then:
df['ipf_category'] = ipf_cat(df['dx1'], df['dxcon1'])
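where df['dx1'] is a column of strings and df['dxcon1'] is another column of integers from 0 to 100. The function works fine in plain Python when called with single values, but with the columns I keep getting this error: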
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I have seen previous answers, for example:
Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
but I have not been able to apply those solutions to my particular function.
Answer 0 (score: 1)
I would use the pd.cut() method:
Source DF:
In [157]: df
Out[157]:
   con                            var
0   53                            ???
1   97  Idiopathic pulmonary fibrosis
2   75                            ???
3   11  Idiopathic pulmonary fibrosis
4   70                            ???
5   52  Idiopathic pulmonary fibrosis
6   74                            ???
7   25  Idiopathic pulmonary fibrosis
8   92                            ???
9   80  Idiopathic pulmonary fibrosis
Solution:
In [158]: df['ipf_category'] = -999
     ...:
     ...: bins = [-1, 50, 70, 95, 101]
     ...: labels = [1,2,3,4]
     ...:
     ...: df.loc[df['var']=='Idiopathic pulmonary fibrosis', 'ipf_category'] = \
     ...:     pd.cut(df['con'], bins=bins, labels=labels)
     ...:
In [159]: df
Out[159]:
   con                            var  ipf_category
0   53                            ???          -999
1   97  Idiopathic pulmonary fibrosis             4
2   75                            ???          -999
3   11  Idiopathic pulmonary fibrosis             1
4   70                            ???          -999
5   52  Idiopathic pulmonary fibrosis             2
6   74                            ???          -999
7   25  Idiopathic pulmonary fibrosis             1
8   92                            ???          -999
9   80  Idiopathic pulmonary fibrosis             3
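One thing to watch: pd.cut() closes each bin on the right by default, so with bins = [-1, 50, 70, 95, 101] a value of exactly 50, 70 or 95 lands in the lower category, whereas range(50,70) in the question puts 50 in category 2. If the boundaries need to match the range() calls exactly, something along these lines might work. It is only a sketch: the sample values are made up to hit the boundaries, and -999 is kept as the placeholder for non-matching rows.

```python
import pandas as pd

# Hypothetical sample data chosen to sit on the bin edges.
df = pd.DataFrame({
    "con": [49, 50, 69, 70, 94, 95, 99],
    "var": ["Idiopathic pulmonary fibrosis"] * 6 + ["???"],
})

# right=False gives left-closed bins [0, 50), [50, 70), [70, 95), [95, 100),
# which mirror range(0,50), range(50,70), range(70,95), range(95,100).
bins = [0, 50, 70, 95, 100]

# labels=False returns the bin position (0..3); values outside the bins become NaN.
codes = pd.cut(df["con"], bins=bins, labels=False, right=False)

is_ipf = df["var"] == "Idiopathic pulmonary fibrosis"

# Keep the category only for matching rows; everything else gets the placeholder.
df["ipf_category"] = (codes + 1).where(is_ipf).fillna(-999).astype(int)

print(df)
```

Note that with these bins a value of exactly 100 falls outside every bin and ends up as -999, just as it falls outside every range() in the original function.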
Setup:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'con':np.random.randint(100, size=10),
    'var':np.random.choice(['Idiopathic pulmonary fibrosis','???'], 10)
})
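As for why the original call fails: ipf_cat(df['dx1'], df['dxcon1']) passes whole Series into the function, so var == "..." produces a boolean Series rather than a single True/False, and the if statement cannot reduce that to one truth value. If you would rather keep the scalar function, one option is to call it once per row. The following is a rough sketch, not a drop-in fix: the default of -999 for rows that match no branch is my assumption (the original function leaves result unset in that case), and the sample rows are invented.

```python
import pandas as pd

def ipf_cat(var, con):
    """Scalar function from the question, plus an assumed default value."""
    result = -999  # assumption: placeholder when no branch matches
    if var == "Idiopathic pulmonary fibrosis":
        if con in range(95, 100):
            result = 4
        elif con in range(70, 95):
            result = 3
        elif con in range(50, 70):
            result = 2
        elif con in range(0, 50):
            result = 1
    return result

# Hypothetical sample rows using the column names from the question.
df = pd.DataFrame({
    "dx1": ["Idiopathic pulmonary fibrosis", "???", "Idiopathic pulmonary fibrosis"],
    "dxcon1": [97, 75, 52],
})

# Call the function once per row, pairing up the two columns value by value.
df["ipf_category"] = [ipf_cat(v, c) for v, c in zip(df["dx1"], df["dxcon1"])]

print(df)
```

This runs a Python-level loop, so it will generally be slower than pd.cut() on large frames, but it keeps the logic exactly as written in the function.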