Question

import pandas as pd
from pandas import DataFrame,Series
import numpy as np
titanic=pd.read_csv('C:/Users/prasun.j/Downloads/train.csv')
sex=[]
if titanic['Sex']=='male':
    sex.append(1)
else:
    sex.append(0)
sex

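I am trying to build a list that gets a 1 appended when the if statement encounters male and a 0 when it encounters female. I don't know what I am doing wrong. Can someone help? Thanks in advance. Running the code throws this error: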

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-265768ba34be> in <module>()
      4 titanic=pd.read_csv('C:/Users/prasun.j/Downloads/train.csv')
      5 sex=[]
----> 6 if titanic['Sex']=='male':
      7     sex.append(1)
      8 else:

C:\anaconda\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
   1119         raise ValueError("The truth value of a {0} is ambiguous. "
   1120                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1121                          .format(self.__class__.__name__))
   1122 
   1123     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Answer 1

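When you check if titanic['Sex']=='male', you are comparing 'male' against the entire Series. The result is a Series of booleans, one per row, rather than a single True or False, so the if statement cannot evaluate it. That is why you get the ValueError.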
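
To make the cause concrete, here is a minimal sketch that reproduces the error on a small made-up frame (the three rows are an assumption for illustration, not data from train.csv):

```python
import pandas as pd

df = pd.DataFrame({'sex': ['male', 'female', 'male']})

# Element-wise comparison: one boolean per row, not a single True/False.
mask = df['sex'] == 'male'
print(mask.tolist())  # [True, False, True]

# `if` needs exactly one truth value, so pandas refuses to guess.
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```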

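If you really want to stick with an iterative approach, you can use iterrows and check the condition for each row. However, you should avoid iterating in pandas where possible, and there is a much cleaner solution here.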

Setup

df = pd.DataFrame({'sex': ['male', 'female', 'male', 'male', 'female']})

Use np.where here:

np.where(df.sex == 'male', 1, 0)
# array([1, 0, 1, 1, 0])
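
Applied to the question's column, the same call might look like the sketch below. The small frame is a hypothetical stand-in for the result of pd.read_csv on train.csv; only the column name Sex comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from train.csv.
titanic = pd.DataFrame({'Sex': ['male', 'female', 'female']})

# 1 where the row is 'male', 0 otherwise, as a plain Python list.
sex = np.where(titanic['Sex'] == 'male', 1, 0).tolist()
print(sex)  # [1, 0, 0]
```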

You can also cast the boolean mask to integers:

(df.sex == 'male').astype(int).values.tolist()
# [1, 0, 1, 1, 0]
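
For comparison, the row-by-row approach mentioned at the start of this answer can be sketched as follows. It gives the same list but is usually much slower on large frames, so treat it as an illustration rather than a recommendation.

```python
import pandas as pd

df = pd.DataFrame({'sex': ['male', 'female', 'male', 'male', 'female']})

sex = []
# iterrows yields (index, row) pairs; each row['sex'] is a single string,
# so the comparison produces one ordinary boolean that `if` can handle.
for _, row in df.iterrows():
    if row['sex'] == 'male':
        sex.append(1)
    else:
        sex.append(0)

print(sex)  # [1, 0, 1, 1, 0]
```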

Answer 2

You can also use get_dummies and drop the first column (in this case that drops female):

df = pd.DataFrame({'sex': ['male', 'female', 'male', 'male', 'female','male'], 'age':[10,20,30,40,50,60]})

Use pd.get_dummies to get your values:

# dtype=int keeps the 0/1 output; pandas 2.0+ would otherwise return True/False
sex = pd.get_dummies(df['sex'], drop_first=True, dtype=int)
sex
   male
0     1
1     0
2     1
3     1
4     0
5     1

Then convert to a list:

list_sex = sex['male'].tolist()
list_sex

[1, 0, 1, 1, 0, 1]
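
A short sketch of why drop_first leaves the male column: get_dummies orders the indicator columns by sorted category, so female comes first and is the one dropped. The dtype=int argument is an addition here so the result is 0/1 on recent pandas versions as well.

```python
import pandas as pd

df = pd.DataFrame({'sex': ['male', 'female', 'male', 'male', 'female', 'male']})

# Without drop_first there is one indicator column per category, sorted.
cols = pd.get_dummies(df['sex'], dtype=int).columns.tolist()
print(cols)  # ['female', 'male']

# drop_first removes the first of those columns, leaving only 'male'.
encoded = pd.get_dummies(df['sex'], drop_first=True, dtype=int)
list_sex = encoded['male'].tolist()
print(list_sex)  # [1, 0, 1, 1, 0, 1]
```

If the column held values other than male and female, the surviving column names would change accordingly, so the explicit comparison with 'male' may be the more predictable choice.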

Append 1 when true and 0 otherwise to an empty list, based on the data in a pandas column?
