Question

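I have a dataset with a column 'y' that contains specific values. I want to take that column and create a new column (z) such that if the y value is 47472 then z should be 1000, if y < 1000 then z = y * 2, and for all other values z should be 2000. Here is a mock example of the data. I don't have the 'z' column yet, but I want to create it: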

          y      z
0      1751   2000
1       800   1600
2     10000   2000
3       350    700
4       750   1500
5      1750   3500
6     30000   2000
7     47472   1000


def test(y):
    if y == 47472:
        z = 1000
    elif y < 1000:
        z = y * 2
    else:
        z = 2000
    return z

# I tried to call the above function below
z = test(y)
z

But instead of a result, I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Answer 1

The problem is that you are using a Series in an if statement, for example:

if y == 47472:

Assuming y is a column of your DataFrame, the comparison produces a Series of booleans rather than a single value:

>>> df['y'] == 47472
0    False
1    False
2    False
3    False
4    False
5    False
6    False
7     True
Name: y, dtype: bool

An if statement cannot evaluate that, which is why the error message suggests functions that return a single boolean, such as any() or all(). Instead, you should use boolean indexing (edited following EdChum's comment to use loc rather than chained indexing):

import numpy as np
import pandas as pd

# df is the DataFrame with your data
# add column z, initialised to zeros (index=df.index keeps it aligned with df)
df['z'] = pd.Series(np.zeros(df.shape[0]), index=df.index)
# if y == 47472 then put 1000
df.loc[df['y'] == 47472, 'z'] = 1000
# for y < 1000, set z = 2 * y
df.loc[df['y'] < 1000, 'z'] = 2 * df['y']
# now set the rest to 2000 (i.e. rows that satisfy neither of the previous 2 conditions)
df.loc[(df['y'] >= 1000) & (df['y'] != 47472), 'z'] = 2000

The chained-indexing form, df['z'][df['y']<1000] = 2*df['y'], should be avoided; use loc instead, as in the code above.
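
As a rough illustration of the loc pattern, the sketch below selects rows and column in a single indexing call, so the assignment is applied to df itself. With chained indexing, the first [] may return a temporary object, so the assignment may not reach df, depending on the pandas version. The three-row frame and the variable name small are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({"y": [800, 1751, 47472]})
df["z"] = 2000  # default for rows that match neither condition

# One .loc call per assignment: row mask and column label together
small = df["y"] < 1000
df.loc[small, "z"] = df.loc[small, "y"] * 2
df.loc[df["y"] == 47472, "z"] = 1000

print(df)
```

Setting the default first and then overwriting the two special cases avoids having to spell out the "everything else" condition explicitly.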

Create a new numeric column using an if/else statement
