Question

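I'm trying to replace every number greater than 1 with 1, while leaving the original 1s and 0s unchanged, across the whole DataFrame and with minimal effort. Any help is appreciated!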

My DataFrame looks like this, but with many more columns and rows.

Report No   Apple   Orange   Lemon   Grape   Pear
One           5       0        2       1      1
Two           1       1        0       3      2
Three         0       0        2       1      3
Four          1       1        3       0      0
Five          4       0        0       1      1
Six           1       3        1       2      0

Desired output:

Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0

Answer 1

Use pandas.DataFrame.clip:

new_df = df.clip(0, 1)

Edit: to exclude the first column by name (this modifies the DataFrame in place):

mask = df.columns != "Report No"
df.loc[:, mask] = df.loc[:, mask].clip(0, 1)
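For reference, here is a self-contained sketch of the clip approach on the question's sample data. It assumes "Report No" is the only non-numeric column, and it uses clip(upper=1) rather than clip(0, 1), so values are only capped at 1 and negative numbers (if any) are left alone. The names num_cols and clipped are just illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Report No": ["One", "Two", "Three", "Four", "Five", "Six"],
    "Apple":  [5, 1, 0, 1, 4, 1],
    "Orange": [0, 1, 0, 1, 0, 3],
    "Lemon":  [2, 0, 2, 3, 0, 1],
    "Grape":  [1, 3, 1, 0, 1, 2],
    "Pear":   [1, 2, 3, 0, 1, 0],
})

# Every column except the label column (assumed to be the only non-numeric one)
num_cols = df.columns[df.columns != "Report No"]

# Work on a copy so the original df stays untouched
clipped = df.copy()
clipped[num_cols] = df[num_cols].clip(upper=1)  # cap at 1, no lower bound
```

Use clip(0, 1) instead if negative values should also be raised to 0.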

Answer 2

You can try this.

df.set_index('Report No',inplace=True)
df[df>1]=1
df.reset_index()

Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0

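Or, if you have some non-numeric columns, use this instead; there is no need for set_index and reset_index. It selects the same columns as df.select_dtypes('number'). Note that _get_numeric_data is a private pandas method, so whether modifying its result also changes df can depend on the pandas version.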

val = df._get_numeric_data()
val[val>1] = 1
df
Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0
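If you would rather not rely on a private method, a rough equivalent using only public API might look like this. It is a sketch on a shortened version of the sample data, and num_cols is an illustrative name.

```python
import pandas as pd

# Shortened sample data with one non-numeric column
df = pd.DataFrame({
    "Report No": ["One", "Two", "Three"],
    "Apple": [5, 1, 0],
    "Grape": [1, 3, 1],
})

# Names of the numeric columns only
num_cols = df.select_dtypes("number").columns

# Replace values greater than 1 with 1 and assign the result back explicitly
df[num_cols] = df[num_cols].mask(df[num_cols] > 1, 1)
```

Assigning back through df[num_cols] does not depend on whether the selection is a view or a copy.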

Or use df.mask:

df.set_index('Report No',inplace=True)
df.mask(df>1,1).reset_index()
Report No   Apple   Orange   Lemon   Grape   Pear
One           1       0        1       1      1
Two           1       1        0       1      1
Three         0       0        1       1      1
Four          1       1        1       0      0
Five          1       0        0       1      1
Six           1       1        1       1      0

Or use DataFrame.where, which keeps values where the condition is True and replaces the rest:

df[df.columns[1:]] = df.iloc[:, 1:].where(df.iloc[:, 1:] <= 1, 1)
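The pandas where method and NumPy's where function are easy to mix up because their arguments mean different things. A small sketch on a throwaway Series (the names s, kept and picked are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 1, 0, 3])

# pandas: keep values where the condition is True, replace the others with 1
kept = s.where(s <= 1, 1)

# NumPy: np.where(condition, x, y) takes x where the condition is True, else y
picked = pd.Series(np.where(s > 1, 1, s), index=s.index)
```

Both give 1, 1, 0, 1 here; note that the conditions are opposites of each other.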

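Or use np.select, which can help when you are dealing with multiple conditions, for example if you also want to map values less than 0 to 0 while mapping values greater than 1 to 1.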

df.set_index('Report No',inplace=True)
condlist = [df>=1,df<=0] #you can have more conditions and add choices accordingly.
choice = [1,0] #len(condlist) should be equal to len(choice).
df.loc[:] = np.select(condlist,choice)
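One thing to be aware of is that np.select fills any element matching none of the conditions with its default, which is 0 unless you pass default=. Here is a sketch on made-up data that includes negative numbers and keeps unmatched values as they are (values and capped are illustrative names):

```python
import numpy as np
import pandas as pd

# Made-up data with negatives; not the question's data
df = pd.DataFrame({"Apple": [5, 1, 0, -2], "Pear": [1, 2, -1, 0]})
values = df.to_numpy()

# Conditions are checked in order; the first one that matches wins
condlist = [values > 1, values < 0]
choicelist = [1, 0]

# default=values keeps elements that match no condition (here: 0 and 1)
result = np.select(condlist, choicelist, default=values)
capped = pd.DataFrame(result, index=df.index, columns=df.columns)
```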

Or use df.clip, as Jan mentioned.

Not recommended, but you can try this: use df.astype.

df.set_index('Report No',inplace=True)
df.astype('bool').astype('int')

Note: this only converts falsy values to False and truthy values to True, i.e. 0 becomes False and any nonzero number becomes True, even a negative one.

s = pd.Series([1,-1,0])
s.astype('bool')
0     True
1     True
2    False
dtype: bool

s.astype('bool').astype('int')
0    1
1    1
2    0
dtype: int32

Answer 3

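A quick and simple way is to loop over all of the DataFrame's keys and change each column with NumPy's where function (NumPy has to be imported). We simply pass the condition, the value to use where the condition holds, and the value to use where it does not, as arguments to that function. For your example it looks like this: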

import numpy as np

for x in df.keys()[1:]:
    df[x] = np.where(df[x] > 1, 1, df[x])

Note that the loop skips the first key, because its values are not integers.
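The same np.where call can also be applied to all of the numeric columns in one step instead of column by column. This is a sketch on a shortened version of the sample data, assuming (as in the loop) that only the first column is non-numeric; cols is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Report No": ["One", "Two", "Three"],
    "Apple":  [5, 1, 0],
    "Orange": [0, 1, 0],
    "Lemon":  [2, 0, 2],
})

# Everything except the first (label) column
cols = df.columns[1:]

# np.where returns a 2-D array with the same shape as the selected block
df[cols] = np.where(df[cols] > 1, 1, df[cols])
```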

Replacing values greater than 1 in a pandas DataFrame
