Question

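I have a large dataframe which looks like:

df1['A'].ix[1:3]
2017-01-01 02:00:00    [33, 34, 39]
2017-01-01 03:00:00    [3, 43, 9]

I want to replace each element greater than 9 with 11.

So the desired output for the example above is:

df1['A'].ix[1:3]
2017-01-01 02:00:00    [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]

Edit:

My actual dataframe has about 20,000 rows, and each row holds a list of size 2000.

Is there a way to use the numpy.minimum function for each row? I assume it would be faster than the list comprehension method?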

Answer 1

I know this is an old post, but pandas now supports DataFrame.where directly. In your example:

df.where(df <= 9, 11, inplace=True)

Note that pandas' where is different from numpy.where. In pandas, where the condition is True the current value in the dataframe is kept; where the condition is False the other value is taken.
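To make the difference in argument order concrete, here is a minimal sketch on a small made-up frame of scalar values. The frame `df` and its columns are illustrative assumptions, not data from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical frame of scalar values, for illustration only.
df = pd.DataFrame({"A": [33, 3, 9], "B": [4, 43, 10]})

# pandas: keep the current value where the condition is True, use 11 elsewhere.
pandas_result = df.where(df <= 9, 11)

# numpy: np.where(cond, x, y) picks x where cond is True and y elsewhere,
# so with the condition "greater than 9" the replacement value comes first.
values = df.to_numpy()
numpy_result = np.where(values > 9, 11, values)

print(pandas_result)
print(numpy_result)
```

Both calls should produce the same numbers; only the direction of the condition is flipped.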

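Edit:

You can achieve the same for a single column with Series.where. Assigning the result back avoids relying on an inplace update of a selected column:

df['A'] = df['A'].where(df['A'] <= 9, 11)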

Answer 2

You can use apply with a list comprehension:

df1['A'] = df1['A'].apply(lambda x: [y if y <= 9 else 11 for y in x])
print (df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]

A faster solution is to convert to a numpy array first and then use numpy.where:

import numpy as np

a = np.array(df1['A'].values.tolist())
print (a)
[[33 34 39]
 [ 3 43  9]]

df1['A'] = np.where(a > 9, 11, a).tolist()
print (df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]
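For readers who want to run both approaches side by side, the following is a self-contained sketch that rebuilds the two example rows and checks that the results agree. The names `by_apply`, `by_numpy`, and `stacked` are made up for this sketch, and the numpy route assumes every list has the same length:

```python
import numpy as np
import pandas as pd

# Rebuild the two example rows from the question; each cell holds a list.
index = pd.to_datetime(["2017-01-01 02:00:00", "2017-01-01 03:00:00"])
df1 = pd.DataFrame({"A": [[33, 34, 39], [3, 43, 9]]}, index=index)

# Approach 1: a list comprehension applied to every row.
by_apply = df1["A"].apply(lambda row: [y if y <= 9 else 11 for y in row])

# Approach 2: stack the lists into one 2-D array and replace in a single step.
# Assumption: all lists have equal length, otherwise the array is not rectangular.
stacked = np.array(df1["A"].tolist())
by_numpy = pd.Series(np.where(stacked > 9, 11, stacked).tolist(), index=df1.index)

print(by_apply.tolist())
print(by_numpy.tolist())
```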

Answer 3

Very simply: df[df > 9] = 11
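As a rough illustration of this boolean-mask assignment, here is a sketch on a made-up frame. It assumes the cells hold scalars; with list-valued cells, as in the question, the comparison `df > 9` would not work:

```python
import pandas as pd

# Hypothetical frame of scalar values, for illustration only.
df = pd.DataFrame({"A": [33, 3, 9], "B": [4, 43, 10]})

# df > 9 is a boolean frame of the same shape; assigning through it
# overwrites only the cells where the mask is True.
df[df > 9] = 11

print(df)
```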

Answer 4

You can use numpy indexing, accessed through the .values attribute.

df['col'].values[df['col'].values > x] = y

where any value greater than x is replaced with the value y.

So for the example in the question:

df1['A'].values[df1['A'] > 9] = 11
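A variant of the same numpy indexing idea is sketched below on a made-up scalar column. It works on an explicit copy of the array and assigns it back, because whether the array returned by .values can be written through depends on the pandas version and its copy-on-write settings. The column name `col` and the values of `x` and `y` are placeholders:

```python
import pandas as pd

# Hypothetical scalar column, for illustration only.
df = pd.DataFrame({"col": [33, 3, 43, 9]})
x, y = 9, 11  # replace everything greater than x with y

# Take a writable copy of the underlying numpy array.
values = df["col"].to_numpy(copy=True)

# Boolean indexing: overwrite only the entries greater than x.
values[values > x] = y

# Write the modified array back into the frame.
df["col"] = values

print(df)
```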

Answer 5

I came up with a solution for replacing every element greater than h with 1 and every other element with 0, which is very simple:

df = (df > h) * 1

(This does not solve the OP's problem, because every value with df <= h is replaced by 0.)
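A small sketch of this thresholding trick, on a made-up frame with an assumed threshold `h = 9`. The `astype(int)` line is an alternative spelling added here for comparison, not part of the answer:

```python
import pandas as pd

# Hypothetical frame of scalar values, for illustration only.
df = pd.DataFrame({"A": [33, 3, 9], "B": [4, 43, 10]})
h = 9  # assumed threshold

# The comparison yields booleans; multiplying by 1 turns True/False into 1/0.
flags = (df > h) * 1

# Equivalent spelling that states the conversion explicitly.
same_flags = (df > h).astype(int)

print(flags)
```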

Replacing values greater than a number in a pandas dataframe
