Question

I noticed unexpected behavior of the limit parameter of pandas.DataFrame.replace when using it on a vector.

When doing the following:

import pandas as pd
array = pd.DataFrame([1,2,3,3,2,3,1,2,3,3,3,3,3,3,3,3,3,1,1,1,2,3])

array_replace = array.replace(3, 5)
array_limit = array.replace(3, 5, limit=3)

I expected array_replace and array_limit to differ. However,

print(array_replace.to_numpy().reshape(-1))
print(array_limit.to_numpy().reshape(-1))
print(f'Arrays are equal: {array_replace.equals(array_limit)}')

gives (no matter which method is used)

[1 2 5 5 2 5 1 2 5 5 5 5 5 5 5 5 5 1 1 1 2 5]
[1 2 5 5 2 5 1 2 5 5 5 5 5 5 5 5 5 1 1 1 2 5]
Arrays are equal: True

This is not the case when using NaN and pandas.DataFrame.fillna instead:

import numpy as np
import pandas as pd
array = pd.DataFrame([1,2,3,3,2,3,1,2,3,3,3,3,3,3,3,3,3,1,1,1,2,3])
array[array==3] = np.nan

array_replace = array.fillna(5)
array_limit = array.fillna(5, limit=3)

print(array_replace.to_numpy().reshape(-1))
print(array_limit.to_numpy().reshape(-1))
print(f'Arrays are equal: {array_replace.equals(array_limit)}')

# Out:
# [1. 2. 5. 5. 2. 5. 1. 2. 5. 5. 5. 5. 5. 5. 5. 5. 5. 1. 1. 1. 2. 5.]
# [ 1.  2.  5.  5.  2.  5.  1.  2. nan nan nan nan nan nan nan nan nan  1.
#   1.  1.  2. nan]
# Arrays are equal: False

Why does limit work differently in these two cases?

Tested on Python 3.8.3, Pandas 1.2.4

Answer 1

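The limit parameter only has an effect in combination with the method parameter, and the method parameter is only applied when value is None. In array.replace(3, 5, limit=3) a value is given, so limit is ignored.

Documentation: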

"method {'pad', 'ffill', 'bfill', None}: The method to use for replacement, when to_replace is a scalar, list or tuple and value is None."

and

"limit int, default None: Maximum size gap to forward or backward fill."
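To illustrate what "gap" means in that description, here is a minimal sketch using Series.ffill rather than replace. The series values are made up for illustration; the point is only that limit caps how many consecutive entries get filled from the last valid value.

```python
import numpy as np
import pandas as pd

# Illustrative data: one gap of three consecutive NaN values.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 2.0])

# Forward fill, but carry the last valid value at most 2 steps into a gap.
filled = s.ffill(limit=2)

print(filled.tolist())
```

The first two NaN entries take the value 1.0 and the third stays NaN, because the gap is longer than the limit.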

A working example of the limit parameter in use:

import pandas as pd
array = pd.DataFrame([1,2,3,3,2,3,1,2,3,3,3,3,3,3,3,3,3,1,1,1,2,3])
array_replace = array.replace(3, None, method='ffill')
array_limit = array.replace(3, None, limit=3, method='ffill')

Output:

array_replace = [1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2]
array_limit =   [1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 1, 1, 1, 2, 2]
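If the goal is what the question seems to expect, namely replacing only the first n occurrences of a value with a fixed replacement, one option that does not rely on method or limit at all is a boolean mask. The sketch below is only one way to do it; replace_first_n is a hypothetical helper name, not a pandas function. (As far as I can tell, the method and limit arguments of replace have also been deprecated in newer pandas releases, so avoiding them may be the safer route.)

```python
import pandas as pd

def replace_first_n(series, to_replace, value, n):
    """Hypothetical helper: replace only the first n occurrences of to_replace."""
    is_match = series.eq(to_replace)
    # cumsum numbers the matches 1, 2, 3, ...; keep only matches numbered <= n.
    first_n = is_match & (is_match.cumsum() <= n)
    # mask() swaps in `value` wherever the condition is True.
    return series.mask(first_n, value)

s = pd.Series([1, 2, 3, 3, 2, 3, 1, 2, 3, 3])
result = replace_first_n(s, 3, 5, n=3)

print(result.tolist())
```

For a single-column DataFrame like the one in the question, the same helper can be applied to the column, e.g. array[0].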
