Question

I have a DataFrame that looks like this:

    a   b   c
0  222  34  23
1  333  31  11
2  444  16  21
3  555  32  22
4  666  33  27
5  777  35  11

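For the last 3 rows, I need to check whether the value in column c is below a certain value (the mean of the other values). If it is, I want to replace it with a string of the form old value → new value (mean).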

For example, in row 5 of column "c" I would get 11 → 22.

Here is what I tried, but it produces an error:

import pandas as pd

mean=22
# List of Tuples
matrix = [(222, 34, 23),
(333, 31, 11),
(444, 16, 21),
(555, 32, 22),
(666, 33, 27),
(777, 35, 11)
]
# Create a DataFrame object
df = pd.DataFrame(matrix, columns=list('abc'))
print(df)
df.iloc[-3:].loc[df["c"] < mean, "c"] = pd.Series(map(lambda x: str(x)+" → "+ str(mean), df.iloc[-3:].loc[df["c"] < mean, "c"]))
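The attempt above uses chained indexing (df.iloc[-3:] followed by .loc[...] = ...), so the assignment targets an intermediate object rather than df itself, and pandas typically warns (SettingWithCopyWarning) instead of updating df. The Series built with map also gets a fresh 0-based index, so it would not line up with row label 5 even if the write reached df. A minimal sketch, assuming the same data and mean = 22, that does the selection and the write in a single .loc call on df (the names last3, tail_c and to_change are illustrative):

```python
import pandas as pd

matrix = [(222, 34, 23), (333, 31, 11), (444, 16, 21),
          (555, 32, 22), (666, 33, 27), (777, 35, 11)]
df = pd.DataFrame(matrix, columns=list("abc"))
mean = 22

# Column c will hold both ints and strings, so make it object dtype up front
df["c"] = df["c"].astype(object)

# Row labels of the last 3 rows (here 3, 4, 5)
last3 = df.index[-3:]
# Values of column c in those rows, keeping their original row labels
tail_c = df.loc[last3, "c"]
# Labels of the rows among the last 3 whose c is below the mean
to_change = tail_c[tail_c < mean].index

# One .loc assignment directly on df: no chained indexing, and the
# right-hand side carries labels that match the target rows
df.loc[to_change, "c"] = tail_c.loc[to_change].astype(str) + " → " + str(mean)
print(df)
```

Rows outside the last 3 (such as row 1, where c is also 11) are left untouched because to_change only contains labels taken from last3.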

Answer 1

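You can use Index.isin to build a second mask that tests for the last 3 index values, so iloc is not needed, and then process only the rows matched by the combined mask: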

mean=22
mask = (df["c"] < mean) & df.index.isin(df.index[-3:])
df.loc[mask, "c"] = df.loc[mask, "c"].astype(str) +" → "+ str(mean)
print (df)

     a   b        c
0  222  34       23
1  333  31       11
2  444  16       21
3  555  32       22
4  666  33       27
5  777  35  11 → 22
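To see what each part of the combined mask selects, here is a small sketch that prints the intermediate masks (in_last3 and below_mean are illustrative names). It also converts c to object dtype before writing strings into it; recent pandas versions (2.1 and later) may otherwise warn about setting values of an incompatible dtype into an integer column.

```python
import pandas as pd

df = pd.DataFrame({"a": [222, 333, 444, 555, 666, 777],
                   "b": [34, 31, 16, 32, 33, 35],
                   "c": [23, 11, 21, 22, 27, 11]})
mean = 22

in_last3 = df.index.isin(df.index[-3:])  # numpy bool array, True for the last 3 labels
below_mean = df["c"] < mean              # bool Series over all rows
mask = below_mean & in_last3             # only rows that satisfy both conditions

print(in_last3.tolist())    # [False, False, False, True, True, True]
print(below_mean.tolist())  # [False, True, True, False, False, True]
print(mask.tolist())        # [False, False, False, False, False, True]

# Make c object dtype explicitly, then write the annotated strings
df["c"] = df["c"].astype(object)
df.loc[mask, "c"] = df.loc[mask, "c"].astype(str) + " → " + str(mean)
print(df)
```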

If you only want to replace the masked values with mean, the solution is simpler:

mean=22
mask = (df["c"] < mean) & df.index.isin(df.index[-3:])
df.loc[mask, "c"] = mean
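The same mask can also be passed to Series.mask, which replaces values wherever the condition is True. A sketch, assuming the data and mean = 22 from the question:

```python
import pandas as pd

df = pd.DataFrame({"a": [222, 333, 444, 555, 666, 777],
                   "b": [34, 31, 16, 32, 33, 35],
                   "c": [23, 11, 21, 22, 27, 11]})
mean = 22

mask = (df["c"] < mean) & df.index.isin(df.index[-3:])
# Series.mask(cond, other): keep values where cond is False, use `other` where True
df["c"] = df["c"].mask(mask, mean)
print(df)
```

Because only integers are written, column c keeps an integer dtype here.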

Answer 2

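You can take a copy of the last 3 rows, process that copy, and then write the new values back into the original DataFrame. The result: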

     a   b   c
0  222  34  23
1  333  31  11
2  444  16  21
3  555  32  22
4  666  33  27
5  777  35  22
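A hedged sketch of this copy, process, write-back approach, assuming mean = 22 and that matching values are replaced by the mean (which is what the table above shows). The name tail is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [222, 333, 444, 555, 666, 777],
                   "b": [34, 31, 16, 32, 33, 35],
                   "c": [23, 11, 21, 22, 27, 11]})
mean = 22

# Explicit copy of the last 3 rows, so edits to it never touch df by accident
tail = df.iloc[-3:].copy()
# Process the copy: replace values of c below the mean with the mean
tail.loc[tail["c"] < mean, "c"] = mean
# Write the processed column back into the original rows, aligned by row label
df.loc[tail.index, "c"] = tail["c"]
print(df)
```

The explicit .copy() makes it clear that tail is independent of df, and the final .loc write uses tail's row labels, so the values land back in rows 3 to 5.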


Answer 3

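I didn't try to reproduce your exact example, but you can replace any value in a DataFrame using iloc and loc, which people often overlook when they start using pandas.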

import pandas as pd
import io

# initialise the sample data as CSV text
txt_data = '''a,b,c
222,34,23
333,31,11
444,16,21
555,32,22
666,33,27
777,35,11'''

df = pd.read_csv(io.StringIO(txt_data))
# value to write in place of the matching entries
any_value = 21
# threshold (mean) to compare against
mean_value = 12
# iloc[-3:, 2]: the last 3 rows by position, column at position 2 (column c, the last column)
# apply a lambda that writes any_value wherever the value is greater than mean_value
df.iloc[-3:, 2] = df.iloc[-3:, 2].apply(lambda x: any_value if x > mean_value else x)
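The apply call above works element by element. A vectorized sketch of the same replacement, assuming the same any_value and mean_value, uses Series.where and addresses column c by name through loc (last3 is an illustrative name):

```python
import io

import pandas as pd

txt_data = """a,b,c
222,34,23
333,31,11
444,16,21
555,32,22
666,33,27
777,35,11"""

df = pd.read_csv(io.StringIO(txt_data))
any_value = 21
mean_value = 12

# Row labels of the last 3 rows
last3 = df.index[-3:]
tail_c = df.loc[last3, "c"]
# where(cond, other): keep values where cond is True, use `other` elsewhere,
# so values greater than mean_value become any_value
df.loc[last3, "c"] = tail_c.where(tail_c <= mean_value, any_value)
print(df)
```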

How to modify a DataFrame based on a condition, selecting only the last 3 rows
