Check whether column values are in a pandas DataFrame

Time: 2017-08-31 14:31:52

Tags: python-2.7 pandas types

I have a dataframe with several thousand rows, part of which contains data like the following. The dataframe also has other columns: ["FP", "Y", "SLC", "C_ID", "NR"].

z_to_s | z_to_t | s_to_t | t_p   | min  | max 
0.04   |        | 0.06   | 0.29  | 0.04 | 0.29
0.01   |        | NS     | NS    | 0.01 | 0.01
ND     |        | NS     | NS    | ND   | ND
0.04   |        | ND*    | NS    | ND*  | 0.04
       | 0.55*  |        |       | 0.55 | 0.55
19.88* |        | 0.46   | 0.09  | 0.09 | 19.88

" min"和" max"列各自表示来自" z_to_s"," z_to_t"," s_to_t"和" t_p"的最小值和最大值。列。 ND或ND *始终被视为最小值,而NS被忽略。我需要保持输入数据的原始形式,所以我的最终输出应如下所示:

z_to_s | z_to_t | s_to_t | t_p   | min   | max 
0.04   |        | 0.06   | 0.29  | 0.04  | 0.29
0.01   |        | NS     | NS    | 0.01  | 0.01
ND     |        | NS     | NS    | ND    | ND
0.04   |        | ND*    | NS    | ND*   | 0.04
       | 0.55*  |        |       | 0.55* | 0.55
19.88* |        | 0.46   | 0.09  | 0.09  | 19.88*
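
The rule described above (ND or ND* always lowest, NS ignored, original text preserved) can be sketched as follows. This is only an illustration under assumptions: the cells are taken to be strings, blanks are empty strings, and the helper names `sort_key` and `row_min_max` are hypothetical. Note that for the row whose only value is 0.55*, this sketch returns 0.55* for both min and max, whereas the table above shows 0.55 for max.

```python
import pandas as pd

VALUE_COLS = ["z_to_s", "z_to_t", "s_to_t", "t_p"]

def sort_key(cell):
    """Rank a raw cell: ND/ND* sorts below every number; '*' is ignored for ranking."""
    bare = str(cell).strip().rstrip("*")
    if bare == "ND":
        return float("-inf")
    return float(bare)

def row_min_max(row):
    """Return the smallest and largest cell of a row in their original form."""
    # Skip missing values, blanks and NS, which the question says to ignore.
    cells = [c for c in row if pd.notna(c) and str(c).strip() not in ("", "NS")]
    if not cells:
        return pd.Series({"min": None, "max": None})
    return pd.Series({"min": min(cells, key=sort_key), "max": max(cells, key=sort_key)})

# Sample rows copied from the question, with every cell stored as text (an assumption).
df = pd.DataFrame(
    {
        "z_to_s": ["0.04", "0.01", "ND", "0.04", "", "19.88*"],
        "z_to_t": ["", "", "", "", "0.55*", ""],
        "s_to_t": ["0.06", "NS", "NS", "ND*", "", "0.46"],
        "t_p": ["0.29", "NS", "NS", "NS", "", "0.09"],
    }
)
df[["min", "max"]] = df[VALUE_COLS].apply(row_min_max, axis=1)
print(df)
```

Because the original cell is returned rather than a parsed number, the "*" suffix carries over to "min" and "max" without a separate matching step.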

To do this, I have been trying to use the following code to create new columns named "QC_min" and "QC_max":

df["QC_min"] = df.drop(["FP","Y","SLC","C_ID","NR","min","max"], axis = 1).isin(data_concat["min"]).any(axis = 1)
df["QC_max"] = df.drop(["FP","Y","SLC","C_ID","NR","min","max"], axis = 1).isin(data_concat["max"]).any(axis = 1)

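So "QC_min" and "QC_max" hold TRUE/FALSE values depending on whether "min"/"max" matches any of the values in the ["z_to_s", "z_to_t", "s_to_t", "t_p"] columns. I then want to write another line of code so that, if "QC_min" or "QC_max" is TRUE, a "*" is appended to the corresponding "min" or "max" value. However, the output of the code above looks like this: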

z_to_s | z_to_t | s_to_t | t_p   | min   | max   | QC_min | QC_max
0.04   |        | 0.06   | 0.29  | 0.04  | 0.29  | FALSE  | FALSE
0.01   |        | NS     | NS    | 0.01  | 0.01  | FALSE  | FALSE
ND     |        | NS     | NS    | ND    | ND    | TRUE   | TRUE
0.04   |        | ND*    | NS    | ND*   | 0.04  | TRUE   | FALSE
       | 0.55*  |        |       | 0.55  | 0.55  | FALSE  | FALSE
19.88* |        | 0.46   | 0.09  | 0.09  | 19.88 | FALSE  | FALSE
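
For reference, a small sketch of how `DataFrame.isin` behaves when it is given a Series: the comparison is aligned on the index, so each row is checked against the Series value with the same label, and a string such as "0.04" never equals the float 0.04. The data below is a hypothetical miniature, and storing the numeric "min" values as floats is an assumption about the real dataframe, not something the output above confirms.

```python
import pandas as pd

# Hypothetical miniature of the data; dtype=object mimics columns of mixed text.
cols = pd.DataFrame({"z_to_s": ["0.04", "ND"], "t_p": ["0.29", "NS"]}, dtype=object)
min_mixed = pd.Series([0.04, "ND"], dtype=object)   # number stored as a float
min_text = pd.Series(["0.04", "ND"], dtype=object)  # number stored as a str

# With a Series argument, isin compares row by row using the index labels.
qc_mixed = cols.isin(min_mixed).any(axis=1)
qc_text = cols.isin(min_text).any(axis=1)

print(qc_mixed.tolist())  # [False, True]
print(qc_text.tolist())   # [True, True]
```

Under that assumption, only the text values (ND, ND*) can match, which would produce the same pattern as the table above.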

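Here all the numeric values show as FALSE whether or not they match, while the string values show as TRUE. I checked my data types and wonder whether this is an int/float/str data type issue. If I add an astype(str) to my "min" or "max", so that my code becomes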

df["QC_min"] = df.drop(["FP","Y","SLC","C_ID","NR","min","max"], axis = 1).isin(data_concat["min"]).astype(str).any(axis = 1)
df["QC_max"] = df.drop(["FP","Y","SLC","C_ID","NR","min","max"], axis = 1).isin(data_concat["max"]).astype(str).any(axis = 1)

everything becomes TRUE, regardless of the *, like this:

z_to_s | z_to_t | s_to_t | t_p   | min   | max   | QC_min | QC_max
0.04   |        | 0.06   | 0.29  | 0.04  | 0.29  | TRUE   | TRUE
0.01   |        | NS     | NS    | 0.01  | 0.01  | TRUE   | TRUE
ND     |        | NS     | NS    | ND    | ND    | TRUE   | TRUE
0.04   |        | ND*    | NS    | ND*   | 0.04  | TRUE   | TRUE
       | 0.55*  |        |       | 0.55  | 0.55  | TRUE   | TRUE
19.88* |        | 0.46   | 0.09  | 0.09  | 19.88 | TRUE   | TRUE
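
One possible reading of the all-TRUE result, sketched with hypothetical data: in the code above, `astype(str)` is applied to the result of `isin`, not to "min" or "max", so the booleans become the texts "True" and "False", and any non-empty string is truthy. Casting both sides to `str` before the comparison is shown as an alternative. The float values in `minimum` are an assumption.

```python
import pandas as pd

# Hypothetical miniature of the data; the min values are assumed to be floats.
cols = pd.DataFrame({"z_to_s": ["0.04", "0.01"], "t_p": ["0.29", "NS"]}, dtype=object)
minimum = pd.Series([0.04, 0.05])

# astype(str) placed after isin converts the boolean results to text.
flags_as_text = cols.isin(minimum).astype(str)
first_flag = flags_as_text.iloc[0, 0]
print(repr(first_flag), bool(first_flag))  # the text 'False' is truthy

# Casting both sides to str before comparing keeps the result boolean.
qc_min = cols.astype(str).isin(minimum.astype(str)).any(axis=1)
print(qc_min.tolist())  # [True, False]
```

Two caveats with the string cast: missing cells would become the text "nan", and a cell such as "0.55*" still differs from "0.55", so the asterisk would need to be stripped (for example with `str.rstrip("*")`) before comparing.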

Where am I going wrong? Any suggestions on how to fix this, or how to do what I want, would be greatly appreciated. Thanks.

0 answers:

No answers