Question

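I have a DataFrame that I am trying to filter on columns (dtype = object) using pandas str.contains or str.startswith. However, when I run the code I get the error "first argument must be string or compiled pattern". How can I fix this?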

df_ipp_h_simple_hsr = df_ipp_h_simple[
    df_ipp_h_simple['ORDER_TYPE'].str.startswith(('HSR', 'HOSP'))
    & df_ipp_h_simple['PRODUCT'].str.contains("M")
    & ~df_ipp_h_simple['PRODUCT'].str.contains(("1611", "1612", "1635"))
    & ~df_ipp_h_simple['PRODUCT'].str.startswith(("5", "6", "97"))
    & ~df_ipp_h_simple['CUSTOMER'].str.contains(("POPEYES", "CHECKERS", "KRYSTAL"))
]

The expected output is a filtered DataFrame, but I get the error below:

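~\AppData\Local\Continuum\anaconda3\envs\enzy\lib\re.py in _compile(pattern, flags)
    283         return pattern
    284     if not sre_compile.isstring(pattern):
--> 285         raise TypeError("first argument must be string or compiled pattern")
    286     p = sre_compile.compile(pattern, flags)
    287     if not (flags & DEBUG):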

TypeError: first argument must be string or compiled pattern

Answer 1

pd.Series.str.contains does not accept a tuple of strings as its first argument (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html#pandas-series-str-contains). However, you attempt to pass one in the following two lines:

~df_ipp_h_simple['PRODUCT'].str.contains(("1611","1612","1635")) 
~df_ipp_h_simple['CUSTOMER'].str.contains(("POPEYES","CHECKERS","KRYSTAL"))

The error is telling you that it expects a string or a compiled regular expression.
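As the traceback shows, the message is raised inside Python's re module (re.py, _compile), so it can be reproduced without pandas. The snippet below is a minimal sketch of that; the variable names are only illustrative.

```python
import re

values = ("1611", "1612", "1635")

# Passing the tuple itself fails: re only accepts a str (or bytes)
# pattern, or an already compiled pattern object.
try:
    re.compile(values)
except TypeError as exc:
    message = str(exc)

print(message)

# Joining the alternatives with "|" yields one valid pattern string.
pattern = re.compile("|".join(values))
found = pattern.search("M1612") is not None
```

The same rule applies to the pattern that str.contains hands to re, which is why the tuple has to be turned into a single string first.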

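You should use a regular expression to search for several patterns at once, for example by joining the alternatives with |. See: https://stackoverflow.com/a/26577689/9144990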
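A possible rewrite of the filter from the question, as a sketch rather than a definitive fix: the sample rows and the any_of helper are hypothetical, and only the column names come from the question. It assumes a pandas version in which str.startswith accepts a tuple, as the question's code implies.

```python
import re

import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "ORDER_TYPE": ["HSR1", "HOSP2", "HSR3", "RET4", "HSR5"],
    "PRODUCT": ["M1000", "M1611", "5M200", "M3000", "M2000"],
    "CUSTOMER": ["ACME", "ACME", "ACME", "ACME", "POPEYES #12"],
})


def any_of(values):
    """Join literal strings into one regex alternation, e.g. '1611|1612'."""
    # re.escape keeps characters such as '.' or '+' from acting as regex syntax.
    return "|".join(re.escape(v) for v in values)


mask = (
    df["ORDER_TYPE"].str.startswith(("HSR", "HOSP"))  # tuple is accepted here
    & df["PRODUCT"].str.contains("M")
    & ~df["PRODUCT"].str.contains(any_of(["1611", "1612", "1635"]))
    & ~df["PRODUCT"].str.startswith(("5", "6", "97"))
    & ~df["CUSTOMER"].str.contains(any_of(["POPEYES", "CHECKERS", "KRYSTAL"]))
)
filtered = df[mask]
print(filtered)
```

If a column can contain missing values, passing na=False to each str.contains and str.startswith call keeps the mask purely boolean.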

How to filter a single column (dtype = object) in pandas on multiple values
