Question

I am trying to find the strings in a DataFrame that match *A*, i.e. that contain an A:

import pandas as pd

df = pd.DataFrame({
    "col_1":["AAA","BBB","CCC"],
    "col_2":[4,5,6],
    "col_3":[107,800,300],
    "col_4":[1,3,5]
})

#    col_1  col_2  col_3  col_4
#0   AAA    4      107    1
#1   BBB    5      800    3
#2   CCC    6      300    5

This line raises an error:

df['col_1'].str.match("*A*")

It does not work and fails with:

line 615, in _parse
source.tell() - here + len(this))
sre_constants.error: nothing to repeat at position 0

I also have the following code:

import pandas as pd

df = pd.DataFrame({
    "col_1": ["AAA", "BBB", "CCC"],
    "col_2": [4, 5, 6],
    "col_3": [107, 800, 300],
    "col_4": [1, 3, 5]
})

def findItems(df, findText, colName):
    mask = df[colName].astype('str').str.match(findText)
    print("\n mask", mask)

The above code does not work either.

Answer 1

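Here are some examples that should answer your question.

I think you may be looking for str.match(".+A.+"), which matches an A that has at least one character before it and at least one character after it. That is a case where str.contains('A') does not help you. If you only need to know whether the string contains an A anywhere, str.contains('A') or str.match(".*A.*") is enough.

Credit also goes to all the commenters.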

import pandas as pd

# Create a Series, since df['col'] is itself a Series.
s = pd.Series(['A','AA','AAA','aaa'])

print(s.str.contains('A').values)             # [ True  True  True False]
print(s.str.lower().str.contains('a').values) # [ True  True  True  True]
print(s.str.match(".*A.*").values)            # [ True  True  True False]
print(s.str.match(".+A.+").values)            # [False False  True False]
print(s.str.match(".+[Aa].+").values)         # [False False  True  True]
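
As for why the original call fails: "*A*" is a shell-style wildcard, not a regular expression. In a regex, * means "repeat the previous element", and at position 0 there is nothing before it to repeat, hence the error. Below is a minimal sketch of two ways around it, assuming the goal is simply "contains an A". The helper name find_items and its arguments are made up for illustration; it uses the standard-library fnmatch module to apply the wildcard pattern as-is rather than converting it to a regex.

```python
import fnmatch
import re

import pandas as pd

df = pd.DataFrame({
    "col_1": ["AAA", "BBB", "CCC"],
    "col_2": [4, 5, 6],
})

# "*A*" is not a valid regex: the leading "*" has nothing to repeat.
try:
    re.compile("*A*")
    pattern_is_valid = True
except re.error:
    pattern_is_valid = False

# Option 1: a plain substring test, no regex involved.
contains_mask = df["col_1"].str.contains("A", regex=False)

# Option 2 (hypothetical helper): keep the wildcard syntax and let
# fnmatch interpret it, one value at a time.
def find_items(df, find_text, col_name):
    values = df[col_name].astype(str)
    mask = pd.Series(
        [fnmatch.fnmatchcase(value, find_text) for value in values],
        index=df.index,
    )
    return df[mask]

print(pattern_is_valid)
print(contains_mask.tolist())
print(find_items(df, "*A*", "col_1"))
```

The fnmatch version loops in Python, so it will be slower than the vectorized .str methods on large columns. If speed matters, rewriting the wildcard as a regex by hand (".*A.*" for "*A*") and passing it to str.match is probably the better choice.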
