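This pandas code searches each cell of a DataFrame column for the pattern r"\d+X|X\d+". If a match starts with "X", the "X" is moved to the end of the match and changed to "x".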
match = re.compile(r"\d+X|X\d+", flags=re.IGNORECASE)
def f(value):
    f2 = lambda x: match.findall(x)[0] if len(match.findall(x)) > 0 else ""
    leverage = f2(value)
    if leverage[0].replace("X","x") == "x":
        leverage = "".join(leverage[1:])+leverage[0].replace("X","x")
    #Do other stuff here for var
    return var
df["description"] = df["name"].map(lambda x:f(x))
Problem: if neither "x" nor "X" is found in a cell of the "name" column, it raises an error:
if leverage[0].replace("X","x") == "x":
IndexError: string index out of range
How can I handle strings that do not contain any of these characters?
Example DataFrame:
import pandas as pd
import re
df = pd.DataFrame(["LONG APPLE X5 C", "SHORT APPLE C"], columns=["name"])
Answer 0 (score: 1)
Just filter the df with str.contains before calling your function:
df["description"] = df.loc[df['name'].str.contains('x', case=False), 'name'].map(lambda x:f(x))
So the mask returns:
In [17]:
df.loc[df['name'].str.contains('x', case=False), 'name']
Out[17]:
0    LONG APPLE X5 C
Name: name, dtype: object
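Two details of the masked assignment are worth illustrating. A plain substring test for 'x' also lets through names such as "BOX APPLE C", which contain an "x" but cannot match the pattern, and rows excluded by the mask end up as NaN in the new column. The sketch below is one possible variant, not the answer's exact code: the third sample row, the helper name move_x_to_end, and the choice to fill missing values with an empty string are all assumptions made for illustration.

```python
import re
import pandas as pd

pattern = r"\d+X|X\d+"

# The third row is a hypothetical extra example: it contains an "x"
# but has no digit next to it, so the pattern cannot match.
df = pd.DataFrame(
    ["LONG APPLE X5 C", "SHORT APPLE C", "BOX APPLE C"], columns=["name"]
)

# Plain substring test: "BOX APPLE C" passes the filter.
loose_mask = df["name"].str.contains("x", case=False)

# Regex test: only rows the pattern can actually match pass.
strict_mask = df["name"].str.contains(pattern, case=False, regex=True)


def move_x_to_end(value):
    """Hypothetical helper: return the first match with a leading X moved to the end."""
    token = re.search(pattern, value, flags=re.IGNORECASE).group(0)
    if token[0].lower() == "x":
        token = token[1:] + "x"
    return token


# Rows outside the mask are aligned by index and become NaN.
df["description"] = df.loc[strict_mask, "name"].map(move_x_to_end)

# Assumption: an empty string is an acceptable placeholder for "no match".
df["description"] = df["description"].fillna("")
```

Filtering with the same regular expression that the function uses keeps the mask and the function in agreement about which rows are safe to process.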
If you don't want to mask your df, you can add a check inside the function:
def f(value):
    if 'x' not in value.lower():
        print('not in')
        # do whatever you want here
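As a fuller illustration of the in-function check, here is a minimal sketch that tests the regex result itself rather than the presence of the letter "x", so names that contain an "x" without an adjacent digit are handled too. Returning an empty string for unmatched names and returning the reordered match are assumptions, since the question does not show what the function does with var afterwards.

```python
import re
import pandas as pd

match = re.compile(r"\d+X|X\d+", flags=re.IGNORECASE)


def f(value):
    found = match.search(value)
    if found is None:
        # Assumption: an empty string signals "no leverage token found".
        return ""
    leverage = found.group(0)
    # Move a leading "X"/"x" to the end and lowercase it, e.g. "X5" -> "5x".
    if leverage[0].lower() == "x":
        leverage = leverage[1:] + "x"
    return leverage


df = pd.DataFrame(["LONG APPLE X5 C", "SHORT APPLE C"], columns=["name"])
df["description"] = df["name"].map(f)
```

Using search once and checking for None avoids calling findall twice and removes the indexing into an empty string that caused the IndexError.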