Pandas re.compile function - IndexError: string index out of range

Date: 2015-07-03 15:22:13

Tags: python pandas

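The pandas code below searches every cell of a DataFrame column for r"\d+X|X\d+". If a match begins with "X" (e.g. "X5"), the "X" is lowercased and moved to the end ("5x").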

import re

match = re.compile(r"\d+X|X\d+", flags=re.IGNORECASE)

def f(value):
    # First regex match in the cell, or "" when there is none
    f2 = lambda x: match.findall(x)[0] if len(match.findall(x)) > 0 else ""

    leverage = f2(value)

    # Move a leading "X"/"x" to the end as lowercase, e.g. "X5" -> "5x"
    if leverage[0].replace("X","x") == "x":
        leverage = "".join(leverage[1:])+leverage[0].replace("X","x")

    # Do other stuff here with leverage
    return leverage

df["description"] = df["name"].map(lambda x:f(x))

Problem: if a cell in the "name" column contains no match (no "X" or "x" next to a digit), the code fails with:

if leverage[0].replace("X","x") == "x":
IndexError: string index out of range
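A minimal reproduction, using the question's own regex and helper: when findall finds nothing, f2 returns an empty string, and taking position 0 of "" is what raises the error.

```python
import re

match = re.compile(r"\d+X|X\d+", flags=re.IGNORECASE)

# The helper from the question: first match, or "" when there is none
f2 = lambda x: match.findall(x)[0] if len(match.findall(x)) > 0 else ""

print(repr(f2("LONG APPLE X5 C")))  # 'X5'
print(repr(f2("SHORT APPLE C")))    # ''

# Indexing position 0 of an empty string raises the reported error
try:
    f2("SHORT APPLE C")[0]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: string index out of range
```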

How can I handle strings that contain none of these characters?

Example DataFrame:

import pandas as pd
import re

df = pd.DataFrame(["LONG APPLE X5 C", "SHORT APPLE C"], columns=["name"])

1 answer:

Answer 0 (score: 1)

Filter the df with str.contains before calling your function:

df["description"] = df.loc[df['name'].str.contains('x', case=False), 'name'].map(lambda x:f(x))

The masked selection then returns only the matching row:

In [17]:
df.loc[df['name'].str.contains('x', case=False), 'name']

Out[17]:
0    LONG APPLE X5 C
Name: name, dtype: object
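Masking on the letter "x" alone still lets through names that contain an X not adjacent to a digit, and those would hit the same IndexError. A sketch that masks on the same regex the function uses instead; the "EXXON C" row is a hypothetical example added only to show the difference. Rows excluded by the mask come back as NaN because the assigned Series is aligned on the index.

```python
import re
import pandas as pd

pattern = r"\d+X|X\d+"
match = re.compile(pattern, flags=re.IGNORECASE)

def f(value):
    # Assumes value contains a match (guaranteed by the mask below)
    leverage = match.findall(value)[0]
    # Move a leading "X"/"x" to the end as lowercase, e.g. "X5" -> "5x"
    if leverage[0].lower() == "x":
        leverage = leverage[1:] + "x"
    return leverage

# "EXXON C" is hypothetical: it contains an X but no leverage token
df = pd.DataFrame(["LONG APPLE X5 C", "SHORT APPLE C", "EXXON C"], columns=["name"])

# Mask on the full regex, not just the letter "x"
mask = df["name"].str.contains(pattern, case=False, regex=True)
df["description"] = df.loc[mask, "name"].map(f)
print(df)
```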

If you don't want to mask your df, you can add a check inside the function:

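def f(value):
    # Guard: skip strings without any "x"/"X" before indexing into a match
    if 'x' not in value.lower():
        print('not in')
        # do whatever you want here, e.g. return a default value
        return ""
    # ... rest of the original function body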
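A self-contained sketch of the in-function check, guarding on the regex result rather than the letter "x" so that any cell without a leverage token is handled. Returning None for "no match" is an assumption; substitute whatever default suits the rest of the pipeline.

```python
import re
import pandas as pd

match = re.compile(r"\d+X|X\d+", flags=re.IGNORECASE)

def f(value):
    found = match.search(value)
    # No leverage token: return a default instead of indexing an empty string
    if found is None:
        return None  # assumption: None marks "no match"
    leverage = found.group(0)
    # Move a leading "X"/"x" to the end as lowercase, e.g. "X5" -> "5x"
    if leverage[0].lower() == "x":
        leverage = leverage[1:] + "x"
    return leverage

df = pd.DataFrame(["LONG APPLE X5 C", "SHORT APPLE C"], columns=["name"])
df["description"] = df["name"].map(f)
print(df)
```

match.search returns the same first match that findall(...)[0] would, but without building the full list twice.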