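Applying a regular expression to a pandas DataFrame column

Time: 2019-06-22 09:42:47

Tags: python regex pandas

I am trying to apply some regular expressions that I have written and can run against a variable, but I would like to apply them to a DataFrame column and then pass the results into new columns.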

df["Details"] is my DataFrame column, which contains some text similar to what I have created below as details.

import re
details = '1st: Batman 01:12.98 11.5L'

# Raw strings keep the backslashes intact; \. matches a literal dot
position = re.search(r'\w\w\w:\s', details)
distance = re.search(r'(\s\d\d\.[0-9]L)', details)
time = re.search(r'\d{2}:\d{2}\.\d{2}', details)

print(position.group(0))
print(distance.group(0))
print(time.group(0))
The output is then:
    1st: 
    11.5L
    01:12.98

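I would then like to add these values to new columns of the DataFrame, called position, distance, and time, each matching the corresponding output.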

2 answers:

Answer 0 (score: 2)

I believe you need Series.str.extract:

import pandas as pd

details = '1st: Batman 01:12.98 11.5L'

df = pd.DataFrame({"Details":[details,details,details]})

# Each pattern has one capture group; \. matches a literal dot
df['position'] = df['Details'].str.extract(r'(\w\w\w:\s)')
df['distance'] = df['Details'].str.extract(r'(\s\d\d\.[0-9]L)')
df['time'] = df['Details'].str.extract(r'(\d{2}:\d{2}\.\d{2})')
print(df)

                      Details position distance      time
0  1st: Batman 01:12.98 11.5L    1st:     11.5L  01:12.98
1  1st: Batman 01:12.98 11.5L    1st:     11.5L  01:12.98
2  1st: Batman 01:12.98 11.5L    1st:     11.5L  01:12.98
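
The three extractions can also be done in a single pass with named capture groups, because Series.str.extract names the resulting columns after the groups. The following is a minimal sketch, assuming every row follows the same position / name / time / distance layout as the sample string. The combined pattern and the variable name `parts` are illustrative and not part of the original answer.

```python
import pandas as pd

details = '1st: Batman 01:12.98 11.5L'
df = pd.DataFrame({"Details": [details, details, details]})

# One pattern with named groups; each group name becomes a column name.
# Assumed layout: "<position>: <name> <mm:ss.hh> <distance>L"
pattern = (
    r'(?P<position>\w+:)\s+.*?'
    r'(?P<time>\d{2}:\d{2}\.\d{2})\s+'
    r'(?P<distance>\d+\.\d+L)'
)

# str.extract returns a DataFrame with one column per named group;
# a row that does not match the whole pattern gets NaN in every column.
parts = df['Details'].str.extract(pattern)
df = df.join(parts)
print(df)
```

Unlike the separate patterns, this one keeps the surrounding whitespace out of the captured groups, so position holds "1st:" and distance holds "11.5L" with no leading or trailing space.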

Answer 1 (score: 0)

Apply the extraction inside a lambda function:

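import re

# A plain Python str has no .extract method, so re.search is used inside the lambda
df['position'] = df['Details'].apply(lambda x: re.search(r'\w\w\w:\s', str(x)).group(0))
df['distance'] = df['Details'].apply(lambda x: re.search(r'\s\d\d\.[0-9]L', str(x)).group(0))
df['time'] = df['Details'].apply(lambda x: re.search(r'\d{2}:\d{2}\.\d{2}', str(x)).group(0))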
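
With apply, the matching itself has to come from the re module, since a plain Python string has no extract method. re.search returns None when a row does not match, so calling .group() on the result directly would raise an AttributeError for such rows. The following is a minimal sketch of a guarded version, assuming a hypothetical helper named `first_match` (not from the original answer) and a made-up second row that contains no match.

```python
import re
import pandas as pd

# The second row is invented for illustration and matches none of the patterns.
df = pd.DataFrame({"Details": ['1st: Batman 01:12.98 11.5L', 'no result recorded']})

def first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    m = re.search(pattern, str(text))
    return m.group(0) if m else None

df['position'] = df['Details'].apply(lambda x: first_match(r'\w\w\w:\s', x))
df['distance'] = df['Details'].apply(lambda x: first_match(r'\s\d\d\.[0-9]L', x))
df['time'] = df['Details'].apply(lambda x: first_match(r'\d{2}:\d{2}\.\d{2}', x))
print(df)
```

Rows without a match end up with a missing value in the new columns instead of stopping the whole apply call.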