I have a DataFrame of inspection results and violations, like this:
Results Violations
Pass w/ Conditions 3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass 36. THERMOMETERS PROVIDED & ACCURATE Comment...
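What I need to do is have Python go through this pandas DataFrame, specifically the "Violations" column, and find text that "starts with a number and ends with Comments:".
I was able to strip the leading numbers with this line of code:
# lstrip/rstrip take a set of characters, not a regex: this rstrip removes any
# trailing '[', '^', 'a', '-', 'z', 'A', 'Z', ']', 'C', 'o', 'm', 'e', 'n', 't' or 's'.
df_new['Violations'] = df_new['Violations'].map(
    lambda x: x.lstrip('0123456789.- ').rstrip('[^a-zA-Z]Comments[^a-zA-Z]'))
As you can see, I tried to handle the "Comments" ending with the rstrip call, as if it accepted a regex, but it has no effect. The output looks like this: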
Results Violations
0 Pass w/ Conditions MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPL...
1 Pass THERMOMETERS PROVIDED & ACCURATE - Comments: 4...
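A quick check of why the rstrip call leaves "- Comments: 4..." in place: str.rstrip treats its argument as a set of characters and strips trailing characters from that set, so the bracket expressions are never interpreted as a regex. The sample strings below are illustrative, loosely modeled on the output above.

```python
# str.rstrip(chars) strips trailing characters that belong to the *set* chars;
# it does not interpret chars as a regular expression.
chars = '[^a-zA-Z]Comments[^a-zA-Z]'

# The last character '4' is not in the set, so nothing is stripped.
r1 = "ACCURATE - Comments: 4".rstrip(chars)

# Every trailing letter of "Comments" is in the set, so they are removed
# until the space, which is not in the set.
r2 = "ACCURATE Comments".rstrip(chars)

# Legitimate text can be eaten too: 'A' is in the set, uppercase 'T' is not.
r3 = "DATA".rstrip(chars)

print(repr(r1), repr(r2), repr(r3))
```

This is why a real pattern match (the re module or pandas' .str methods) is needed for the "Comments" part.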
In short, the regex should find the number and delete everything between the number and "Comments:".
Is there a simple way to do this?
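A minimal sketch without an explicit loop, assuming the goal is to keep only the violation title (dropping the leading number and any trailing "- Comments: ..." part): pandas' vectorized Series.str.replace with regex=True. The two rows are hypothetical sample data shaped like the question's; the comment text is made up, and the pattern assumes the literal, case-sensitive marker "Comments:".

```python
import pandas as pd

# Hypothetical sample rows shaped like the question's data.
df_new = pd.DataFrame({
    "Results": ["Pass w/ Conditions", "Pass"],
    "Violations": [
        "3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E",
        "36. THERMOMETERS PROVIDED & ACCURATE - Comments: sample comment text",
    ],
})

df_new["Violations"] = (
    df_new["Violations"]
    # Drop a leading violation number such as "3. " or "36. ".
    .str.replace(r"^\s*\d+\.\s*", "", regex=True)
    # Drop an optional " - " plus "Comments:" and everything after it.
    .str.replace(r"\s*-?\s*Comments:.*$", "", regex=True)
)

print(df_new)
```

Rows that lack a number or a "Comments:" marker are left unchanged by the corresponding step, so no per-row branching is needed.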
Answer
Quoting the question: "find the number and delete everything between the number and Comments:"
foo = '''\
Results Violations
Pass w/ Conditions 3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass 36. THERMOMETERS PROVIDED & ACCURATE Comment...'''
>>> print(foo)
Results Violations
Pass w/ Conditions 3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass 36. THERMOMETERS PROVIDED & ACCURATE Comment...
>>>
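import re
# (\d+\.) captures the violation number, e.g. "36.". Since '.' does not match
# newlines, each line is matched on its own; lines without "Comment" stay as-is.
# The whole match is replaced by group 1, so "36. ... Comment..." becomes "36.".
# Raw strings avoid the invalid-escape warning for '\d'.
bar = re.sub(r'(\d+\.).*(Comment.*)', r'\1', foo)
print(bar)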
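The re.sub above works on one string. As a sketch, assuming the same pattern should run on each cell of the Violations column, Series.str.replace with regex=True applies it per cell. The sample values are hypothetical. Two replacements are shown: the answer's r'\1', and r'\1 \2', which keeps both captured groups and so literally deletes only the text between the number and "Comment". Because .* is greedy, group 2 begins at the last "Comment" in a cell.

```python
import pandas as pd

# Hypothetical sample cells; only the second contains the "Comment" marker.
violations = pd.Series([
    "3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E",
    "36. THERMOMETERS PROVIDED & ACCURATE Comment sample text",
])

pattern = r"(\d+\.).*(Comment.*)"

# The answer's replacement: a matching cell is reduced to the number.
answer_style = violations.str.replace(pattern, r"\1", regex=True)

# Keep both groups: drop only what lies between the number and "Comment".
keep_both = violations.str.replace(pattern, r"\1 \2", regex=True)

print(answer_style.tolist())
print(keep_both.tolist())
```

In both cases the first cell does not match (it has no "Comment") and is returned unchanged.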
Reference:
Last occurrence of a substring in a string
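For the "last occurrence of a substring" approach, a regex is not strictly needed: str.rfind returns the index of the last occurrence (or -1). The helper name cut_at_last and the default marker "Comments" are hypothetical choices for this sketch, not part of the answer.

```python
def cut_at_last(text: str, marker: str = "Comments") -> str:
    """Return text up to the last occurrence of marker, minus trailing ' -'.

    Hypothetical helper: if marker is absent, text is returned unchanged.
    """
    idx = text.rfind(marker)  # -1 when marker is not found
    if idx == -1:
        return text
    # Here a character set is exactly what we want: strip trailing ' ' and '-'.
    return text[:idx].rstrip(" -")


print(cut_at_last("THERMOMETERS PROVIDED & ACCURATE - Comments: sample text"))
```

On a DataFrame it could be applied per cell, e.g. df_new['Violations'].map(cut_at_last), combined with the existing lstrip for the leading number.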