Extracting part of a string in a dataframe

Time: 2019-03-22 05:23:07

Tags: python regex pandas

I have a dataframe that contains the following rows in a single column:

__label__JCB_Spare_Part __label__Differential_Housings jcb casting assy differential housing
__label__Vibrating_Roller __label__Road_Roller double drum mini roller seat drive model fyl engine nbsp hp aircolled diesel engine wheel size walk speed km climbing capacity drive hydrostatic drive nbsp nbsp
__label__Vibrating_Roller __label__Road_Roller double drum mini roller seat drive model fyl engine nbsp hp aircolled diesel engine wheel size walk speed km climbing capacity drive hydrostatic drive nbsp nbsp
__label__Crawler_Dozer __label__Bulldozer dozer bulldozer
__label__Crawler_Dozer __label__Bulldozer dozer bulldozer

I want to extract every word that starts with the prefix __label__ into a separate column, like this:

__label__JCB_Spare_Part __label__Differential_Housings
__label__Vibrating_Roller __label__Road_Roller
__label__Vibrating_Roller __label__Road_Roller
__label__Crawler_Dozer __label__Bulldozer
__label__Crawler_Dozer __label__Bulldozer

What I tried: labels = input[0].str.extract(r'(__label__[\w]+)') but it only pulls out the first label.

2 answers:

Answer 0: (score: 1)

Your code is mostly correct; you just need to use findall instead of extract. extract returns only the first match in each row, while findall returns every match.
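As a minimal sketch of that suggestion, assuming the text sits in column 0 as in the question: the two-row frame below is a hypothetical stand-in for the asker's data, and it is named df rather than input so the built-in input is not shadowed. Series.str.findall returns a list of matches per row, and Series.str.join turns each list back into a single string if a plain text column is wanted.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the asker's data; column 0 holds the text.
df = pd.DataFrame({0: [
    "__label__JCB_Spare_Part __label__Differential_Housings jcb casting assy differential housing",
    "__label__Crawler_Dozer __label__Bulldozer dozer bulldozer",
]})

# str.findall returns, for each row, a list of every non-overlapping match.
label_lists = df[0].str.findall(r"__label__\w+")

# Join each list into one space-separated string to get a plain text column.
df["labels"] = label_lists.str.join(" ")

print(df["labels"].tolist())
```

If the lists themselves are more useful than joined strings, label_lists can be assigned to the new column directly.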

Answer 1: (score: 0)

You can try the following:

import re

# Sample rows from the question, held in one multi-line string.
# Named text rather than str so the built-in str type is not shadowed.
text = """
__label__JCB_Spare_Part  __label__Differential_Housings jcb  casting  assy  differential  housing
__label__Vibrating_Roller  __label__Road_Roller double  drum  mini  roller  seat  drive  model  fyl  engine  nbsp  hp  aircolled  diesel  engine  wheel  size  walk  speed  km  climbing  capacity  drive  hydrostatic  drive  nbsp  nbsp
__label__Vibrating_Roller  __label__Road_Roller double  drum  mini  roller  seat  drive  model  fyl  engine  nbsp  hp  aircolled  diesel  engine  wheel  size  walk  speed  km  climbing  capacity  drive  hydrostatic  drive  nbsp  nbsp
__label__Crawler_Dozer  __label__Bulldozer dozer  bulldozer
__label__Crawler_Dozer  __label__Bulldozer dozer  bulldozer
"""

# Raw string so that \w is passed to the regex engine unchanged.
# re.findall returns every match as one flat list across all rows.
result = re.findall(r'__label__\w+', text)
print(result)
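Note that calling re.findall on the whole string produces one flat list, so the information about which row each label came from is lost. The sketch below is one possible way to keep the labels grouped per row using only the standard library; the names LABEL_RE and labels_per_row, and the shortened two-row sample, are illustrative assumptions and not part of the original answer.

```python
import re

# Compile once; \w+ matches the letters, digits and underscores after the prefix.
LABEL_RE = re.compile(r"__label__\w+")

# Shortened, hypothetical sample using two of the rows from the question.
text = """\
__label__JCB_Spare_Part  __label__Differential_Housings jcb  casting  assy  differential  housing
__label__Crawler_Dozer  __label__Bulldozer dozer  bulldozer
"""

# Run findall on each non-empty line so the labels stay grouped by row.
labels_per_row = [
    LABEL_RE.findall(line)
    for line in text.splitlines()
    if line.strip()
]

print(labels_per_row)
```

Each inner list corresponds to one input row, which maps naturally onto one cell of a new dataframe column.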