I have a DataFrame with the following rows in a single column:
__label__JCB_Spare_Part __label__Differential_Housings jcb casting assy differential housing
__label__Vibrating_Roller __label__Road_Roller double drum mini roller seat drive model fyl engine nbsp hp aircolled diesel engine wheel size walk speed km climbing capacity drive hydrostatic drive nbsp nbsp
__label__Vibrating_Roller __label__Road_Roller double drum mini roller seat drive model fyl engine nbsp hp aircolled diesel engine wheel size walk speed km climbing capacity drive hydrostatic drive nbsp nbsp
__label__Crawler_Dozer __label__Bulldozer dozer bulldozer
__label__Crawler_Dozer __label__Bulldozer dozer bulldozer
I want to extract every word that starts with the prefix __label__ into a separate column, like this:
__label__JCB_Spare_Part __label__Differential_Housings
__label__Vibrating_Roller __label__Road_Roller
__label__Vibrating_Roller __label__Road_Roller
__label__Crawler_Dozer __label__Bulldozer
__label__Crawler_Dozer __label__Bulldozer
What I've tried:
labels = input[0].str.extract(r'(__label__[\w]+)')
But it only pulls out the first label.
Answer 0 (score: 1)
Your code is mostly correct; you just need str.findall instead of str.extract. str.extract returns only the first match of the pattern in each row, whereas str.findall returns a list of all matches.
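Here is a minimal sketch of that fix, assuming the text sits in column 0 as in the question. The DataFrame is renamed df (to avoid shadowing Python's built-in input) and the new column name "labels" is hypothetical. str.findall produces a list of labels per row, and str.join turns each list into one space-separated string.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame: one unnamed column (0) of text.
df = pd.DataFrame({0: [
    "__label__JCB_Spare_Part __label__Differential_Housings jcb casting assy differential housing",
    "__label__Crawler_Dozer __label__Bulldozer dozer bulldozer",
]})

# str.findall returns, for each row, a list of every non-overlapping match.
label_lists = df[0].str.findall(r"__label__\w+")

# str.join collapses each row's list into a single space-separated string.
df["labels"] = label_lists.str.join(" ")

print(df["labels"].tolist())
```

If you would rather keep the labels as lists (for example, to count them), assign label_lists directly instead of joining.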
Answer 1 (score: 0)
You could try the following:
import re
# Named "text" rather than "str" to avoid shadowing the built-in str type
text = """
__label__JCB_Spare_Part __label__Differential_Housings jcb casting assy differential housing
__label__Vibrating_Roller __label__Road_Roller double drum mini roller seat drive model fyl engine nbsp hp aircolled diesel engine wheel size walk speed km climbing capacity drive hydrostatic drive nbsp nbsp
__label__Vibrating_Roller __label__Road_Roller double drum mini roller seat drive model fyl engine nbsp hp aircolled diesel engine wheel size walk speed km climbing capacity drive hydrostatic drive nbsp nbsp
__label__Crawler_Dozer __label__Bulldozer dozer bulldozer
__label__Crawler_Dozer __label__Bulldozer dozer bulldozer
"""
# Raw string so that \w reaches the regex engine unchanged.
# Note: this returns one flat list of labels for all rows combined.
result = re.findall(r'__label__\w+', text)
print(result)
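Calling re.findall on the whole string merges the labels from every row into one list. A small variation, sketched below with two of the question's rows as sample input, applies findall line by line so each row keeps its own labels, which matches the per-row output the question asks for.

```python
import re

# Two of the question's rows, used here as sample input.
text = """
__label__JCB_Spare_Part __label__Differential_Housings jcb casting assy differential housing
__label__Crawler_Dozer __label__Bulldozer dozer bulldozer
"""

label_re = re.compile(r"__label__\w+")

# Run findall on each non-empty line separately so labels stay grouped by row.
rows = [label_re.findall(line) for line in text.splitlines() if line.strip()]

for labels in rows:
    print(" ".join(labels))
```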