I have the following text. Each line contains two phrases separated by "\t":
RoadTunnel RouteOfTransportation
LaunchPad Infrastructure
CyclingLeague SportsLeague
Territory PopulatedPlace
CurlingLeague SportsLeague
GatedCommunity PopulatedPlace
What I want is to add _ between the individual words, so the result should be:
Road_Tunnel Route_Of_Transportation
Launch_Pad Infrastructure
Cycling_League Sports_League
Territory Populated_Place
Curling_League Sports_League
Gated_Community Populated_Place
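There are no cases like "ABTest" or "aBTest", but there are cases with three words together, such as "RouteOfTransportation".
I tried several approaches without success. One of my attempts was: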
textProcessed = re.sub(r"([A-Z][a-z]+)(?=([A-Z][a-z]+))", r"\1_", text)
but it gave no result.
Answer 0 (score: 4)
Use a regular expression with re.sub.
>>> import re
>>> s = '''LaunchPad Infrastructure
... CyclingLeague SportsLeague
... Territory PopulatedPlace
... CurlingLeague SportsLeague
... GatedCommunity PopulatedPlace'''
>>> subbed = re.sub('([A-Z][a-z]+)([A-Z])', r'\1_\2', s)
>>> print(subbed)
Launch_Pad Infrastructure
Cycling_League Sports_League
Territory Populated_Place
Curling_League Sports_League
Gated_Community Populated_Place
Edit: here is another one, since your test cases are not enough to determine what you want:
>>> re.sub('([a-zA-Z])([A-Z])([a-z])', r'\1_\2\3', 'ABThingThing')
'AB_Thing_Thing'
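A note on the first pattern in this answer: its second group `([A-Z])` consumes the capital that starts the next word, so that capital cannot also begin the following match. Three-word names such as RouteOfTransportation therefore appear to get only one underscore. A minimal sketch of a lookahead variant that avoids this (the name `split_words` is only for illustration and is not part of the answer):

```python
import re

def split_words(text):
    # Insert "_" after each lowercase letter that is directly followed by
    # an uppercase letter. The lookahead does not consume the capital,
    # so every word boundary in a run of words is found.
    return re.sub(r"([a-z])(?=[A-Z])", r"\1_", text)

# The consuming pattern from the answer, for comparison.
consuming = re.sub(r"([A-Z][a-z]+)([A-Z])", r"\1_\2", "RouteOfTransportation")
print(consuming)                              # Route_OfTransportation
print(split_words("RouteOfTransportation"))  # Route_Of_Transportation
```

This sketch only splits at a lowercase-to-uppercase boundary, which matches the question's statement that inputs like "ABTest" do not occur.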
Answer 1 (score: 2)
Combine re.findall and str.join:
>>> "_".join(re.findall(r"[A-Z]{1}[^A-Z]*", text))
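One caveat with this approach: `[^A-Z]*` also matches tabs, spaces and newlines, so running it over the whole text would place an underscore after each separator as well. A hedged sketch that applies it to one word at a time instead, assuming the input is the tab-separated text from the question and that every word starts with a capital letter (`join_words` and `convert` are hypothetical helper names):

```python
import re

def join_words(word):
    # Split one CamelCase word at each capital and rejoin with "_".
    # Assumes the word starts with a capital; any leading lowercase
    # characters would be dropped by findall.
    return "_".join(re.findall(r"[A-Z][^A-Z]*", word))

def convert(text):
    # Handle each line and each tab-separated phrase separately so the
    # separators themselves are left untouched.
    return "\n".join(
        "\t".join(join_words(w) for w in line.split("\t"))
        for line in text.split("\n")
    )

text = "RoadTunnel\tRouteOfTransportation\nLaunchPad\tInfrastructure"
print(convert(text))
```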
Answer 2 (score: 2)
Depending on your needs, a slightly different solution may work:
import re
result = re.sub(r"([a-zA-Z])(?=[A-Z])", r"\1_", s)
It inserts _ before any uppercase letter that follows another letter (whether uppercase or lowercase).
"TheRabbit IsBlue" => "The_Rabbit Is_Blue"
"ABThing ThingAB" => "A_B_Thing Thing_A_B"
It does not support special characters.
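If "special characters" here means non-ASCII letters (an assumption; the answer does not say), one option is to skip regular expressions and rely on str.isupper and str.isalpha, which are Unicode-aware in Python 3. A sketch of the same rule written that way, with `underscore_words` as an illustrative name:

```python
def underscore_words(text):
    # Insert "_" before any uppercase letter that directly follows
    # another letter, using Unicode-aware str methods.
    out = []
    prev = ""
    for ch in text:
        if ch.isupper() and prev.isalpha():
            out.append("_")
        out.append(ch)
        prev = ch
    return "".join(out)

print(underscore_words("TheRabbit IsBlue"))  # The_Rabbit Is_Blue
print(underscore_words("ABThing ThingAB"))   # A_B_Thing Thing_A_B
```

For ASCII input this behaves like the re.sub call above; for accented capitals it inserts the underscore in the same way.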