Question

I have the following text, in which each line contains two phrases separated by "\t":

RoadTunnel    RouteOfTransportation
LaunchPad   Infrastructure
CyclingLeague   SportsLeague
Territory   PopulatedPlace
CurlingLeague   SportsLeague
GatedCommunity  PopulatedPlace

What I want is to add _ between the individual words, so the result should be:

Road_Tunnel    Route_Of_Transportation
Launch_Pad  Infrastructure
Cycling_League  Sports_League
Territory   Populated_Place
Curling_League  Sports_League
Gated_Community Populated_Place

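There are no cases like "ABTest" or "aBTest", but there are cases with three words joined together, such as "RouteOfTransportation". I tried several approaches, but none of them worked.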

One of my attempts was:

textProcessed = re.sub(r"([A-Z][a-z]+)(?=([A-Z][a-z]+))", r"\1_", text)

But it did not give me the result I wanted.

Answer 1

Use a regular expression with re.sub.

>>> import re
>>> s = '''LaunchPad   Infrastructure
... CyclingLeague   SportsLeague
... Territory   PopulatedPlace
... CurlingLeague   SportsLeague
... GatedCommunity  PopulatedPlace'''
>>> subbed = re.sub('([A-Z][a-z]+)([A-Z])', r'\1_\2', s)
>>> print(subbed)
Launch_Pad   Infrastructure
Cycling_League   Sports_League
Territory   Populated_Place
Curling_League   Sports_League
Gated_Community  Populated_Place

Edit: here is another one, since your test cases are not enough to pin down exactly what you want:

>>> re.sub('([a-zA-Z])([A-Z])([a-z])', r'\1_\2\3', 'ABThingThing')
'AB_Thing_Thing'
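
One thing to watch for: both patterns above consume characters of the following word, so a boundary right after a short word can be missed. On "RouteOfTransportation" each of them returns 'Route_OfTransportation'. A possible variant is sketched below with zero-width lookarounds, so that no letters are consumed. The function name add_underscores is only illustrative, and the sketch assumes, as the question states, that there are no runs of capitals such as "ABTest".

```python
import re

def add_underscores(text):
    # Match the empty position between a lowercase letter and an uppercase
    # letter. Lookbehind and lookahead consume nothing, so every word
    # boundary is found, even after a two-letter word like "Of".
    return re.sub(r"(?<=[a-z])(?=[A-Z])", "_", text)

# Tabs and newlines contain no letters, so the separators are left alone.
print(add_underscores("RoadTunnel\tRouteOfTransportation"))
```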

Answer 2

Combine re.findall and str.join:

>>> "_".join(re.findall(r"[A-Z]{1}[^A-Z]*", text))
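
Note that [^A-Z]* also matches tabs and newlines, so running this on the whole text at once would put an underscore in front of every phrase after the first. One way around that, sketched here on the assumption that the columns really are separated by "\t", is to apply it to each phrase separately. The helper names underscore_phrase and underscore_text are made up for this example.

```python
import re

def underscore_phrase(phrase):
    # Each chunk starts at a capital letter and runs up to the next capital.
    # Anything before the first capital letter would be dropped by findall.
    return "_".join(re.findall(r"[A-Z][^A-Z]*", phrase))

def underscore_text(text):
    out_lines = []
    for line in text.splitlines():
        # Handle the tab-separated phrases one at a time, then re-join them.
        phrases = line.split("\t")
        out_lines.append("\t".join(underscore_phrase(p) for p in phrases))
    return "\n".join(out_lines)

text = "RoadTunnel\tRouteOfTransportation\nTerritory\tPopulatedPlace"
print(underscore_text(text))
```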

Answer 3

Depending on your needs, a slightly different solution is possible:

import re
result = re.sub(r"([a-zA-Z])(?=[A-Z])", r"\1_", s)

It inserts _ before any uppercase letter that follows another letter (whether uppercase or lowercase).

"TheRabbit IsBlue" => "The_Rabbit Is_Blue"
"ABThing ThingAB" => "A_B_Thing Thing_A_B"

It does not handle special characters (anything outside a-z and A-Z).
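
As a quick check, here is the same pattern wrapped in a function (the name insert_underscores is just for illustration) and run on two lines of the question's data, followed by two inputs showing what the limitation looks like in practice.

```python
import re

def insert_underscores(s):
    # Same pattern as above: a letter followed (lookahead) by a capital.
    return re.sub(r"([a-zA-Z])(?=[A-Z])", r"\1_", s)

sample = "RoadTunnel\tRouteOfTransportation\nGatedCommunity\tPopulatedPlace"
print(insert_underscores(sample))

# Characters outside a-z and A-Z never count as the letter before a capital,
# so neither of these strings is changed.
print(insert_underscores("Route66Highway"))  # '6' is not in [a-zA-Z]
print(insert_underscores("Caf\u00e9Bar"))    # the accented e is not in [a-zA-Z]
```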
