I have a list of words.
trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail")
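I need to split another string on any of these words. So if the names to check are:
I would like to modify them so that they look like this:
That is, split just before any word from the trails list and keep only the part that comes before it.
Thanks!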
I should add that my code starts with:
for f in arcpy.da.SearchCursor("firetrail_O_noD_Layer", "FireTrailName", None, None):
...     if any(var in str(f[0]) for var in trails):
...         new_field = *that part of string without any fire trails and anything after it*
str(f[0]) refers to a name in the first list; new_field refers to the corresponding name in the second list, which I need to create.
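Answer 0 (score: 1)
I believe this is what you are looking for. If you want the match to be case-insensitive, pass a flag such as re.IGNORECASE by keyword: res = re.split(regex, s, flags=re.IGNORECASE). Passed positionally, the third argument would be interpreted as maxsplit instead. See the documentation of re.split() for details.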
import re
trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail")
# \b means word boundaries.
regex = r"\b(?:{})\b".format("|".join(trails))
s = """Poverty Point FT
Cedar Party Fire Trails
Mailbox Trail
Carpet Snake Creek Firetrail
Pretty Gully firetrail - Roayl NP"""
res = re.split(regex, s)
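As a minimal sketch of the case-insensitive variant, applied to a single name rather than the whole block: the upper-case name below is a made-up example, not one from the question, and `before` is just an illustrative variable name.

```python
import re

trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail")
regex = r"\b(?:{})\b".format("|".join(trails))

# Hypothetical name with different casing than any entry in trails.
name = "Pretty Gully FIRETRAIL - Roayl NP"

# flags goes by keyword; the third positional parameter of re.split is maxsplit.
parts = re.split(regex, name, flags=re.IGNORECASE)

# Keep only the text in front of the first trail phrase.
before = parts[0].strip()
print(before)
```

Taking `parts[0]` keeps only what comes before the first match, which seems to be what the question asks for.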
Update
If you are going line by line and do not want the rest of the line after the match, you can do this:
import re
trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail", "Trail", "Trails")
# \b means word boundaries.
regex = r"\b(?:{}).*".format("|".join(trails))
s = """Poverty Point FT
Cedar Party Fire Trails
Mailbox Trail
Carpet Snake Creek Firetrail
Pretty Gully firetrail - Roayl NP"""
res = [r.strip() for r in re.split(regex, s)]
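If the names arrive one at a time (as they would inside the cursor loop from the question), the same pattern can be used with re.sub instead of splitting one big string. This is only a sketch under that assumption; `names` and `cleaned` are illustrative variable names. It also avoids the empty string that splitting the whole block leaves at the end of the result.

```python
import re

trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail", "Trail", "Trails")
# Same pattern as above: a trail phrase at a word boundary plus the rest of the line.
regex = re.compile(r"\b(?:{}).*".format("|".join(trails)))

names = [
    "Poverty Point FT",
    "Cedar Party Fire Trails",
    "Mailbox Trail",
    "Carpet Snake Creek Firetrail",
    "Pretty Gully firetrail - Roayl NP",
]

# Drop the trail phrase and everything after it, one name at a time.
cleaned = [regex.sub("", name).strip() for name in names]
print(cleaned)
```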
Answer 1 (score: 1)
You can use re.split here:
import re
_list = re.split(r'Fire trail|Firetrail|Fire Trail|FT|firetrail', _string)
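One thing to watch with a plain alternation like this: a short entry such as FT can also match inside a longer word. The sketch below uses a made-up name to show the difference that \b word boundaries make; whether such names occur in the real data is an assumption.

```python
import re

# Hypothetical name that contains "FT" inside a word.
name = "LOFTUS RIDGE FT"

# Plain alternation: the "FT" inside "LOFTUS" is treated as a separator too.
plain = re.split(r"Fire trail|Firetrail|Fire Trail|FT|firetrail", name)

# With word boundaries, only the standalone "FT" at the end matches.
bounded = re.split(r"\b(?:Fire trail|Firetrail|Fire Trail|FT|firetrail)\b", name)

plain_first = plain[0]
bounded_first = bounded[0].strip()
print(plain_first)
print(bounded_first)
```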
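Answer 2 (score: 1)
Since the requirements and the solutions apparently need to be clarified and tested repeatedly, I am providing a test suite for the proposed solution, to be used with pytest.
First, create the file test_trails.py: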
import pytest


def fix_trails(trails):
    """Sort the trail phrases so that the longest ones are processed first
    (they come earlier in the list).

    This is needed when some trail phrases contain other ones.
    """
    trails.sort(key=len, reverse=True)
    return trails


@pytest.fixture
def trails():
    phrases = ["Fire trail", "Firetrail", "Fire Trail",
               "FT", "firetrail", "Trail", "Fire Trails"]
    return fix_trails(phrases)


def remove_trails(line, trails):
    # Remove the first (longest) matching phrase, then collapse the double
    # space that may be left behind in the middle of the line.
    for trail in trails:
        if trail in line:
            res = line.replace(trail, "").strip()
            return res.replace("  ", " ")
    return line


scenarios = [
    ["Poverty Point FT", "Poverty Point"],
    ["Cedar Party Fire Trails", "Cedar Party Fire"],
    ["Mailbox Trail", "Mailbox"],
    ["Carpet Snake Creek Firetrail", "Carpet Snake Creek"],
    ["Pretty Gully firetrail - Roayl NP", "Pretty Gully - Roayl NP"],
]


@pytest.mark.parametrize("scenario", scenarios, ids=lambda itm: itm[0])
def test(scenario, trails):
    line, expected = scenario
    result = remove_trails(line, trails)
    assert result == expected
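The test_trails.py file defines the function that removes unwanted text from a processed line, and it also contains the test cases.
To test it, install pytest:
$ pip install pytest
Then run the tests: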
$ py.test -sv test_trails.py
========================================= test session starts =========================================
platform linux2 -- Python 2.7.9, pytest-2.8.7, py-1.4.31, pluggy-0.3.1 -- /home/javl/.virtualenvs/stack/bin/python2
cachedir: .cache
rootdir: /home/javl/sandbox/stack, inifile:
collected 5 items
test_trails.py::test[Poverty Point FT] PASSED
test_trails.py::test[Cedar Party Fire Trails] FAILED
test_trails.py::test[Mailbox Trail] PASSED
test_trails.py::test[Carpet Snake Creek Firetrail] PASSED
test_trails.py::test[Pretty Gully firetrail - Roayl NP] PASSED
================ FAILURES ==================
______ test[Cedar Party Fire Trails] _______
scenario = ['Cedar Party Fire Trails', 'Cedar Party Fire']
trails = ['Fire Trails', 'Fire trail', 'Fire Trail', 'Firetrail', 'firetrail', 'Trail', ...]
    @pytest.mark.parametrize("scenario", scenarios, ids=lambda itm: itm[0])
    def test(scenario, trails):
        line, expected = scenario
        result = remove_trails(line, trails)
>       assert result == expected
E       assert 'Cedar Party' == 'Cedar Party Fire'
E         - Cedar Party
E         + Cedar Party Fire
E         ?            +++++
test_trails.py:42: AssertionError
======== 1 failed, 4 passed in 0.01 seconds ============
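The py.test command discovers the test cases in the file, inspects their input arguments, uses fixture injection to supply the value of trails, and takes the scenario argument from the parametrization of the test case.
You can then fine-tune the remove_trails function and the trails list until all tests pass.
Once that is done, you can move the remove_trails function to wherever you need it (probably together with the trails list).
You can use this approach to test any of the solutions proposed for your question.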
Answer 3 (score: 1)
Well, here is a more dynamic way to do the task:
import re
courses = r"""
Poverty Point FT
Cedar Party Fire Trails
Mailbox Trail
Carpet Snake Creek Firetrail
Pretty Gully firetrail - Roayl NP
"""
trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail")
rx_str = '|'.join(trails)
rx_str = r"^.+?(?=(?:{0}|$))".format(rx_str)
rx = re.compile(rx_str, re.IGNORECASE | re.MULTILINE)
for course in rx.finditer(courses):
    print(course.group())
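As you can see, I am converting the list into a regular expression dynamically, with no hard-coding needed. The script produces the following result: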
Poverty Point
Cedar Party
Mailbox Trail
Carpet Snake Creek
Pretty Gully
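When the pattern is built from a list like this, it may be worth passing each phrase through re.escape, in case a phrase ever contains a regex metacharacter. The sketch below assumes such a phrase ("F.T.") and a matching name ("Bald Hill F.T."), neither of which is in the original list, and it also strips the trailing space that the lazy match keeps in front of the trail phrase.

```python
import re

courses = """\
Poverty Point FT
Cedar Party Fire Trails
Mailbox Trail
Carpet Snake Creek Firetrail
Pretty Gully firetrail - Roayl NP
Bald Hill F.T.
"""

# "F.T." is a hypothetical extra phrase, added only to show why escaping matters.
trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail", "F.T.")

# Escape every phrase so that characters such as "." are matched literally.
alternatives = "|".join(re.escape(t) for t in trails)
rx = re.compile(r"^.+?(?={0}|$)".format(alternatives), re.IGNORECASE | re.MULTILINE)

# rstrip() removes the space left in front of the trail phrase.
names = [m.group().rstrip() for m in rx.finditer(courses)]
print(names)
```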
Answer 4 (score: 0)
You can do this with a regular expression, for example:
def make_matcher(trails):
    import re
    rgx = re.compile(r"{}".format("|".join(trails)))
    return lambda txt: rgx.split(txt)[0]
>>> m = make_matcher(["Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail"])
>>> examples = ["Poverty Point FT", "Cedar Party Fire Trails", "Mailbox Trail", "Carpet Snake Creek Firetrail", "Pretty Gully firetrail - Roayl NP"]
>>> for x in examples:
...     print(m(x))
Poverty Point
Cedar Party
Mailbox Trail
Carpet Snake Creek
Pretty Gully
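Note that in this example the trailing whitespace before the occurrence of, for example, Firetrail is preserved. That might not be what you want.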
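If the trailing whitespace is unwanted, one possible variant (a sketch, not the only way to do it) is to call rstrip() on the first piece. It also escapes the phrases and stops after the first split, since only the leading part is used.

```python
import re


def make_matcher(trails):
    # Escape each phrase so it is matched literally.
    rgx = re.compile("|".join(re.escape(t) for t in trails))
    # Split once, keep the part before the first trail phrase,
    # and drop the whitespace left at its end.
    return lambda txt: rgx.split(txt, maxsplit=1)[0].rstrip()


m = make_matcher(["Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail"])
examples = [
    "Poverty Point FT",
    "Cedar Party Fire Trails",
    "Mailbox Trail",
    "Carpet Snake Creek Firetrail",
    "Pretty Gully firetrail - Roayl NP",
]
results = [m(x) for x in examples]
print(results)
```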