Python: split a string using any word from a list of words

Time: 2016-03-13 22:43:54

Tags: python string list split

I have a list of words.

trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail")

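I need to split another string on any of these words. So, if the names to check are:

  • Poverty Point FT
  • Cedar Party Fire Trails
  • Mailbox Trail
  • Carpet Snake Creek Firetrail
  • Pretty Gully firetrail - Roayl NP

I want to modify them to look like this:

  • Poverty Point
  • Cedar Party
  • Mailbox
  • Carpet Snake Creek
  • Pretty Gully

That is, split before any word from the trails list and keep only the part before it.

Thanks!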

I should add that my code starts with:

for f in arcpy.da.SearchCursor("firetrail_O_noD_Layer", "FireTrailName", None, None):
...     if any(var in str(f[0]) for var in trail):
...         new_field = *that part of string without any fire trails and anything after it*

str(f[0]) refers to a name from the first list; new_field refers to the corresponding name in the second list, which I need to create.

5 Answers:

Answer 0 (score: 1)

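I believe this is what you are looking for. If you want it to be case-insensitive, you can also pass a flag such as re.IGNORECASE, like this: res = re.split(regex, s, flags=re.IGNORECASE). The flag has to be passed by keyword, because the third positional argument of re.split is maxsplit, not flags. See the re.split() documentation for details.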

import re
trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail")

# \b means word boundaries.
regex = r"\b(?:{})\b".format("|".join(trails))

s = """Poverty Point FT
Cedar Party Fire Trails
Mailbox Trail
Carpet Snake Creek Firetrail
Pretty Gully firetrail - Roayl NP"""

res = re.split(regex, s)

Update

If you are going line by line and don't want the rest of the line after the trail name, you can do this:

import re
trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail", "Trail", "Trails")

# \b means word boundaries.
regex = r"\b(?:{}).*".format("|".join(trails))

s = """Poverty Point FT
Cedar Party Fire Trails
Mailbox Trail
Carpet Snake Creek Firetrail
Pretty Gully firetrail - Roayl NP"""

res = [r.strip() for r in re.split(regex, s)]
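
For the cursor loop in the question, which handles one name at a time, the same pattern can be wrapped in a small helper. The following is only a sketch: strip_trail is a hypothetical name, the use of re.escape and case-insensitive matching are assumptions added here, and a plain list stands in for the arcpy cursor.

```python
import re

trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail", "Trail", "Trails")

# re.escape guards against regex metacharacters in the phrases.
# flags is passed by keyword: in re.split and re.sub the positional
# parameter after the string is maxsplit/count, not flags.
pattern = re.compile(
    r"\b(?:{}).*".format("|".join(re.escape(t) for t in trails)),
    flags=re.IGNORECASE,
)


def strip_trail(name):
    """Drop the first trail phrase and everything after it (hypothetical helper)."""
    return pattern.sub("", name).strip()


names = [
    "Poverty Point FT",
    "Cedar Party Fire Trails",
    "Mailbox Trail",
    "Carpet Snake Creek Firetrail",
    "Pretty Gully firetrail - Roayl NP",
]
cleaned = [strip_trail(n) for n in names]
print(cleaned)
# ['Poverty Point', 'Cedar Party', 'Mailbox', 'Carpet Snake Creek', 'Pretty Gully']
```

Inside the loop from the question, new_field could then be set to strip_trail(str(f[0])).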

Answer 1 (score: 1)

You can use re.split here:

import re

_list = re.split(r'Fire trail|Firetrail|Fire Trail|FT|firetrail', _string)

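Answer 2 (score: 1)

It seems the requirements and the solution should be clarified and tested iteratively, so here I provide a proposed solution together with a test suite to be used with pytest.

First, create the file test_trails.py: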

import pytest


def fix_trails(trails):
    """Clean up list of trails to make sure, longest phrases are processed
    with highest priority (are sooner in the list).

    This is needed, if some trail phrases contain other ones.
    """
    trails.sort(key=len, reverse=True)
    return trails


@pytest.fixture
def trails():
    phrases = ["Fire trail", "Firetrail", "Fire Trail",
               "FT", "firetrail", "Trail", "Fire Trails"]
    return fix_trails(phrases)


def remove_trails(line, trails):
    for trail in trails:
        if trail in line:
            res = line.replace(trail, "").strip()
            return res.replace("  ", " ")
    return line


scenarios = [
    ["Poverty Point FT", "Poverty Point"],
    ["Cedar Party Fire Trails", "Cedar Party Fire"],
    ["Mailbox Trail", "Mailbox"],
    ["Carpet Snake Creek Firetrail", "Carpet Snake Creek"],
    ["Pretty Gully firetrail - Roayl NP", "Pretty Gully - Roayl NP"],
]


@pytest.mark.parametrize("scenario", scenarios, ids=lambda itm: itm[0])
def test(scenario, trails):
    line, expected = scenario
    result = remove_trails(line, trails)
    assert result == expected

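The file defines the function remove_trails, which removes the unwanted text from a processed line, and it also contains the test cases.

To test it, install pytest: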

$ pip install pytest

Then run the tests:

$ py.test -sv test_trails.py
========================================= test session starts =========================================
platform linux2 -- Python 2.7.9, pytest-2.8.7, py-1.4.31, pluggy-0.3.1 -- /home/javl/.virtualenvs/stack/bin/python2
cachedir: .cache
rootdir: /home/javl/sandbox/stack, inifile:
collected 5 items

test_trails.py::test[Poverty Point FT] PASSED
test_trails.py::test[Cedar Party Fire Trails] FAILED
test_trails.py::test[Mailbox Trail] PASSED
test_trails.py::test[Carpet Snake Creek Firetrail] PASSED
test_trails.py::test[Pretty Gully firetrail - Roayl NP] PASSED

================ FAILURES ==================
______ test[Cedar Party Fire Trails] _______

scenario = ['Cedar Party Fire Trails', 'Cedar Party Fire']
trails = ['Fire Trails', 'Fire trail', 'Fire Trail', 'Firetrail', 'firetrail', 'Trail', ...]

    @pytest.mark.parametrize("scenario", scenarios, ids=lambda itm: itm[0])
    def test(scenario, trails):
        line, expected = scenario
        result = remove_trails(line, trails)
>       assert result == expected
E       assert 'Cedar Party' == 'Cedar Party Fire'
E         - Cedar Party
E         + Cedar Party Fire
E         ?            +++++

test_trails.py:42: AssertionError
======== 1 failed, 4 passed in 0.01 seconds ============

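The py.test command discovers the test cases in the file and looks at their input arguments. It uses the fixture to inject the value of trails, and the parametrization of the test case supplies the scenario argument.

You can then fine-tune the remove_trails function and the trails list until all tests pass.

Once you are done, you can move the remove_trails function to wherever you need it (probably together with the trails list).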

You can use this approach to test any solution proposed for your question.
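
As one possible outcome of that fine-tuning, here is a sketch of a variant that matches the output requested in the question, where everything from the trail phrase onward is dropped. The name cut_at_trail is hypothetical, and plain asserts stand in for the pytest scenarios so the snippet runs on its own.

```python
def cut_at_trail(line, trails):
    """Return the part of line before the earliest trail phrase (hypothetical helper)."""
    positions = [line.find(t) for t in trails if t in line]
    if not positions:
        return line
    # Cutting at the earliest hit means "Fire Trail" wins over the "Trail" inside it.
    return line[:min(positions)].strip()


trails = ["Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail", "Trail"]

scenarios = [
    ["Poverty Point FT", "Poverty Point"],
    ["Cedar Party Fire Trails", "Cedar Party"],
    ["Mailbox Trail", "Mailbox"],
    ["Carpet Snake Creek Firetrail", "Carpet Snake Creek"],
    ["Pretty Gully firetrail - Roayl NP", "Pretty Gully"],
]

for line, expected in scenarios:
    assert cut_at_trail(line, trails) == expected
```

This is a plain substring search with no word boundaries, so a short phrase such as "FT" would also match inside a longer uppercase word. Whether that matters depends on the data.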

Answer 3 (score: 1)

Well, here is a more dynamic way to do the task:

import re

courses = r"""
Poverty Point FT
Cedar Party Fire Trails
Mailbox Trail
Carpet Snake Creek Firetrail
Pretty Gully firetrail - Roayl NP
"""

trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail")

rx_str = '|'.join(trails)
rx_str = r"^.+?(?=(?:{0}|$))".format(rx_str)

rx = re.compile(rx_str, re.IGNORECASE | re.MULTILINE)

for course in rx.finditer(courses):
    print(course.group())

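As you may notice, I am dynamically turning the list into a regular expression, so nothing needs to be hardcoded. The script produces the following result: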

Poverty Point 
Cedar Party 
Mailbox Trail
Carpet Snake Creek 
Pretty Gully 
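
The output above keeps the space in front of the matched phrase. One way to avoid that, sketched here as an assumption rather than as part of the original answer, is to allow optional whitespace inside the lookahead. The example works on one name at a time, so re.MULTILINE is not needed, and re.escape is added in case a phrase contains regex metacharacters.

```python
import re

trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail")
alternatives = "|".join(re.escape(t) for t in trails)

# \s* inside the lookahead keeps the whitespace before the phrase out of the match.
rx = re.compile(r"^.+?(?=\s*(?:{0})|\s*$)".format(alternatives), re.IGNORECASE)

names = [
    "Poverty Point FT",
    "Cedar Party Fire Trails",
    "Mailbox Trail",
    "Carpet Snake Creek Firetrail",
    "Pretty Gully firetrail - Roayl NP",
]

results = []
for name in names:
    m = rx.match(name)
    results.append(m.group() if m else name)

print(results)
# ['Poverty Point', 'Cedar Party', 'Mailbox Trail', 'Carpet Snake Creek', 'Pretty Gully']
```

"Mailbox Trail" stays unchanged because "Trail" on its own is not in the trails tuple. Also note that with re.IGNORECASE the phrase "FT" matches the letters "ft" anywhere, including inside ordinary words.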

Answer 4 (score: 0)

You can do this with a regular expression, for example:

def make_matcher(trails):
    import re
    rgx = re.compile(r"{}".format("|".join(trails)))
    return lambda txt: rgx.split(txt)[0]

>>> m = make_matcher(["Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail"])
>>> examples = ["Poverty Point FT", "Cedar Party Fire Trails", "Mailbox Trail", "Carpet Snake Creek Firetrail", "Pretty Gully firetrail - Roayl NP"]
>>> for x in examples:
...     print(m(x))
Poverty Point 
Cedar Party 
Mailbox Trail
Carpet Snake Creek 
Pretty Gully 

Note that in this example the trailing space before, e.g., Firetrail is preserved. That may not be what you want.
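
If the trailing space is unwanted, a small variation of the same idea strips it. This is a sketch: the rstrip call, re.escape and maxsplit=1 are additions that were not in the answer above.

```python
import re


def make_matcher(trails):
    # Escape each phrase so it is matched literally.
    rgx = re.compile("|".join(re.escape(t) for t in trails))
    # Split once at the first phrase and trim trailing whitespace from the head.
    return lambda txt: rgx.split(txt, maxsplit=1)[0].rstrip()


m = make_matcher(["Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail"])
examples = ["Poverty Point FT", "Cedar Party Fire Trails", "Mailbox Trail",
            "Carpet Snake Creek Firetrail", "Pretty Gully firetrail - Roayl NP"]
cleaned = [m(x) for x in examples]
print(cleaned)
# ['Poverty Point', 'Cedar Party', 'Mailbox Trail', 'Carpet Snake Creek', 'Pretty Gully']
```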