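Extract departure and arrival from a list

Time: 2018-05-08 09:19:33

Tags: python list identification

I am trying to extract some parameters from lists of variable structure and length. Basically, these parameters are the departure and arrival addresses of a route. Each list is built from a natural-language sentence, so it does not follow any particular template: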

1st example : ['go', 'Buzenval', 'from', 'Chatelet']
2nd example : ['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval']
3rd example : ['go', 'from', '33', 'street', 'Republique', 'to', '12','street','Napoleon']

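I have managed to build a very similar list for each case, in which the departure and arrival are replaced by the literal words 'departure' and 'arrival'. With the examples above, I get: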

1st example : ['go', 'arrival', 'from', 'departure']
2nd example : ['How', 'go', 'arrival', 'from', 'departure']
3rd example : ['go', 'from', 'departure', 'to', 'arrival']

Now that I have these two kinds of lists, I want to identify the departure and the arrival:

1st example : departure = ['Chatelet'], arrival = ['Buzenval']
2nd example : departure =  ['Buzenval'], arrival = ['street','Saint','Augustin']
3rd example : departure = ['33','street','Republique'], arrival = ['12','street','Napoleon']

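Basically, the parameters are everything that differs between the two lists, but I need to determine which one is the departure and which one is the arrival. I think regex could help me, but I don't know how.

Thanks for your help!

4 answers:

Answer 0: (score: 1)

Regex would certainly help here, but I tried a simpler approach. It works as long as the pattern you describe holds for every input. I am showing it for the first example; you can apply the same logic to the others and adapt the code: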

Code:

first = ['go', 'Buzenval', 'from', 'Chatelet'] # First Example
start = first.index('go')
end = first.index('from')
arrival = first[start+1:end]   # words between 'go' and 'from'
departure = first[end+1:]      # words after 'from'
print("Departure: {0} , Arrival: {1}".format(departure,arrival))

Output:

Departure: ['Chatelet'] , Arrival: ['Buzenval']
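Since the question already has a template list containing the placeholders 'departure' and 'arrival', a regex-based variant could be sketched as follows. This is only an illustration under stated assumptions: the function name extract_with_regex and its parameters are hypothetical, and it assumes that no place name contains one of the template keywords as a separate word.

```python
import re

def extract_with_regex(words, template, slots=("departure", "arrival")):
    """Match a tokenized sentence against a template list (hypothetical helper).

    Placeholders named in `slots` become lazy named groups; every other
    template token must appear literally in the sentence.
    """
    parts = []
    for token in template:
        if token in slots:
            parts.append(r"(?P<%s>.+?)" % token)
        else:
            parts.append(re.escape(token))
    pattern = "^" + r"\s+".join(parts) + "$"
    match = re.match(pattern, " ".join(words))
    if match is None:
        return None  # the sentence does not fit this template
    # Split each captured string back into a list of words.
    return {slot: text.split() for slot, text in match.groupdict().items()}

result = extract_with_regex(
    ['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon'],
    ['go', 'from', 'departure', 'to', 'arrival'],
)
print(result)
```

The anchors ^ and $ force the whole sentence to match, so the lazy groups expand just far enough to reach the next literal keyword.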

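Answer 1: (score: 1)

I found a way that solves your three examples. One thing you should change is the variable names; I did not know what to call them. (This is the old version, which is slow and hard to understand. The second version further down is the better one.)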

def extract_places(names, modes):
    keywords = set(modes).intersection(names)
    extracted = [[] for _ in modes]
    j = 0
    for i, mode in enumerate(modes):
        if mode.lower() in keywords:
            if mode.lower() != names[j].lower():
                while mode.lower() != names[j].lower():
                    extracted[i - 1].append(names[j])
                    j += 1
            else:
                extracted[i].append(names[j])
                j += 1
        else:
            if names[j].lower() not in keywords:
                while j < len(names) and names[j].lower() not in keywords:
                    extracted[i].append(names[j])
                    j += 1

    extracted = dict(zip(modes, extracted))
    return extracted["arrival"], extracted["departure"]

I found another way, which is probably easier to understand. It is also ten times faster than the first one, so you may want to use it.

def partition(l, word): # Helper to split a list or tuple at a specific element
    i = l.index(word)
    return l[:i], l[i + 1:]

def extract_places(names, modes):
    keywords = set(modes).intersection(names)
    mapped = [(modes, names)]
    for word in keywords:
        new_mapped = []
        for mode,name in mapped:
            if word in mode:
                m1, m2 = partition(mode, word)
                n1, n2 = partition(name, word)
                if m1:
                    new_mapped.append((m1, n1))
                if m2:
                    new_mapped.append((m2, n2))
            else:
                new_mapped.append((mode,name))
        mapped = new_mapped
    mapped = {m[0]: n for m, n in mapped}
    return mapped['arrival'], mapped['departure']

Both versions are called in exactly the same way:

for example in ((['go', 'Buzenval', 'from', 'Chatelet'],
                 ['go', 'arrival', 'from', 'departure']
                 ),
                (['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval'],
                 ['How', 'go', 'arrival', 'from', 'departure']
                 ),
                (['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon'],
                 ['go', 'from', 'departure', 'to', 'arrival']
                 )):
    print(extract_places(*example))

Both print:

(['Buzenval'], ['Chatelet'])
(['street', 'Saint', 'Augustin'], ['Buzenval'])
(['12', 'street', 'Napoleon'], ['33', 'street', 'Republique'])
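The question observes that the parameters are exactly what differs between the two lists. As an alternative sketch of that idea, the standard-library difflib.SequenceMatcher can align the template with the sentence and report the differing spans. The name extract_by_diff is hypothetical, and the approach assumes that the keywords shared by both lists (such as 'go', 'from', 'to') do not also occur inside a place name.

```python
from difflib import SequenceMatcher

def extract_by_diff(names, modes, slots=("departure", "arrival")):
    """Map each placeholder in `modes` to the words it replaced in `names`.

    Hypothetical helper: relies on the shared keywords to align both lists.
    """
    matcher = SequenceMatcher(None, modes, names, autojunk=False)
    found = {}
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        # A 'replace' block marks a span where template and sentence disagree.
        # Only accept it when the template side is a single known placeholder.
        if op == "replace" and i2 - i1 == 1 and modes[i1] in slots:
            found[modes[i1]] = names[j1:j2]
    return found

print(extract_by_diff(
    ['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval'],
    ['How', 'go', 'arrival', 'from', 'departure'],
))
```

Unlike the functions above, this sketch returns a dictionary, so a missing placeholder shows up as a missing key rather than as a KeyError inside the function.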

Answer 2: (score: 1)

An example from the Python interpreter:

>>> import itertools
>>> key = None
>>> arr = ['go', 'from', '33', 'street', 'Republique', 'to', '12','street','Napoleon']
>>>
>>> for k, group in itertools.groupby(arr, lambda x: x in ['go', 'to','from']):
...     if k:
...         key = list(group)[-1]
...         continue
...     if key is not None:
...         if key == 'from':
...             tag = 'departure'
...         else:
...             tag = 'arrival'
...         print(tag, list(group))
...     key = None
...
departure ['33', 'street', 'Republique']
arrival ['12', 'street', 'Napoleon']
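The interpreter session above can be wrapped in a reusable function. The following is a minimal Python 3 sketch; the name tag_places and the KEYWORDS set are assumptions made for illustration, with the keywords taken from the session. As in the session, a group of words that follows 'from' is treated as the departure, and a group that follows any other keyword is treated as the arrival.

```python
from itertools import groupby

# Assumed keyword set, copied from the interpreter session above.
KEYWORDS = {"go", "to", "from"}

def tag_places(words):
    """Return a dict mapping 'departure' / 'arrival' to lists of words."""
    places = {}
    key = None
    for is_keyword, group in groupby(words, key=lambda w: w in KEYWORDS):
        group = list(group)
        if is_keyword:
            # Remember the last keyword of a run such as ['go', 'from'].
            key = group[-1]
        elif key is not None:
            tag = "departure" if key == "from" else "arrival"
            places[tag] = group
            key = None
        # Words that precede any keyword (e.g. 'How') are ignored.
    return places

print(tag_places(['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval']))
```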

Answer 3: (score: 1)

This should work for you:

l1 =  ['go', 'Buzenval', 'from', 'Chatelet']
l2 =  ['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval']
l3 =  ['go', 'from', '33', 'street', 'Republique', 'to', '12','street','Napoleon']

def get_locations (inlist):
    marker = 0
    end_dep = 0
    start_dep = 0

    for word in inlist:
        if word =="go":
            if inlist[marker+1] != "from":
                end_dep = marker +1
            else:
                start_dep = marker +2

        if word =="from" and start_dep == 0:
            start_dep = marker + 1

        if word == "to":
            end_dep = marker + 1
        marker +=1

    if end_dep > start_dep:
        start_loc = inlist[start_dep:end_dep-1]
        end_loc = inlist[end_dep:]

    else:
        start_loc = inlist [start_dep:]
        end_loc = inlist[end_dep: start_dep -1]

    return start_loc, end_loc

directions = get_locations (l3) #change to l1 / l2 to see other outputs

print( "departure = " + str(directions[0]))
print( "arrival = " + str(directions[1]))