Remove everything except specific string sequences

Date: 2017-06-06 18:01:22

Tags: python regex string list

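I have a list of strings. Each string contains a specific sequence of characters that I need (there are three or four exact sequences I'm looking for), and the rest is unpredictable data that has to be removed from the string. For example:

    sequences = ['sequenceA', 'sequenceB', 'sequenceC']
    bigList = ['Garbage sequenceB blahblah', 'sequenceA nonsense', 'silliness sequenceC', 'total nonsense']
    goalList = ['sequenceB', 'sequenceA', 'sequenceC', '']

I can use sub or .replace to remove specific characters, but this is the inverse: I need to remove everything except the specific strings. For list elements that contain none of the sequences, I still need to keep an (empty) element so that the list stays in order. I'm still new to regex. Is there a way to do this that I haven't found?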

3 Answers:

Answer 0: (score: 0)

Try this:

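   # The three exact sequences to keep
   sequenceA, sequenceB, sequenceC = 'sequenceA', 'sequenceB', 'sequenceC'

   # Start with an empty string per element so the list keeps its order
   goalList = ['' for _ in range(len(bigList))]
   for i, elem in enumerate(bigList):
       if sequenceA in elem:
           goalList[i] = sequenceA
       if sequenceB in elem:
           goalList[i] = sequenceB
       if sequenceC in elem:
           goalList[i] = sequenceC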

Of course, adapt it to what you have in your own data.

Answer 1: (score: 0)

You can do this with a simple double loop:

sequences = ['sequenceA', 'sequenceB', 'sequenceC']
bigList = ['Garbage sequenceB blahblah', 'sequenceA nonsense', 'silliness sequenceC', 'total nonsense']

goalList = []
for element in bigList:
    for seq in sequences:
        if seq in element:
            break
    goalList.append(seq if seq in element else "")

print(goalList)
# prints: ['sequenceB', 'sequenceA', 'sequenceC', '']
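
The double loop above relies on `seq` keeping its last value after the inner loop ends. As a minimal sketch of the same idea, the lookup can be pulled into a small helper built on `next()` with a default. The name `first_match` is made up for illustration and is not part of the original answer:

```python
def first_match(text, candidates):
    """Return the first candidate found in text, or '' if none is present."""
    # next() stops at the first hit; the second argument is the fallback value
    return next((c for c in candidates if c in text), "")


sequences = ['sequenceA', 'sequenceB', 'sequenceC']
bigList = ['Garbage sequenceB blahblah', 'sequenceA nonsense',
           'silliness sequenceC', 'total nonsense']

# One output element per input element, so the order is preserved
goalList = [first_match(element, sequences) for element in bigList]

print(goalList)
# prints: ['sequenceB', 'sequenceA', 'sequenceC', '']
```

Unlike the loop version, this also works when `sequences` is empty: every element simply becomes `''`.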

Answer 2: (score: 0)

If you like magic one-liners:

>>> [''.join([x if x in y else '' for x in sequences]) for y in bigList]

['sequenceB', 'sequenceA', 'sequenceC', '']
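
Since the question asks about regex specifically, here is a rough sketch of a regex-based variant. It assumes the sequences are literal strings (hence `re.escape`); the helper name `keep_match` is hypothetical. It builds one alternation pattern and keeps only the matched text:

```python
import re

sequences = ['sequenceA', 'sequenceB', 'sequenceC']
bigList = ['Garbage sequenceB blahblah', 'sequenceA nonsense',
           'silliness sequenceC', 'total nonsense']

# Build 'sequenceA|sequenceB|sequenceC'; re.escape guards against
# regex metacharacters if the real sequences contain any
pattern = re.compile("|".join(re.escape(s) for s in sequences))


def keep_match(text):
    """Return the leftmost matching sequence in text, or '' if none matches."""
    match = pattern.search(text)
    return match.group(0) if match else ""


goalList = [keep_match(s) for s in bigList]

print(goalList)
# prints: ['sequenceB', 'sequenceA', 'sequenceC', '']
```

The approaches differ when a string contains more than one sequence. For `'sequenceB sequenceA'`, this sketch returns the leftmost match in the text (`'sequenceB'`), the loop-based answers return the first hit in `sequences` order (`'sequenceA'`), and the one-liner joins every hit (`'sequenceAsequenceB'`). Which behavior is right depends on the data.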