How do I remove strings of a certain length from a nested list?

Asked: 2018-05-26 18:54:03

Tags: string python-3.x nested nested-lists

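I have a nested list of strings; the corpus is made up of lists of different lengths. I want to keep only the strings whose length is greater than 2.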

Following a similar question, "how to remove an element from a nested list?", I tried every answer that let me specify the condition length > 2.

Code

corpus = list(r_corpus('teeny.txt'))
print('initial corpus here ',corpus)

#Current attempt
[[ subelt for subelt in elt if len(subelt) >2 ] for elt in corpus] 

#previous attempt 1
##for thing in corpus:
##    [y for y in thing if len(y)>2]

#previous attempt 2
##for sentence in corpus:
##    sentence = [x for x in sentence if len(x) > 2 ]

print('\n\n corpus here without any string of length 2 or smaller',corpus)

This is the output of the current attempt; the two previous attempts produced the same result.

initial corpus here

[['extracting', 'opinions'],
['soo', 'min', 'kim', 'and'],
['abstract'],
['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'], 
['this', 'section', 'reviews', 'previous', 'works', 'in'], 
['subjectivity', 'detection', 'is'], 
['work', 'is', 'similar', 'to', 'ours', 'but', 'different']]

corpus here without any string of length 2 or smaller

[['extracting', 'opinions'],
['soo', 'min', 'kim', 'and'], 
['abstract'], 
['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'], 
['this', 'section', 'reviews', 'previous', 'works', 'in'], 
['subjectivity', 'detection', 'is'], 
['work', 'is', 'similar', 'to', 'ours', 'but', 'different']]

What I need

The fastest way to get a second version of the corpus without any string of length 2 or smaller:

corpus without any string of length 2 or smaller

[['extracting', 'opinions'], 
['soo', 'min', 'kim', 'and'], 
['abstract'], 
['this', 'paper', 'presents', 'method', 'for', 'identifying'], 
['this', 'section', 'reviews', 'previous', 'works'],
['subjectivity', 'detection'],
['work','similar','ours', 'but', 'different']]

Thanks.

1 Answer:

Answer 0: (score: 0)

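@Vera, you can try the code below. It uses lambda functions together with map() and filter().

List comprehensions, lambda functions, map(), filter(), reduce(), and similar tools are a Pythonic way to solve this kind of problem simply, efficiently, and concisely.

You can look up list comprehensions, map(), filter(), reduce(), and lambda functions for examples and explanations of these concepts.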

import json

corpus = [['extracting', 'opinions'],
['soo', 'min', 'kim', 'and'],
['abstract'],
['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'], 
['this', 'section', 'reviews', 'previous', 'works', 'in'], 
['subjectivity', 'detection', 'is'], 
['work', 'is', 'similar', 'to', 'ours', 'but', 'different']]

new_corpus = list( map(lambda words: list(filter(lambda word: len(word)> 2, words)), corpus))

# Pretty printing list of lists of words of length > 2
print(json.dumps(new_corpus, indent=2))

"""
[
  [
    "extracting",
    "opinions"
  ],
  [
    "soo",
    "min",
    "kim",
    "and"
  ],
  [
    "abstract"
  ],
  [
    "this",
    "paper",
    "presents",
    "method",
    "for",
    "identifying"
  ],
  [
    "this",
    "section",
    "reviews",
    "previous",
    "works"
  ],
  [
    "subjectivity",
    "detection"
  ],
  [
    "work",
    "similar",
    "ours",
    "but",
    "different"
  ]
]
"""
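
For comparison, here is a minimal sketch of the same filter written as a nested list comprehension. The attempts in the question build the filtered list correctly but never store it, which is probably why the printed corpus does not change; assigning the result (or writing it back with slice assignment) should be enough. The names drop_short_words, min_len, filtered, and in_place are hypothetical and used only for illustration.

```python
# Sample data copied from the question.
corpus = [['extracting', 'opinions'],
          ['soo', 'min', 'kim', 'and'],
          ['abstract'],
          ['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'],
          ['this', 'section', 'reviews', 'previous', 'works', 'in'],
          ['subjectivity', 'detection', 'is'],
          ['work', 'is', 'similar', 'to', 'ours', 'but', 'different']]


def drop_short_words(sentences, min_len=3):
    """Return a new nested list keeping only words of at least min_len characters."""
    return [[word for word in sentence if len(word) >= min_len]
            for sentence in sentences]


# Variant 1: build a new nested list and keep the result in a variable.
# The original corpus is left unchanged.
filtered = drop_short_words(corpus)

# Variant 2: filter every inner list in place with slice assignment.
# A copy is used here only so that both variants can be compared.
in_place = [list(sentence) for sentence in corpus]
for sentence in in_place:
    sentence[:] = [word for word in sentence if len(word) > 2]

print(filtered)
```

Note that `sentence = [...]` inside a for loop only rebinds the loop variable, whereas `sentence[:] = [...]` replaces the contents of the inner list that the corpus actually holds.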