
Question

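I have a nested list of strings: the corpus is made up of lists of varying lengths. I want to keep only the strings whose length is greater than 2.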

Following the similar question "how to remove an element from a nested list?", I tried every answer there that let me specify the condition length > 2.

Code

corpus = list(r_corpus('teeny.txt'))
print('initial corpus here ',corpus)

#Current attempt
[[ subelt for subelt in elt if len(subelt) >2 ] for elt in corpus] 

#previous attempt 1
##for thing in corpus:
##    [y for y in thing if len(y)>2]

#previous attempt 2
##for sentence in corpus:
##    sentence = [x for x in sentence if len(x) > 2 ]

print('\n\n corpus here without any string of length 2 or smaller',corpus)

This is the output of the current attempt; the two previous attempts give the same result.

initial corpus here

[['extracting', 'opinions'],
['soo', 'min', 'kim', 'and'],
['abstract'],
['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'], 
['this', 'section', 'reviews', 'previous', 'works', 'in'], 
['subjectivity', 'detection', 'is'], 
['work', 'is', 'similar', 'to', 'ours', 'but', 'different']]

corpus here without any string of length 2 or smaller

[['extracting', 'opinions'],
['soo', 'min', 'kim', 'and'], 
['abstract'], 
['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'], 
['this', 'section', 'reviews', 'previous', 'works', 'in'], 
['subjectivity', 'detection', 'is'], 
['work', 'is', 'similar', 'to', 'ours', 'but', 'different']]

What I need

The fastest way to get a second version of the corpus without any string of length 2 or smaller:

corpus without any string of length 2 or smaller

[['extracting', 'opinions'], 
['soo', 'min', 'kim', 'and'], 
['abstract'], 
['this', 'paper', 'presents', 'method', 'for', 'identifying'], 
['this', 'section', 'reviews', 'previous', 'works'],
['subjectivity', 'detection'],
['work', 'similar', 'ours', 'but', 'different']]

Thanks.

Answer 1

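@Vera, you can try the code below. It relies on concepts such as lambda functions, map() and filter(), which are closely related to list comprehensions.

Using list comprehensions, lambda functions, map(), filter(), reduce() and so on is a Pythonic way to solve the problem simply, efficiently and concisely.

You can look up List comprehension and map(), filter(), reduce(), lambda function etc. for examples and explanations of these concepts.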

import json

corpus = [['extracting', 'opinions'],
['soo', 'min', 'kim', 'and'],
['abstract'],
['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'], 
['this', 'section', 'reviews', 'previous', 'works', 'in'], 
['subjectivity', 'detection', 'is'], 
['work', 'is', 'similar', 'to', 'ours', 'but', 'different']]

new_corpus = list( map(lambda words: list(filter(lambda word: len(word)> 2, words)), corpus))

# Pretty printing list of lists of words of length > 2
print(json.dumps(new_corpus, indent=2))

"""
[
  [
    "extracting",
    "opinions"
  ],
  [
    "soo",
    "min",
    "kim",
    "and"
  ],
  [
    "abstract"
  ],
  [
    "this",
    "paper",
    "presents",
    "method",
    "for",
    "identifying"
  ],
  [
    "this",
    "section",
    "reviews",
    "previous",
    "works"
  ],
  [
    "subjectivity",
    "detection"
  ],
  [
    "work",
    "similar",
    "ours",
    "but",
    "different"
  ]
]
"""
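
For comparison, here is a minimal sketch of the same filter written as a nested list comprehension, the form used in the question. The attempts in the question most likely left corpus unchanged because the comprehension's result was never assigned to anything, and because attempt 2 only rebinds the loop variable instead of modifying the inner list. The names filtered, sentence and word are illustrative, and the sample data is a shortened copy of the question's corpus.

```python
# Shortened sample of the corpus from the question.
corpus = [
    ['extracting', 'opinions'],
    ['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'],
    ['work', 'is', 'similar', 'to', 'ours', 'but', 'different'],
]

# A comprehension builds a NEW nested list; it must be assigned,
# otherwise the result is simply discarded.
filtered = [[word for word in sentence if len(word) > 2] for sentence in corpus]

# In-place alternative: slice assignment replaces the contents of each
# inner list, so the original `corpus` object itself is modified.
for sentence in corpus:
    sentence[:] = [word for word in sentence if len(word) > 2]

print(filtered)
print(corpus == filtered)
```

Assigning the result is usually the simpler choice; the slice-assignment loop is only worth using when other code holds references to the same inner lists and should see the change.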

How to remove strings of a certain length from a nested list?
