Question

I'm scraping data from a web page, and when I load the data into a list of lists, it looks like this:

[['text', 'text', '', '', 'text', 'text']]

I'm trying to remove the empty strings from all the lists, but nothing I've tried so far works.

results = []
for list in scrape_list:
    for item in scrape_list:
        if item != '':
            results.append(item)



OUTPUT: [['text', 'text', '', '', 'text', 'text']]



scrape_list1 = list(filter(None, scrape_list))

OUTPUT: [['text', 'text', '', '', 'text', 'text']]

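I'm wondering whether these entries are not actually empty strings and hold some value. If anyone else has run into this, please let me know what's going on, because I can't figure it out.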
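One way to test that hypothesis is to print the type and repr() of what each loop level actually yields. This is a minimal sketch that assumes a hard-coded scrape_list mirroring the data shown above, since the real scraped data isn't available:

```python
# Hypothetical stand-in for the scraped data shown above.
scrape_list = [['text', 'text', '', '', 'text', 'text']]

# The outer loop yields the inner lists, not the strings.
for row in scrape_list:
    print(type(row).__name__, repr(row))

# The inner loop yields the strings; repr() makes empty strings visible,
# and len() confirms they hold no characters.
for row in scrape_list:
    for item in row:
        print(type(item).__name__, repr(item), len(item))
```

If the inner loop prints lines like "str '' 0", those entries really are empty strings, and the problem lies in which list is being iterated rather than in the data itself.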

Answer 1

Just a typo, I guess (as @chunjef mentioned in a comment):

results = []
for lst in scrape_list:
    for item in lst:  # do NOT iterate through scrape_list here!!
        if item != '':
            results.append(item)

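Each item in scrape_list is a list, which is always != '', so that inner list gets appended to results as a whole, which explains your output. The nested structure of scrape_list is also why your filter statement fails: filter(None, ...) only inspects the outer items, and a non-empty inner list is truthy. You can use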

scrape_list1 = [s for l in scrape_list for s in filter(None, l)]

to get a flat list of strings.
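A short sketch of why both original attempts leave the data unchanged, using the same sample data. The last step is a variant not taken from this answer: it applies filter(None, ...) to each row, so the empty strings go away but the list-of-lists shape is kept.

```python
scrape_list = [['text', 'text', '', '', 'text', 'text']]

# A list never compares equal to a string, so the buggy loop's test is always True.
print(['text', ''] != '')  # True

# filter(None, ...) on the OUTER list only drops falsy rows (such as []);
# a non-empty inner list is truthy, so it survives unchanged.
outer_filtered = list(filter(None, scrape_list))
print(outer_filtered)  # [['text', 'text', '', '', 'text', 'text']]

# Filtering each row instead removes the '' entries while keeping the nesting.
row_filtered = [list(filter(None, row)) for row in scrape_list]
print(row_filtered)  # [['text', 'text', 'text', 'text']]
```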

Answer 2

If you want a purely Pythonic way, you can use a nested list comprehension:

[[y for y in x if y] for x in a]

On my machine, the console session looks like this:

>>> a
[['text', 'text', '', '', 'text', 'text']]
>>> [[y for y in x if y] for x in a]
[['text', 'text', 'text', 'text']]
>>>
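The if y condition relies on truthiness, so it removes every falsy value, not only ''. That is harmless for all-string data like the example, but it can matter if scraped rows ever contain values such as None or 0. A minimal sketch, assuming a hypothetical second row that contains None and 0 to make the difference visible:

```python
# Hypothetical data: the second row is invented to show the difference.
a = [['text', 'text', '', '', 'text', 'text'], ['', 'more', None, 0]]

# Truthiness test: drops '', None, 0, and any other falsy value.
by_truthiness = [[y for y in x if y] for x in a]

# Explicit comparison: drops only empty strings.
by_equality = [[y for y in x if y != ''] for x in a]

print(by_truthiness)  # [['text', 'text', 'text', 'text'], ['more']]
print(by_equality)    # [['text', 'text', 'text', 'text'], ['more', None, 0]]
```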

Answer 3

As @chunjef mentioned in the comments, you are iterating over scrape_list twice. By the way, a more compact way to do this is:

>>> ll = [['text', 'text', '', '', 'text', 'text']]
>>> results = [item for l in ll for item in l if item!='']
>>> results
['text', 'text', 'text', 'text']

[item for l in ll for item in l if item!=''] iterates over each list l in ll and unpacks its items, keeping an item only if it differs from the empty string.
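The for clauses of such a comprehension read left to right, exactly like nested loops. The sketch below writes them out as ordinary loops and, as an alternative not used in this answer, performs the same one-level flattening with itertools.chain.from_iterable from the standard library:

```python
from itertools import chain

ll = [['text', 'text', '', '', 'text', 'text']]

# The comprehension's clauses, written out as ordinary nested loops.
looped = []
for l in ll:              # first "for": each inner list
    for item in l:        # second "for": each string in that list
        if item != '':    # the filter condition
            looped.append(item)

# The same flattening with chain.from_iterable (one level of nesting only).
chained = [item for item in chain.from_iterable(ll) if item != '']

print(looped)   # ['text', 'text', 'text', 'text']
print(chained)  # ['text', 'text', 'text', 'text']
```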

Scraping returns empty list entries that aren't empty
