I tried following this post, but it doesn't seem to work for me.
I tried this code:
for bresult in response.css(LIST_SELECTOR):
    NAME_SELECTOR = 'h2 a ::attr(href)'
    yield {
        'name': bresult.css(NAME_SELECTOR).extract_first(),
    }
    b_result_list.append(bresult.css(NAME_SELECTOR).extract_first())

# set b_result_list to SET to remove dups, then change back to LIST
set(b_result_list)
list(set(b_result_list))
for brl in b_result_list:
    print("brl: {}".format(brl))
This prints:
brl: https://facebook.site.com/users/login
brl: https://facebook.site.com/users
brl: https://facebook.site.com/users/login
When what I need is:
brl: https://facebook.site.com/users/login
brl: https://facebook.site.com/users
What am I doing wrong here?
Thanks!
Answer 0: (score: 7)
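You are discarding the result at the point where you need to save it. set() and list() each return a new object, so b_result_list is never actually changed and you end up iterating over the original list. Instead, save the result of the set operation: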
b_result_list = list(set(b_result_list))
(Note that a set does not preserve order.)
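To make the difference concrete, here is a minimal sketch. The list literal is only a stand-in for what the spider collected (an assumption; the values are taken from the output shown in the question).

```python
# Stand-in for the hrefs collected by the spider (assumed values from the question's output).
b_result_list = [
    "https://facebook.site.com/users/login",
    "https://facebook.site.com/users",
    "https://facebook.site.com/users/login",
]

# set() and list() build new objects; without an assignment the result is thrown away.
list(set(b_result_list))
length_without_assignment = len(b_result_list)

# Rebinding the name keeps the deduplicated list.
b_result_list = list(set(b_result_list))
length_with_assignment = len(b_result_list)

print(length_without_assignment, length_with_assignment)
```

The first length is still 3 because nothing was stored; after the assignment the list holds the two distinct URLs, in whatever order the set happens to produce.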
Answer 1: (score: 1)
If you want to keep both the order and the uniqueness, you can do this:
>>> li
['1', '1', '2', '2', '3', '3', '3', '3', '1', '1', '4', '5', '4', '6', '6']
>>> seen=set()
>>> [e for e in li if not (e in seen or seen.add(e))]
['1', '2', '3', '4', '5', '6']
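The comprehension works because set.add() returns None, so `seen.add(e)` is only evaluated (and is always falsy) when `e` has not been seen yet. If that reads as too clever, the same idea can be sketched as an explicit loop. The helper name `unique_in_order` is hypothetical and used here only for illustration.

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping each first occurrence in order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)       # remember the value so later copies are skipped
            result.append(item)  # keep only the first occurrence
    return result


li = ['1', '1', '2', '2', '3', '3', '3', '3', '1', '1', '4', '5', '4', '6', '6']
deduped = unique_in_order(li)
print(deduped)
```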
Alternatively, you can use the keys of an OrderedDict:
>>> from collections import OrderedDict
>>> list(OrderedDict([(k, None) for k in li]).keys())
['1', '2', '3', '4', '5', '6']
But a set on its own may substantially change the order of the original list:
>>> list(set(li))
['1', '3', '2', '5', '4', '6']
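On Python 3.7 and later, where a plain dict is guaranteed to keep insertion order, a shorter variant of the same idea is dict.fromkeys(). The sketch below applies it to the URLs from the question; the `urls` list is an assumed stand-in for the scraped values.

```python
# Assumed stand-in for the scraped hrefs (values from the question's output).
urls = [
    "https://facebook.site.com/users/login",
    "https://facebook.site.com/users",
    "https://facebook.site.com/users/login",
]

# dict keys are unique and (Python 3.7+) keep insertion order,
# so this drops duplicates while preserving first-seen order.
unique_urls = list(dict.fromkeys(urls))

for brl in unique_urls:
    print("brl: {}".format(brl))
```

This should print the two lines the question asks for, in the original order.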