I've run into a problem here.
Consider the following example:
for item in g_data:
    Header = item.find_all("div", {"class": "InnprodInfos"})
    print(Header[0].contents[0].text.strip())
Output:
DMZ 3rd Tunnel - Korean Demilitarized Zone Day Tour from Seoul
Panmunjeom Day Tour
Seoul City Half Day Private Tour
The Soul of Seoul - Small Group Tour
Seoul Helicopter Tour
Seoul City Full Day Tour
Seoul City Half Day Tour
The Street Museum in the Urban Core - Small Group Tour
Korean Folk Village Day Tour
DMZ 3rd Tunnel - Korean Demilitarized Zone Day Tour from Seoul
Panmunjeom Day Tour
Seoul City Half Day Private Tour
The Soul of Seoul - Small Group Tour
Seoul Helicopter Tour
Seoul City Full Day Tour
Seoul City Half Day Tour
The Street Museum in the Urban Core - Small Group Tour
Korean Folk Village Day Tour
As shown above, the output is printed twice, so the second, duplicated copy should be removed.
The desired result is:
DMZ 3rd Tunnel - Korean Demilitarized Zone Day Tour from Seoul
Panmunjeom Day Tour
Seoul City Half Day Private Tour
The Soul of Seoul - Small Group Tour
Seoul Helicopter Tour
Seoul City Full Day Tour
Seoul City Half Day Tour
The Street Museum in the Urban Core - Small Group Tour
Korean Folk Village Day Tour
Can anyone tell me how to remove the duplicates? Any feedback is appreciated.
Answer 0 (score: 0)
You can use a list, or a set if the order doesn't matter:
Using a list:
result = []
for item in g_data:
    header = item.find_all("div", {"class": "InnprodInfos"})
    title = header[0].contents[0].text.strip()
    # keep only the first occurrence; the list preserves the original order
    if title not in result:
        result.append(title)
print('\n'.join(result))
Using a set:
result = set()
for item in g_data:
    header = item.find_all("div", {"class": "InnprodInfos"})
    # a set ignores values it already contains, but does not keep order
    result.add(header[0].contents[0].text.strip())
print('\n'.join(result))
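If the order matters but the explicit `if ... not in result` check feels verbose, one possible alternative (not part of the original answer) is `dict.fromkeys`, which relies on dict keys being unique and keeping insertion order in Python 3.7+. The following is a minimal sketch; the `titles` list is hypothetical sample data standing in for the scraped text.

```python
# Hypothetical sample data standing in for the scraped tour titles.
titles = [
    "Panmunjeom Day Tour",
    "Seoul Helicopter Tour",
    "Panmunjeom Day Tour",
    "Seoul City Full Day Tour",
    "Seoul Helicopter Tour",
]

# dict keys are unique and (since Python 3.7) keep insertion order,
# so this drops later duplicates while keeping each first occurrence.
unique_titles = list(dict.fromkeys(titles))

print("\n".join(unique_titles))
```

Unlike the list version, membership is checked by hashing rather than by scanning the whole list, which may matter for long inputs.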
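Answer 1 (score: 0)
You should store the output in a set, so that you can check whether a value has already been "printed". Afterwards, print the elements of the set.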
g_data = ["foo", "bar", "foo"]
g_unique = set()
for item in g_data:
    g_unique.add(item)  # a set ignores the value if it is already present
for item in g_unique:
    print(item)  # prints foo and bar once each; set order is not guaranteed
Answer 2 (score: 0)
You can use a set to keep track of the items you have already printed. This preserves the original order.
already_printed = set()
for item in g_data:
    header = item.find_all("div", {"class": "InnprodInfos"})
    title = header[0].contents[0].text.strip()
    # print each title only the first time it is seen
    if title not in already_printed:
        print(title)
        already_printed.add(title)
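The same "seen set" idea can be wrapped in a small generator so that it is reusable for any iterable. This is only a sketch: `unique_in_order` and `sample_titles` are hypothetical names introduced here for illustration, and plain strings are used in place of the scraped elements.

```python
def unique_in_order(items):
    """Yield each item the first time it is seen, skipping later repeats."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


# Hypothetical sample data standing in for the scraped tour titles.
sample_titles = [
    "Seoul City Half Day Tour",
    "Korean Folk Village Day Tour",
    "Seoul City Half Day Tour",
]

for title in unique_in_order(sample_titles):
    print(title)
```

Because it yields items lazily, the generator can be fed the stripped titles directly from the scraping loop without building an intermediate list first.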
Answer 3 (score: 0)
There is a simple way using a list comprehension :)
s = set()
# add each line of the text to the set; duplicates are dropped, order is not kept
[s.add(text) for text in Header[0].contents[0].text.strip().split('\n')]
print('\n'.join([text for text in s]))