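I'm new to Python and I find set() a bit confusing. Can someone help me find and create a new list of unique numbers (in other words, eliminate the duplicate numbers)?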
import string
import re

def go():
    # Output file; opened for writing but not used in this snippet
    file = open("C:/Cryptography/Pollard/Pollard/newfile.txt","w")
    filename = "C:/Cryptography/Pollard/Pollard/primeFactors.txt"
    with open(filename, 'r') as f:
        lines = f.read()
        found = re.findall(r'[\d]+[^\d.\d+()+\s]+[^\s]+[\d+\w+\d]+[\d+\^+\d]+[\d+\w+\d]+', lines)
    a = found
    for i in range(5):
        a[i] = str(found[i])
        print(a[i].split('x'))
Right now, print(a[i].split('x')) gives the following output:
['2', '3', '1451', '40591', '258983', '11409589', '8337580729',
'1932261797039146667']
['2897', '514081', '585530047', '108785617538783538760452408483163']
['2', '3', '5', '19', '28087', '4947999059',
'2182718359336613102811898933144207']
['3', '5', '53', '293', '31159', '201911', '7511070764480753',
'22798192180727861167']
['2', '164493637239099960712719840940483950285726027116731']
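How can I output a list of only the non-repeated numbers? I read on a forum that "set()" can do this, but I tried it and it didn't work. Any help is much appreciated!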
Answer 0 (score: 4)
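A set is a collection (like a list or tuple), but it does not allow duplicates and it has very fast membership testing. You can use a list comprehension to filter out the values in one list that have already appeared in a previous list: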
data = [['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667'],
['2897', '514081', '585530047', '108785617538783538760452408483163'],
['2', '3', '5', '19', '28087', '4947999059', '2182718359336613102811898933144207'],
['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167'],
['2', '164493637239099960712719840940483950285726027116731']]
seen = set() # set of seen values, which starts out empty
for lst in data:
    deduped = [x for x in lst if x not in seen] # filter out previously seen values
    seen.update(deduped)                        # add the new values to the set
    print(deduped)                              # do whatever with deduped list
Output:
['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667']
['2897', '514081', '585530047', '108785617538783538760452408483163']
['5', '19', '28087', '4947999059', '2182718359336613102811898933144207']
['53', '293', '31159', '201911', '7511070764480753', '22798192180727861167']
['164493637239099960712719840940483950285726027116731']
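Note that this version does not filter out values that are duplicated within a single list (unless they also duplicate a value from an earlier list). You can work around that by replacing the list comprehension with an explicit loop that checks each individual value against the seen set (and adds it, if it is new) before appending it to the output list. Or, if the order of the items in your sublists is not important, you can turn them into sets of their own: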
seen = set()
for lst in data:
    lst_as_set = set(lst)            # this step eliminates internal duplicates
    deduped_set = lst_as_set - seen  # set subtraction!
    seen.update(deduped_set)
    # now do stuff with deduped_set, which is iterable, but in an arbitrary order
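The explicit-loop variant described above, which keeps the original order and also drops duplicates inside a single sublist, could be sketched roughly as follows. The sample data here is hypothetical and shortened so the internal duplicate is easy to spot; the names result and deduped are illustrative only.

```python
# Hypothetical sample data: each sublist contains an internal duplicate.
data = [['2', '3', '2'], ['3', '5', '5', '7']]

seen = set()   # every value emitted so far, across all sublists
result = []    # one deduplicated list per input sublist
for lst in data:
    deduped = []
    for x in lst:
        if x not in seen:      # new value: remember it and keep it
            seen.add(x)
            deduped.append(x)
    result.append(deduped)

print(result)  # [['2', '3'], ['5', '7']]
```

Because each value is added to seen as soon as it is kept, a second occurrence in the same sublist is filtered just like one from an earlier sublist.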
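Finally, if the inner sublists are a complete red herring and you simply want to filter a flattened list down to its unique values, that sounds like a job for the unique_everseen recipe from the itertools documentation: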
try:
    from itertools import filterfalse                  # Python 3
except ImportError:
    from itertools import ifilterfalse as filterfalse  # Python 2

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
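For the flat, order-preserving case, a shorter alternative can be sketched with dict.fromkeys. This assumes Python 3.7 or later, where dict is guaranteed to keep insertion order; it is not the itertools recipe itself, just a different way to get a similar result. The data below is a hypothetical, shortened version of the factor lists.

```python
from itertools import chain

# Hypothetical sample data, shortened from the question's factor lists.
data = [['2', '3', '1451'], ['2897'], ['2', '3', '5', '19']]

flat = chain.from_iterable(data)     # lazily flatten the list of lists
unique = list(dict.fromkeys(flat))   # dict keys are unique and keep first-seen order

print(unique)  # ['2', '3', '1451', '2897', '5', '19']
```

Unlike the generator above, this builds the whole result at once, so it fits best when the input comfortably fits in memory.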
Answer 1 (score: 2)
set should work in this case.
You can try the following:
# Concat all your lists into a single list
>>> a = ['2', '3', '1451', '40591', '258983', '11409589', '8337580729','1932261797039146667'] +['2897', '514081', '585530047', '108785617538783538760452408483163'] +['2', '3', '5', '19', '28087', '4947999059','2182718359336613102811898933144207'] + ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167']+ ['2', '164493637239099960712719840940483950285726027116731']
>>> len(a)
29
>>> set(a)
set(['514081', '258983', '40591', '201911', '11409589', '585530047', '3', '2', '5', '108785617538783538760452408483163', '22798192180727861167', '164493637239099960712719840940483950285726027116731', '8337580729', '4947999059', '19', '2897', '7511070764480753', '53', '28087', '2182718359336613102811898933144207', '1451', '31159', '1932261797039146667', '293'])
>>> len(set(a))
24
>>>
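As the session above shows, a set comes back in arbitrary order. If a readable ordering is wanted, one option is to sort the unique values numerically. The sketch below uses a hypothetical short list and the illustrative name unique_sorted; key=int matters because the values are strings, and a plain string sort would put '1451' before '19' and '2'.

```python
# Hypothetical short list of factor strings containing repeats.
a = ['2', '3', '1451', '2897', '2', '3', '5', '19', '3', '5', '53']

# set() removes the duplicates; sorted(..., key=int) orders by numeric value
# while leaving the items themselves as strings.
unique_sorted = sorted(set(a), key=int)

print(unique_sorted)  # ['2', '3', '5', '19', '53', '1451', '2897']
```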
Answer 2 (score: 0)
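If you want the unique values from the flattened list, you can use reduce() to flatten the list, then pass the result to the frozenset() constructor and convert it back to a list: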
>>> data = [
['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667'],
['2897', '514081', '585530047', '108785617538783538760452408483163'],
['2', '3', '5', '19', '28087', '4947999059', '2182718359336613102811898933144207'],
['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167'],
['2', '164493637239099960712719840940483950285726027116731']]
>>> print list(frozenset(reduce((lambda a, b: a+b), data)))
['514081', '258983', '40591', '201911', '11409589', '585530047', '3',
'2', '5', '108785617538783538760452408483163', '22798192180727861167',
'164493637239099960712719840940483950285726027116731', '8337580729',
'4947999059', '19', '2897', '7511070764480753', '53', '28087',
'2182718359336613102811898933144207', '1451', '31159',
'1932261797039146667', '293']
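The session above uses Python 2 syntax (print as a statement, reduce as a builtin). A rough Python 3 equivalent might look like the sketch below, using hypothetical shortened data. It also shows itertools.chain.from_iterable as an alternative way to flatten, which avoids building a new intermediate list on every a + b step.

```python
from functools import reduce   # reduce moved to functools in Python 3
from itertools import chain

# Hypothetical sample data, shortened from the question's factor lists.
data = [['2', '3'], ['2897'], ['2', '3', '5']]

flat_reduce = reduce(lambda a, b: a + b, data)   # repeated list concatenation
flat_chain = list(chain.from_iterable(data))     # single pass over the sublists

unique = frozenset(flat_chain)                   # duplicates removed, order arbitrary
print(sorted(unique, key=int))                   # ['2', '3', '5', '2897']
```

Both flattening approaches give the same list here; the difference is only in how much intermediate copying they do.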