I need to speed up counting the distinct elements in this code, and I'm not sure how to do it faster.
import csv

def process_columns(columns):
    with open(columns, 'r') as src:
        data = csv.reader(src, delimiter='\t', skipinitialspace=False)
        category = []
        group = columns.split("/")
        group = group[-1].split(".")
        if group[0] in ["data_1", "data_2"]:
            for row in data:
                if row[0] not in category:
                    category.append(row[0])
            message = "\t%d distinct elements from %ss" % (len(category), group[0])
            print(message)
Answer 0 (score: 1)
The standard way to count the distinct elements of a Python list (as long as the elements are hashable) is:
array = [1,1,2,3,3,4,5,6,6]
n_elts = len(set(array))
print(n_elts)
Output:
6
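To make the difference concrete, here is a minimal sketch comparing the list-based check from the question with the set-based count. The function names count_distinct_list and count_distinct_set are made up for illustration. Both return the same number; the difference is that `v not in seen` on a list scans the whole list each time, so the loop is quadratic in the worst case, while set membership is constant time on average.

```python
def count_distinct_list(values):
    """Approach from the question: every membership test scans the list."""
    seen = []
    for v in values:
        if v not in seen:  # O(len(seen)) per check
            seen.append(v)
    return len(seen)


def count_distinct_set(values):
    """Set-based approach: duplicates are discarded on insertion."""
    return len(set(values))


array = [1, 1, 2, 3, 3, 4, 5, 6, 6]
print(count_distinct_list(array), count_distinct_set(array))
```

On a handful of values you will not notice the difference, but it grows quickly as the number of distinct values increases.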
Answer 1 (score: 1)
If you don't know much about your data in advance, you can use collections.defaultdict
to quickly maintain a set of unique terms per group.
import csv
from collections import defaultdict

def process_columns(columns):
    categories = defaultdict(set)  # missing keys start out as an empty set
    with open(columns, 'r') as src:
        data = csv.reader(src, delimiter='\t', skipinitialspace=False)
        group = columns.split("/")[-1].split('.')
        for row in data:
            # add() stores the whole string; update() would add its characters
            categories[group[0]].add(row[0])
    for k in categories:
        message = "\t%d distinct elements from %ss" % (len(categories[k]), k)
        print(message)
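Here is a rough, self-contained sketch of the same defaultdict(set) idea applied to several groups at once. It uses in-memory io.StringIO streams instead of real files so it can be run as-is; the helper name count_distinct_by_group and the sample rows are invented for the example.

```python
import csv
import io
from collections import defaultdict


def count_distinct_by_group(sources):
    """sources maps a group name to an open tab-separated text stream."""
    categories = defaultdict(set)
    for group, stream in sources.items():
        for row in csv.reader(stream, delimiter="\t"):
            if row:  # csv.reader yields [] for blank lines
                categories[group].add(row[0])
    return {group: len(values) for group, values in categories.items()}


# Made-up sample data standing in for data_1 and data_2 files.
sources = {
    "data_1": io.StringIO("a\t1\nb\t2\na\t3\n"),
    "data_2": io.StringIO("x\t1\nx\t2\n"),
}
counts = count_distinct_by_group(sources)
for group, n in counts.items():
    print("\t%d distinct elements from %ss" % (n, group))
```

One thing to watch: set.add(row[0]) stores the whole string as one element, whereas set.update(row[0]) iterates over the string and adds each character separately, which would give the wrong count here.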
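Answer 2 (score: 0)
Initialise category as a set, remove the if check that guards adding data to category, and replace the append with category.add; a set ignores duplicates on its own.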
category = set()  # note: {} would create an empty dict, not a set
group = columns.split("/")
group = group[-1].split(".")
if group[0] in ["data_1", "data_2"]:
    for row in data:
        category.add(row[0])
Hope this is clear.
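For completeness, a sketch of what the whole function might look like with a set, written for Python 3. A few things here are my own assumptions and not from the question: it uses os.path.basename instead of splitting on "/", it skips blank rows, it returns the count in addition to printing it, and the temporary data_1.tsv file exists only so the example can run.

```python
import csv
import os
import tempfile


def process_columns(columns):
    # Group name is the file name up to the first dot, e.g. "data_1".
    group = os.path.basename(columns).split(".")[0]
    if group not in ("data_1", "data_2"):
        return None
    with open(columns, "r", newline="") as src:
        data = csv.reader(src, delimiter="\t")
        # Set comprehension: duplicates in the first column collapse automatically.
        category = {row[0] for row in data if row}
    print("\t%d distinct elements from %ss" % (len(category), group))
    return len(category)


# Throwaway sample file, purely for demonstration.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data_1.tsv")
    with open(path, "w", newline="") as f:
        f.write("a\t1\nb\t2\na\t3\n")
    n_distinct = process_columns(path)
```

Checking the group name before opening the file also avoids reading files you are going to ignore anyway.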