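I have a list in which each item has 2 elements: a company ID and a group number. I want to split these companies into separate lists by group number so that I can run a regression analysis on each group individually. My list: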
59872004 0
74202004 0
1491772004 1
1476392004 1
309452004 1
1171452004 1
150842004 2
143592004 2
76202004 2
119232004 2
80492004 2
291732004 2
My current code is as follows:
import csv

list_of_variables = []
with open(str(csv_path) + "2004-297-100.csv", 'r') as csvFile:
    reader = csv.reader(csvFile)
    for row in reader:
        list_of_variables.append(row)
# drop the header row
del list_of_variables[0]
list_of_lists = []
counter = 0
counter_list = 0
one_cluster = []
variable = []
for line in list_of_variables:
    print('counter: ', counter_list)
    # for testing purposes
    if counter_list == 20:
        break
    # print("cluster: ", cluster)
    # append the first line from the list to the intermediary list
    if counter_list == 0:
        one_cluster.append(line)
    if counter_list >= 1:
        if line[1] == variable[1]:
            one_cluster.append(line)
    print("one cluster : ", one_cluster)
    variable = one_cluster[counter-1]
    # print('line : ', line[1])
    # print('variable : ', variable[1])
    counter += 1
    # if the group number changed, put the list into the final list,
    # clear the intermediary list and append the current element,
    # which was not part of the previous group
    if line[1] != variable[1]:
        list_of_lists.append(one_cluster.copy())
        # print("here", list_of_lists)
        one_cluster.clear()
        one_cluster.append(line)
        counter = 0
    # print('variable', variable)
    # print('one_cluster ', one_cluster)
    counter_list += 1
print(list_of_lists)
The output of this code is:
[[['59872004', '0'], ['74202004', '0']], [['1491772004', '1'], ['309452004', '1'], ['1171452004', '1']], [['150842004', '2'], ['76202004', '2'], ['119232004', '2'], ['80492004', '2'], ['291732004', '2']]]
The expected output is:
[[['59872004', '0'], ['74202004', '0']], [['1491772004', '1'], ['1476392004', '1'], ['309452004', '1'], ['1171452004', '1']], [['150842004', '2'], ['143592004', '2'], ['76202004', '2'], ['119232004', '2'], ['80492004', '2'], ['291732004', '2']]]
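If you look closely, group 0 comes out correctly, but every other group is missing a company. For example, group 1 should have 4 elements, but my code outputs only 3, and group 2 should have 6 but gets only 5. I have looked around but have not found anything that makes this easier. If you know how to fix this or can point me in the right direction, I would be very grateful.
Thank you for your time and patience!
Update: I replaced the picture of the list with text that can be copied, and added the expected output.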
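Answer 0 (score: 1)
You are overcomplicating your code. If your goal is to group all of these companies by the second column of the csv file, just add the following code after reading the file: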
from collections import defaultdict
grouping = defaultdict(list)
for line in list_of_variables:
    grouping[line[1]].append(line[0])
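Now, if you want to work with the elements of one group, say group 1, just iterate over it. Note that csv.reader returns every field as a string, so the key is '1', not 1:
for company in grouping['1']:
    print(company)  # placeholder: do whatever you need with each company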
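For anyone who wants to try this without the csv file, here is a minimal self-contained sketch of the same idea. The `rows` list is copied from the first lines of the question's sample and stands in for `list_of_variables`; it assumes every field is a string, as `csv.reader` would produce. The names `rows` and `group_one` are only for illustration.

```python
from collections import defaultdict

# Stand-in for list_of_variables: each row is [company_id, group], all strings.
rows = [
    ["59872004", "0"], ["74202004", "0"],
    ["1491772004", "1"], ["1476392004", "1"],
    ["309452004", "1"], ["1171452004", "1"],
    ["150842004", "2"], ["143592004", "2"],
]

# Map each group number to the list of company IDs in that group.
grouping = defaultdict(list)
for company_id, group in rows:
    grouping[group].append(company_id)

# Keys are strings here, so look up "1" rather than the integer 1.
group_one = grouping["1"]
print(group_one)
```

A dictionary does not care whether the rows are sorted by group, so this also works if the groups are interleaved in the file.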
Answer 1 (score: 0)
I found a solution to my problem. If I cut the line
variable = one_cluster[counter-1]
and place it before
if counter_list >= 1:
    if line[1] == variable[1]:
        one_cluster.append(line)
inside the for loop, I get the following code:
for line in list_of_variables:
    print('counter: ', counter_list)
    if counter_list == 50:
        break
    # print("cluster: ", cluster)
    if counter_list == 0:
        one_cluster.append(line)
    variable = one_cluster[counter - 1]
    if counter_list >= 1:
        if line[1] == variable[1]:
            one_cluster.append(line)
    print("one cluster : ", one_cluster)
    # print('line : ', line[1])
    # print('variable : ', variable[1])
    counter += 1
    if line[1] != variable[1]:
        list_of_lists.append(one_cluster.copy())
        # print("here", list_of_lists)
        one_cluster.clear()
        one_cluster.append(line)
        counter = 0
    # print('variable', variable)
    # print('one_cluster ', one_cluster)
    counter_list += 1
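Then everything works. I struggled with this for a long time before the idea finally came to me... Still, if anyone has a simpler way of doing this, I welcome suggestions.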
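One possibly simpler route to the same list-of-lists shape is `itertools.groupby` from the standard library. This is only a sketch, not the code from the question: `rows` is the sample data copied from the question and stands in for `list_of_variables`, and it assumes the rows are already ordered by group number, as they are in the sample.

```python
from itertools import groupby
from operator import itemgetter

# Stand-in for list_of_variables: [company_id, group] rows, all strings.
rows = [
    ["59872004", "0"], ["74202004", "0"],
    ["1491772004", "1"], ["1476392004", "1"],
    ["309452004", "1"], ["1171452004", "1"],
    ["150842004", "2"], ["143592004", "2"], ["76202004", "2"],
    ["119232004", "2"], ["80492004", "2"], ["291732004", "2"],
]

# groupby only merges *adjacent* rows with equal keys, so sort by
# itemgetter(1) first if the file is not already ordered by group.
list_of_lists = [list(members) for _, members in groupby(rows, key=itemgetter(1))]

print(list_of_lists)
```

Unlike the manual loop, which appends a cluster only when a row with a different group number follows it, this also collects the final group without any extra handling.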