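I have a sheet of numbers, separated by spaces into columns. Each column represents a different category, and within each column each number represents a different value. For example, column four represents age, and within that column the number 5 means an age of 44-55. Each row is the record of a different person. I would like to use a Python script to search through the sheet and find all rows in which the sixth column is "1". After that, I want to know how many times each number in column one appears among the rows where the sixth column equals "1". The script should output to the user: "While column six equals '1', the value '1' appears 12 times in column one. The value '2' appears 18 times...", and so on. I hope I'm being clear here. Basically, I just want it to tally up the numbers. I'm new to Python. I've attached my code below. I think I should be using dictionaries, but I'm not entirely sure how. So far I haven't come close to figuring this out. I would really appreciate it if somebody could walk me through the logic behind such code. Thank you so much!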
ldata = open("list.data", "r")
income_dist = {}
for line in ldata:
    linelist = line.strip().split(" ")
key_income_dist = linelist[6]
if key_income_dist in income_dist:
    income_dist[key_income_dist] = 1 + income_dist[key_income_dist]
else:
    income_dist[key_income_dist] = 1
ldata.close()
print value_no_occupations
Answer 0 (score: 3)
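First, indentation matters a great deal in Python, and the indentation above is wrong: the five lines after linelist = line.strip().split(" ")
need to be indented to the same level as that line, so that they run once per line of the file.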
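Next, those five lines should be indented one level further, with this line added before them:
    if len(linelist) > 5 and linelist[5] == "1":
This line skips short lines (there are some) and tests for what you said you want: "where the sixth column equals '1'". The sixth column is index [5] because the first number on a line is referenced as [0] (these are offsets, not counting numbers).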
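You will probably want to change key_income_dist = linelist[6]
to key_income_dist = linelist[0]
or [1],
depending on which column you want to count ([0] is the first column). Experiment with it if necessary.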
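Finally, you should say print(income_dist) at the end, in place of print value_no_occupations (a name that is never defined),
to see the results. If you want fancier output, look into string formatting.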
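Putting those changes together, the corrected loop might look like the sketch below. It assumes the sixth column is index 5 and the first column is index 0. The function name count_first_column and the inline sample lines are made up for illustration; with a real file you would pass the open file object instead.

```python
def count_first_column(lines):
    """Count first-column values on lines whose sixth column is "1"."""
    income_dist = {}
    for line in lines:
        linelist = line.strip().split(" ")
        # skip short lines, then keep only rows whose sixth column (index 5) is "1"
        if len(linelist) > 5 and linelist[5] == "1":
            key_income_dist = linelist[0]  # first column
            if key_income_dist in income_dist:
                income_dist[key_income_dist] = 1 + income_dist[key_income_dist]
            else:
                income_dist[key_income_dist] = 1
    return income_dist


# hypothetical sample lines; with a real file, pass open("list.data") instead
sample = [
    "9 2 1 3 5 1 5 2 3 1 2 3 7 1",
    "6 1 3 3 4 1 5 1 1 0 2 3 7 1",
    "1 2 5 1 2 6 5 1 4 2 3 1 7 1",
    "",
    "6 1 3 3 4 1 5 1 1 0 2 3 7 1",
]
result = count_first_column(sample)
print(result)
```

Everything that should happen once per line sits inside the for loop, and everything that should happen only for matching rows sits inside the if.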
Answer 1 (score: 2)
This is actually easier than it looks! The key is collections.Counter
from collections import Counter
with open("list.data") as ldata:
    # split() yields strings, so compare against '1', not the integer 1
    rows = [tuple(row.split()) for row in ldata if row.split()[5] == '1']
# warning: this will break if some rows are shorter than 6 columns
first_col = Counter(item[0] for item in rows)
If you want the distribution of every column (not just the first), do this:
distribution = {column: Counter(item[column] for item in rows) for column in range(len(rows[0]))}
# warning this will break if all rows are not the same size!
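Both warnings above can be sidestepped by checking the row length before indexing. The following is only a sketch of that idea: the helper name first_column_counts and the sample lines are illustrative assumptions, not part of the original answer.

```python
from collections import Counter


def first_column_counts(lines):
    """Return a Counter of first-column values where the sixth column is '1'."""
    rows = (line.split() for line in lines)
    # len(row) > 5 guards against blank or short rows before row[5] is read
    return Counter(row[0] for row in rows if len(row) > 5 and row[5] == "1")


# hypothetical sample lines, including a short row and a blank one
sample = [
    "9 2 1 3 5 1 5 2 3 1 2 3 7 1",
    "1 2 5 1 2 6 5 1 4 2 3 1 7 1",
    "6 1 3 3 4 1 5 1 1 0 2 3 7 1",
    "6 1 3 3 4 1 5 1 1 0 2 3 7 1",
    "1 2 3",
    "",
]
counts = first_column_counts(sample)
for value, times in sorted(counts.items()):
    print("value %s appears %d times in column one" % (value, times))
```

A Counter behaves like a dictionary, but looking up a value that never appeared returns 0 instead of raising KeyError, and counts.most_common() lists the values from most to least frequent.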
Answer 2 (score: 1)
Following the logic of the original program, I came up with this version:
ldata = open("list.data", "r")
# read in all the rows; note that the list values are strings, not integers
linelist = []
for line in ldata:
    linelist.append(tuple(line.strip().split(" ")))
ldata.close()
# keep only the rows with 6th column = '1'
only1 = []
for row in linelist:
    if row[5] == '1':
        only1.append(row)
# tally the statistics
income_dist = {}
for row in only1:
    if row[0] in income_dist:
        income_dist[row[0]] += 1
    else:
        income_dist[row[0]] = 1
# print result
print("While column six equals '1',")
for num in sorted(income_dist):
    print("the value %s appears %d times in column one." % (num, income_dist[num]))
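One thing to watch for in the final loop: because the keys are strings, sorted() orders them alphabetically, so "10" would come before "2". If the first column only ever holds whole numbers (an assumption about the data, not something stated in the question), sorting with key=int gives numeric order. In this small illustration the counts 12 and 18 are taken from the question's example and the "10" entry is made up:

```python
# counts for '1' and '2' follow the question's example; the '10' entry is made up
income_dist = {"1": 12, "2": 18, "10": 3}

alphabetical = sorted(income_dist)       # compares the keys as strings
numeric = sorted(income_dist, key=int)   # compares the keys by integer value

print(alphabetical)
print(numeric)
for num in numeric:
    print("the value %s appears %d times in column one." % (num, income_dist[num]))
```

If a column can contain non-numeric entries such as NA, int() would raise ValueError, so this only suits columns known to be numeric.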
Answer 3 (score: 1)
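Considering that the data file has about 9,000 rows, if you do not need to keep the raw data you can combine steps 1 and 2 (reading the rows and filtering them), so that the program uses less memory and runs faster.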
ldata = open("list.data", "r")
# read in the rows one at a time; note that the list values are strings, not integers
# keep only the rows with 6th column = '1'
only1 = []
for line in ldata:
    if line.strip() == '':  # ignore blank lines
        continue
    row = tuple(line.strip().split(" "))
    if row[5] == '1':
        only1.append(row)
ldata.close()
# tally the statistics
income_dist = {}
for row in only1:
    if row[0] in income_dist:
        income_dist[row[0]] += 1
    else:
        income_dist[row[0]] = 1
# print result
print("While column six equals '1',")
for num in sorted(income_dist):
    print("the value %s appears %d times in column one." % (num, income_dist[num]))
Sample test data in list.data:
9 2 1 5 4 5 5 3 3 0 1 1 7 NA
9 1 1 5 5 5 5 3 5 2 1 1 7 1
9 2 1 3 5 1 5 2 3 1 2 3 7 1
1 2 5 1 2 6 5 1 4 2 3 1 7 1
1 2 5 1 2 6 3 1 4 2 3 1 7 1
8 1 1 6 4 8 5 3 2 0 1 1 7 1
1 1 5 2 3 9 4 1 3 1 2 3 7 1
6 1 3 3 4 1 5 1 1 0 2 3 7 1
2 1 1 6 3 8 5 3 3 0 2 3 7 1
4 1 1 7 4 8 4 3 2 0 2 3 7 1
1 1 5 2 4 1 5 1 1 0 2 3 7 1
4 2 2 2 3 2 5 1 2 0 1 1 5 1
8 2 1 3 6 6 2 2 4 2 1 1 7 1
7 2 1 5 3 5 5 3 4 0 2 1 7 1
1 1 5 2 3 9 4 1 3 1 2 3 7 1
6 1 3 3 4 1 5 1 1 0 2 3 7 1
2 1 1 6 3 8 5 3 3 0 2 3 7 1
4 1 1 7 4 8 4 3 2 0 2 3 7 1
1 1 5 2 4 9 5 1 1 0 2 3 7 1
4 2 2 2 3 2 5 1 2 0 1 1 5 1
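The filtering and tallying steps can also be merged into the same loop, so that no rows are stored at all. Here is a minimal sketch of that idea, run on the first eleven sample rows above so that it is self-contained. The function name tally and its parameters are illustrative assumptions; with the real file you would pass the open file object instead of SAMPLE.splitlines().

```python
from collections import defaultdict

# first eleven rows of the sample data, inlined so the sketch runs on its own
SAMPLE = """\
9 2 1 5 4 5 5 3 3 0 1 1 7 NA
9 1 1 5 5 5 5 3 5 2 1 1 7 1
9 2 1 3 5 1 5 2 3 1 2 3 7 1
1 2 5 1 2 6 5 1 4 2 3 1 7 1
1 2 5 1 2 6 3 1 4 2 3 1 7 1
8 1 1 6 4 8 5 3 2 0 1 1 7 1
1 1 5 2 3 9 4 1 3 1 2 3 7 1
6 1 3 3 4 1 5 1 1 0 2 3 7 1
2 1 1 6 3 8 5 3 3 0 2 3 7 1
4 1 1 7 4 8 4 3 2 0 2 3 7 1
1 1 5 2 4 1 5 1 1 0 2 3 7 1
"""


def tally(lines, match_col=5, count_col=0, wanted="1"):
    """Count count_col values on rows where match_col equals wanted."""
    counts = defaultdict(int)  # missing keys start at 0
    for line in lines:
        row = line.split()
        if len(row) <= max(match_col, count_col):
            continue  # blank or short line
        if row[match_col] == wanted:
            counts[row[count_col]] += 1
    return dict(counts)


income_dist = tally(SAMPLE.splitlines())
print("While column six equals '1',")
for num in sorted(income_dist):
    print("the value %s appears %d times in column one." % (num, income_dist[num]))
```

In these eleven rows, three have "1" in the sixth column, and their first-column values are 9, 6 and 1, so each is counted once.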