Listing a distribution in Python

Time: 2014-02-19 22:50:04

Tags: python

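I have a sheet of numbers, separated by spaces into columns. Each column represents a different category, and within each column, each number represents a different value. For example, column four represents age, and in that column the number 5 represents an age of 44-55. Each row is a different person's record. I want to use a Python script to search through the sheet and find all rows where the sixth column is "1". After that, I want to know how many times each number appears in column one among the rows where column six equals "1". The script should tell the user something like "While column six equals '1', the value '1' appears 12 times in column one. The value '2' appears 18 times..." and so on. I hope I'm being clear here. Basically, I just want it to list the counts. I'm new to Python, and I've attached my code below. I think I should be using dictionaries, but I'm not entirely sure how. So far I haven't come close to figuring this out. I would really appreciate it if someone could walk me through the logic behind such code. Thank you so much!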

ldata = open("list.data", "r")
income_dist = {} 

for line in ldata:
    linelist = line.strip().split(" ")
key_income_dist = linelist[6] 
if key_income_dist in income_dist: 
    income_dist[key_income_dist] = 1 + income_dist[key_income_dist] 
else:
        income_dist[key_income_dist] = 1 

ldata.close()

print value_no_occupations

4 Answers:

Answer 0 (score: 3)

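First, indentation is significant in Python, and the indentation above is broken: the 5 lines after linelist = line.strip().split(" ") need to be indented so that they sit inside the for loop.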

Next, they should be indented one level further, with this line added before them:

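    if len(linelist) > 5 and linelist[5] == "1":

This line skips short lines (there are some) and tests for what you said you wanted: rows "where the number in column six is equal to '1'". The index is [5] because the first number on the line is referenced as [0]; these are zero-based offsets, not counting numbers.

You probably also want to change key_income_dist = linelist[6] to key_income_dist = linelist[0], so that the tally is keyed on the first column. Experiment as necessary.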
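To make the offsets concrete, here is a small sketch that splits the first line of the sample data and picks out columns one and six. The variable names first_column and sixth_column are only illustrative.

```python
# One record from the sample data, split on whitespace.
line = "9 2 1 5 4 5 5 3 3 0 1 1 7 NA"
linelist = line.strip().split()

first_column = linelist[0]  # column one lives at offset 0
sixth_column = linelist[5]  # column six lives at offset 5

# The values are strings, so compare against "1", not the integer 1.
print("column one: %s, column six: %s" % (first_column, sixth_column))
```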

Finally, you should print income_dist at the end to see the result. If you want fancier output, look into string formatting.
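Putting these suggestions together, a corrected version of the loop might look like the sketch below. It assumes the sixth column is at offset 5. The function name count_first_column and the in-memory sample_lines are hypothetical, used here only so the example runs without list.data.

```python
def count_first_column(lines):
    """Tally column-one values for rows whose sixth column is "1"."""
    income_dist = {}
    for line in lines:
        linelist = line.strip().split()
        # Skip short lines, then keep only rows where column six is "1".
        if len(linelist) > 5 and linelist[5] == "1":
            key_income_dist = linelist[0]
            if key_income_dist in income_dist:
                income_dist[key_income_dist] += 1
            else:
                income_dist[key_income_dist] = 1
    return income_dist


# Hypothetical stand-in for the file contents, including one blank line.
sample_lines = [
    "9 2 1 3 5 1 5 2 3 1 2 3 7 1\n",
    "6 1 3 3 4 1 5 1 1 0 2 3 7 1\n",
    "1 1 5 2 4 1 5 1 1 0 2 3 7 1\n",
    "4 2 2 2 3 2 5 1 2 0 1 1 5 1\n",
    "\n",
]
income_dist = count_first_column(sample_lines)
print(income_dist)
```

With the real data, passing the object returned by open("list.data") in place of sample_lines should work the same way, since iterating over a file yields its lines.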

Answer 1 (score: 2)

This is actually easier than it looks! The key is collections.Counter.

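from collections import Counter

with open("list.data") as ldata:
    # Split each line once; blank or short lines are skipped by the length check.
    split_rows = (line.split() for line in ldata)
    # Fields are strings, so the sixth column (index 5) is compared with "1", not 1.
    rows = [tuple(row) for row in split_rows if len(row) > 5 and row[5] == "1"]

# Count how often each first-column value occurs among the kept rows.
first_col = Counter(row[0] for row in rows)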

If you want the distribution for every column (not just the first), do this:

distribution = {column: Counter(item[column] for item in rows) for column in range(len(rows[0]))}
# warning this will break if all rows are not the same size!
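Once the counts are in a Counter, producing the sentence-style output from the question takes only a short loop. The sketch below assumes a Counter shaped like first_col; the rows literal is made-up illustrative data that has already passed the column-six filter, not the contents of list.data.

```python
from collections import Counter

# Illustrative rows (truncated to six columns) where column six is "1".
rows = [
    ("9", "2", "1", "3", "5", "1"),
    ("6", "1", "3", "3", "4", "1"),
    ("6", "1", "3", "3", "4", "1"),
    ("1", "1", "5", "2", "4", "1"),
]

first_col = Counter(row[0] for row in rows)

# Build the report line by line, ordered by the first-column value.
report = ["While column six equals '1',"]
for value, count in sorted(first_col.items()):
    report.append("the value %s appears %d times in column one." % (value, count))

print("\n".join(report))
```

The keys are strings, so sorted orders them lexicographically ("10" would come before "2"). If first-column values can exceed 9 and are always numeric, sorting with key=lambda item: int(item[0]) may be preferable.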

Answer 2 (score: 1)

Following the original program's logic, I came up with this version:

ldata = open("list.data", "r")

# read in all the rows, note that the list values are strings instead of integers
linelist = []
for line in ldata:
    linelist.append(tuple(line.strip().split(" ")))
ldata.close()

# keep only the rows with 6th column = '1'
only1 = []
for row in linelist:
    if row[5] == '1':
        only1.append(row)

# tally the statistics
income_dist = {}
for row in only1:
    if row[0] in income_dist:
        income_dist[row[0]] += 1
    else:
        income_dist[row[0]] = 1

# print result
print("While column six equals '1',")
for num in sorted(income_dist):
    print("the value %s appears %d times in column one." % (num, income_dist[num]))

Answer 3 (score: 1)

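Given that the data file has about 9,000 rows, if you don't need to keep the original data, you can combine steps 1 and 2 so that the program uses less memory and runs faster.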

ldata = open("list.data", "r")

# read in all the rows, note that the list values are strings instead of integers
# keep only the rows with 6th column = '1'
only1 = []
for line in ldata:
    if line.strip() == '':      # ignore blank lines
        continue
    row = tuple(line.strip().split(" "))
    if row[5] == '1':
        only1.append(row)
ldata.close()

# tally the statistics
income_dist = {}
for row in only1:
    if row[0] in income_dist:
        income_dist[row[0]] += 1
    else:
        income_dist[row[0]] = 1

# print result
print("While column six equals '1',")
for num in sorted(income_dist):
    print("the value %s appears %d times in column one." % (num, income_dist[num]))

Sample test data in list.data:

9 2 1 5 4 5 5 3 3 0 1 1 7 NA
9 1 1 5 5 5 5 3 5 2 1 1 7 1
9 2 1 3 5 1 5 2 3 1 2 3 7 1
1 2 5 1 2 6 5 1 4 2 3 1 7 1
1 2 5 1 2 6 3 1 4 2 3 1 7 1
8 1 1 6 4 8 5 3 2 0 1 1 7 1
1 1 5 2 3 9 4 1 3 1 2 3 7 1
6 1 3 3 4 1 5 1 1 0 2 3 7 1
2 1 1 6 3 8 5 3 3 0 2 3 7 1
4 1 1 7 4 8 4 3 2 0 2 3 7 1
1 1 5 2 4 1 5 1 1 0 2 3 7 1
4 2 2 2 3 2 5 1 2 0 1 1 5 1
8 2 1 3 6 6 2 2 4 2 1 1 7 1
7 2 1 5 3 5 5 3 4 0 2 1 7 1
1 1 5 2 3 9 4 1 3 1 2 3 7 1
6 1 3 3 4 1 5 1 1 0 2 3 7 1
2 1 1 6 3 8 5 3 3 0 2 3 7 1
4 1 1 7 4 8 4 3 2 0 2 3 7 1
1 1 5 2 4 9 5 1 1 0 2 3 7 1
4 2 2 2 3 2 5 1 2 0 1 1 5 1
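If only the counts are needed, the intermediate only1 list can be dropped as well by tallying while reading. The sketch below is one possible way to do that with collections.Counter, not the answerer's own code. The function name tally_first_column and its parameters are hypothetical, and it is run on a few lines copied from the sample data above (plus a blank line) rather than on the file itself.

```python
from collections import Counter


def tally_first_column(lines, filter_index=5, filter_value="1"):
    """Count column-one values for rows whose filter column matches."""
    counts = Counter()
    for line in lines:
        row = line.split()
        # Blank or short lines have no sixth column, so skip them.
        if len(row) <= filter_index:
            continue
        if row[filter_index] == filter_value:
            counts[row[0]] += 1
    return counts


# A subset of the sample data, with one blank line added.
sample = """\
9 2 1 5 4 5 5 3 3 0 1 1 7 NA
9 2 1 3 5 1 5 2 3 1 2 3 7 1
6 1 3 3 4 1 5 1 1 0 2 3 7 1

1 1 5 2 4 1 5 1 1 0 2 3 7 1
6 1 3 3 4 1 5 1 1 0 2 3 7 1
"""
counts = tally_first_column(sample.splitlines())

print("While column six equals '1',")
for num in sorted(counts):
    print("the value %s appears %d times in column one." % (num, counts[num]))
```

Because only one row is held in memory at a time, this approach should scale to files much larger than 9,000 rows. With the real file, passing the open file object as lines is expected to work, since a file yields one line per iteration.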