Question

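I have a sheet of numbers, separated into columns by spaces. Each column represents a different category, and within each column each number represents a different value. For example, column 4 represents age, and within that column the number 5 represents ages 44-55. Obviously, each row is a different person's record. I would like to use a Python script to search the sheet and find all rows where the sixth column is "1". After that, I want to know how many times each number occurs in the first column among the rows whose sixth column equals "1". The script should output to the user something like "While column six equals '1', the value '1' appears 12 times in column one. The value '2' appears 18 times..." and so on. I hope I am being clear here. I just want it to list the counts, basically. Anyway, I am new to Python. I have attached my code below. I think I should be using dictionaries, but I am not entirely sure how. So far I have not come close to figuring this out. I would really appreciate it if someone could walk me through the logic behind such code. Thank you so much!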

ldata = open("list.data", "r")
income_dist = {} 

for line in ldata:
    linelist = line.strip().split(" ")
key_income_dist = linelist[6] 
if key_income_dist in income_dist: 
    income_dist[key_income_dist] = 1 + income_dist[key_income_dist] 
else:
        income_dist[key_income_dist] = 1 

ldata.close()

print value_no_occupations

Answer 1

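First, indentation matters a great deal in Python, and the indentation above is badly broken: the 5 lines after linelist = line.strip().split(" ") need to be indented to the same level as that line.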

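Next, they should be indented one level further, with this line added before them:

    if len(linelist)>5 and linelist[5]=="1":

This line skips short lines (there are some) and tests for what you said you want: "where the number in the sixth column is equal to '1'". The sixth column is [5] because the first number on the line is referenced as [0] (these are "offsets", not "cardinal", or counting, numbers).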

You will probably want to change key_income_dist = linelist[6] to key_income_dist = linelist[0] (the first column) to get what you want. Experiment if necessary.

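Finally, you should put print income_dist at the end (print(income_dist) on Python 3) to see the results. If you want fancier output, look into string formatting.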
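Putting these points together, the corrected loop might look like the sketch below. It assumes the sixth column sits at offset 5 and the first column at offset 0, as zero-based indexing implies. The function name count_first_column and the sample lines are made up for illustration, and the print call is written so that it runs on both Python 2 and Python 3.

```python
def count_first_column(lines):
    """Count first-column values on lines whose sixth column is "1"."""
    income_dist = {}
    for line in lines:
        linelist = line.strip().split(" ")
        # skip short lines, then test the sixth column (offset 5)
        if len(linelist) > 5 and linelist[5] == "1":
            key_income_dist = linelist[0]
            if key_income_dist in income_dist:
                income_dist[key_income_dist] = 1 + income_dist[key_income_dist]
            else:
                income_dist[key_income_dist] = 1
    return income_dist


# hypothetical sample lines standing in for the contents of list.data
sample = ["9 2 1 3 5 1 5", "6 1 3 3 4 1 5", "6 1 3 3 4 1 5", "1 2 5 1 2 6 5", "7 2"]
result = count_first_column(sample)
for value in sorted(result):
    print("the value '%s' appears %d times in the first column." % (value, result[value]))
```

To run it on the real file, pass the open file object instead of the sample list, since iterating over a file yields its lines.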

Answer 2

This is actually easier than it looks! The key is collections.Counter

from collections import Counter

with open("list.data") as ldata:
    # every field is a string, so compare against "1" rather than the integer 1
    rows = [tuple(row.split()) for row in ldata if row.split()[5] == "1"]
# warning: this will break if some rows are shorter than 6 columns

first_col = Counter(item[0] for item in rows)

If you want the distribution for every column (not just the first), do the following:

distribution = {column: Counter(item[column] for item in rows) for column in range(len(rows[0]))}
# warning this will break if all rows are not the same size!
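If the rows might not all be the same size, one possible workaround is to build the per-column counters row by row instead of indexing every row by rows[0]'s length. This is only a sketch of that idea, not part of the answer's original code; the rows below are hypothetical and one is deliberately short.

```python
from collections import Counter, defaultdict

# hypothetical rows that already passed the "sixth column equals 1" filter;
# the last one is deliberately shorter than the others
rows = [
    ("9", "2", "1", "3", "5", "1", "5"),
    ("6", "1", "3", "3", "4", "1", "5"),
    ("6", "1", "3", "3", "4", "1"),
]

# one Counter per column index, created the first time that index is seen
distribution = defaultdict(Counter)
for row in rows:
    for column, value in enumerate(row):
        distribution[column][value] += 1

print(distribution[0].most_common())  # first-column values, most frequent first
print(distribution[6])                # only two rows reach the seventh column
```

A short row simply contributes nothing to the columns it lacks, so no IndexError is raised.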

Answer 3

Following the logic of the original program, I came up with this version:

ldata = open("list.data", "r")

# read in all the rows, note that the list values are strings instead of integers
linelist = []
for line in ldata:
    linelist.append(tuple(line.strip().split(" ")))
ldata.close()

# keep only the rows with 6th column = '1'
only1 = []
for row in linelist:
    if row[5] == '1':
        only1.append(row)

# tally the statistics
income_dist = {}
for row in only1:
    if row[0] in income_dist:
        income_dist[row[0]] += 1
    else:
        income_dist[row[0]] = 1

# print result
print("While column six equals '1',")
for num in sorted(income_dist):
    print("the value %s appears %d times in column one." % (num, income_dist[num]))
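As a side note on the tallying step, the if/else can be shortened with dict.get, which returns a default when the key is missing. A minimal sketch, with hypothetical rows standing in for only1:

```python
# hypothetical filtered rows standing in for only1
only1 = [("9", "2"), ("6", "1"), ("6", "1")]

income_dist = {}
for row in only1:
    # get() returns 0 when this first-column value has not been seen yet
    income_dist[row[0]] = income_dist.get(row[0], 0) + 1

print(income_dist)
```

The behaviour is the same as the explicit membership test; it is purely a matter of taste.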

Answer 4

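Given that the data file has about 9000 rows, if you do not need to keep the original data you can combine steps 1 and 2 (reading and filtering), so that the program uses less memory and runs faster.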

ldata = open("list.data", "r")

# read in all the rows, note that the list values are strings instead of integers
# keep only the rows with 6th column = '1'
only1 = []
for line in ldata:
    if line.strip() == '':      # ignore blank lines
        continue
    row = tuple(line.strip().split(" "))
    if row[5] == '1':
        only1.append(row)
ldata.close()

# tally the statistics
income_dist = {}
for row in only1:
    if row[0] in income_dist:
        income_dist[row[0]] += 1
    else:
        income_dist[row[0]] = 1

# print result
print("While column six equals '1',")
for num in sorted(income_dist):
    print("the value %s appears %d times in column one." % (num, income_dist[num]))
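If the filtered rows are not needed afterwards either, the same idea can be taken one step further by tallying while reading, so that no list of rows is kept at all. The sketch below is one way to do that; the function name tally_first_column is hypothetical, and it assumes that rows with fewer than six columns should simply be skipped.

```python
import os
from collections import Counter


def tally_first_column(path):
    """Count first-column values for rows whose sixth column is '1'."""
    counts = Counter()
    with open(path) as ldata:          # the file is closed automatically
        for line in ldata:
            row = line.split()         # a blank line becomes an empty list
            if len(row) > 5 and row[5] == "1":
                counts[row[0]] += 1    # tally immediately, store nothing else
    return counts


# usage, only attempted when list.data exists in the current directory
if os.path.exists("list.data"):
    income_dist = tally_first_column("list.data")
    for num in sorted(income_dist):
        print("the value %s appears %d times in column one." % (num, income_dist[num]))
```

Memory use then depends on the number of distinct first-column values rather than on the number of matching rows.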

Sample test data in list.data:

9 2 1 5 4 5 5 3 3 0 1 1 7 NA
9 1 1 5 5 5 5 3 5 2 1 1 7 1
9 2 1 3 5 1 5 2 3 1 2 3 7 1
1 2 5 1 2 6 5 1 4 2 3 1 7 1
1 2 5 1 2 6 3 1 4 2 3 1 7 1
8 1 1 6 4 8 5 3 2 0 1 1 7 1
1 1 5 2 3 9 4 1 3 1 2 3 7 1
6 1 3 3 4 1 5 1 1 0 2 3 7 1
2 1 1 6 3 8 5 3 3 0 2 3 7 1
4 1 1 7 4 8 4 3 2 0 2 3 7 1
1 1 5 2 4 1 5 1 1 0 2 3 7 1
4 2 2 2 3 2 5 1 2 0 1 1 5 1
8 2 1 3 6 6 2 2 4 2 1 1 7 1
7 2 1 5 3 5 5 3 4 0 2 1 7 1
1 1 5 2 3 9 4 1 3 1 2 3 7 1
6 1 3 3 4 1 5 1 1 0 2 3 7 1
2 1 1 6 3 8 5 3 3 0 2 3 7 1
4 1 1 7 4 8 4 3 2 0 2 3 7 1
1 1 5 2 4 9 5 1 1 0 2 3 7 1
4 2 2 2 3 2 5 1 2 0 1 1 5 1
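As a quick check, the sketch below runs the same filter-and-count logic over these twenty rows, pasted into a string instead of read from list.data (that substitution is only for illustration). Four rows have '1' in the sixth column, and their first-column values are 9, 6, 1 and 6.

```python
from collections import Counter

# the twenty sample rows shown above, copied verbatim
sample = """\
9 2 1 5 4 5 5 3 3 0 1 1 7 NA
9 1 1 5 5 5 5 3 5 2 1 1 7 1
9 2 1 3 5 1 5 2 3 1 2 3 7 1
1 2 5 1 2 6 5 1 4 2 3 1 7 1
1 2 5 1 2 6 3 1 4 2 3 1 7 1
8 1 1 6 4 8 5 3 2 0 1 1 7 1
1 1 5 2 3 9 4 1 3 1 2 3 7 1
6 1 3 3 4 1 5 1 1 0 2 3 7 1
2 1 1 6 3 8 5 3 3 0 2 3 7 1
4 1 1 7 4 8 4 3 2 0 2 3 7 1
1 1 5 2 4 1 5 1 1 0 2 3 7 1
4 2 2 2 3 2 5 1 2 0 1 1 5 1
8 2 1 3 6 6 2 2 4 2 1 1 7 1
7 2 1 5 3 5 5 3 4 0 2 1 7 1
1 1 5 2 3 9 4 1 3 1 2 3 7 1
6 1 3 3 4 1 5 1 1 0 2 3 7 1
2 1 1 6 3 8 5 3 3 0 2 3 7 1
4 1 1 7 4 8 4 3 2 0 2 3 7 1
1 1 5 2 4 9 5 1 1 0 2 3 7 1
4 2 2 2 3 2 5 1 2 0 1 1 5 1
"""

# keep rows whose sixth column (offset 5) is "1", then count column one
counts = Counter(
    row[0]
    for row in (line.split() for line in sample.splitlines())
    if len(row) > 5 and row[5] == "1"
)

print("While column six equals '1',")
for num in sorted(counts):
    print("the value %s appears %d times in column one." % (num, counts[num]))
# the value 1 appears 1 times, the value 6 appears 2 times, the value 9 appears 1 times
```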

Listing a distribution in Python

4 answers: