Question

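I am writing a Python script that parses an Excel file with the xlrd library. I want to run a calculation over different columns only if the cells contain certain values, and skip those rows otherwise. The output should then be stored in a dictionary. Here is what I have tried: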

import xlrd


workbook = xlrd.open_workbook('filter_data.xlsx')
worksheet = workbook.sheet_by_name('filter_data')

num_rows = worksheet.nrows -1
num_cells = worksheet.ncols - 1

first_col = 0
scnd_col = 1
third_col = 2

# Read Data into double level dictionary
celldict = dict()
for curr_row in range(num_rows)  :

    cell0_val = int(worksheet.cell_value(curr_row+1,first_col))
    cell1_val = worksheet.cell_value(curr_row,scnd_col)
    cell2_val = worksheet.cell_value(curr_row,third_col)

    if cell1_val[:3] == 'BL1' :
        if cell2_val=='toSkip' :
            continue
    elif cell1_val[:3] == 'OUT' :
        if cell2_val == 'toSkip' :
            continue
    if not cell0_val in celldict :
        celldict[cell0_val] = dict()
    # if the entry isn't in the second level dictionary then add it, with count 1
    if not cell1_val in celldict[cell0_val] :
        celldict[cell0_val][cell1_val] = 1
        # Otherwise increase the count
    else :
        celldict[cell0_val][cell1_val] += 1

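As you can see, I count the number of "cell1_val" values for each "cell0_val". But I want to skip the values that have "toSkip" in the cell of the adjacent column before summing them and storing the result in the dict. I am doing something wrong here, and I feel the solution is much simpler. Any help would be appreciated. Thanks.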

Here is a sample of my worksheet:

cell0 cell1  cell2
12    BL1    toSkip
12    BL1    doNotSkip
12    OUT3   doNotSkip
12    OUT3   toSkip
13    BL1    doNotSkip
13    BL1    toSkip
13    OUT3   doNotSkip

Answer 1

Use collections.defaultdict together with collections.Counter for the nested dictionary.

Here it is in action:

>>> import pprint
>>> from collections import defaultdict, Counter
>>> d = defaultdict(Counter)
>>> d['red']['blue'] += 1
>>> d['green']['brown'] += 1
>>> d['red']['blue'] += 1
>>> pprint.pprint(dict(d))
{'green': Counter({'brown': 1}), 'red': Counter({'blue': 2})}

Here it is integrated into your code:

from collections import defaultdict, Counter
import xlrd

workbook = xlrd.open_workbook('filter_data.xlsx')
worksheet = workbook.sheet_by_name('filter_data')

first_col = 0
scnd_col = 1
third_col = 2

celldict = defaultdict(Counter)
for curr_row in range(1, worksheet.nrows): # start at 1 skips header row

    cell0_val = int(worksheet.cell_value(curr_row, first_col))
    cell1_val = worksheet.cell_value(curr_row, scnd_col)
    cell2_val = worksheet.cell_value(curr_row, third_col)

    if cell2_val == 'toSkip' and cell1_val[:3] in ('BL1', 'OUT'):
        continue

    celldict[cell0_val][cell1_val] += 1

I also merged your if statements and simplified the calculation of curr_row.
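To check the logic without an Excel file, the same loop can be run over the sample rows from the question held in a plain list. This is only a sketch: the `rows` list and the `count_cells` function are names made up for illustration, standing in for the values that `worksheet.cell_value` would return.

```python
from collections import defaultdict, Counter

# Sample rows from the question, header row already dropped (assumed layout).
rows = [
    (12, 'BL1', 'toSkip'),
    (12, 'BL1', 'doNotSkip'),
    (12, 'OUT3', 'doNotSkip'),
    (12, 'OUT3', 'toSkip'),
    (13, 'BL1', 'doNotSkip'),
    (13, 'BL1', 'toSkip'),
    (13, 'OUT3', 'doNotSkip'),
]


def count_cells(rows):
    """Count cell1 values per cell0 value, ignoring rows marked 'toSkip'."""
    celldict = defaultdict(Counter)
    for cell0_val, cell1_val, cell2_val in rows:
        if cell2_val == 'toSkip' and cell1_val[:3] in ('BL1', 'OUT'):
            continue
        celldict[cell0_val][cell1_val] += 1
    return celldict


result = count_cells(rows)
print({key: dict(counts) for key, counts in result.items()})
# {12: {'BL1': 1, 'OUT3': 1}, 13: {'BL1': 1, 'OUT3': 1}}
```

Each group ends up with one count per value because exactly one of every duplicated pair in the sample is marked 'toSkip'.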

Answer 2

It seems you want to skip the current row whenever cell2_val equals 'toSkip', so it would simplify the code to add if cell2_val=='toSkip' : continue directly after computing cell2_val.
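As a rough sketch of that early skip, the loop below keeps the question's plain nested dictionaries and only moves the check to the top. The `rows` list and the `count_unskipped` function are hypothetical stand-ins for the worksheet reads, and the sketch assumes every 'toSkip' row should be dropped regardless of what the second column contains.

```python
# Hypothetical rows in the same (cell0, cell1, cell2) layout as the worksheet.
rows = [
    (12, 'BL1', 'toSkip'),
    (12, 'BL1', 'doNotSkip'),
    (12, 'OUT3', 'toSkip'),
    (13, 'OUT3', 'doNotSkip'),
]


def count_unskipped(rows):
    celldict = {}
    for cell0_val, cell1_val, cell2_val in rows:
        # Bail out of this row before any counting happens.
        if cell2_val == 'toSkip':
            continue
        if cell0_val not in celldict:
            celldict[cell0_val] = {}
        if cell1_val not in celldict[cell0_val]:
            celldict[cell0_val][cell1_val] = 1
        else:
            celldict[cell0_val][cell1_val] += 1
    return celldict


counts = count_unskipped(rows)
print(counts)  # {12: {'BL1': 1}, 13: {'OUT3': 1}}
```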

Also, where you have

# if the entry isn't in the second level dictionary then add it, with count 1
if not cell1_val in celldict[cell0_val] :
    celldict[cell0_val][cell1_val] = 1
    # Otherwise increase the count
else :
    celldict[cell0_val][cell1_val] += 1

the usual idiom is more like

celldict[cell0_val][cell1_val] = celldict[cell0_val].get(cell1_val, 0) + 1

That is, use a default value of 0, so that if the key cell1_val is not yet in celldict[cell0_val], get() returns 0.
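A small sketch of the get() idiom, first on a flat dictionary and then on a nested one. The sample values are made up for illustration. Note that in the nested case the inner dictionary still has to exist before get() is called on it.

```python
# Flat case: get() returns 0 for a missing key, so the first sighting stores 1.
flat_counts = {}
for name in ['BL1', 'OUT3', 'BL1']:
    flat_counts[name] = flat_counts.get(name, 0) + 1
print(flat_counts)  # {'BL1': 2, 'OUT3': 1}

# Nested case: create the inner dict first, then apply the same idiom.
celldict = {}
for cell0_val, cell1_val in [(12, 'BL1'), (12, 'BL1'), (13, 'OUT3')]:
    if cell0_val not in celldict:
        celldict[cell0_val] = {}
    celldict[cell0_val][cell1_val] = celldict[cell0_val].get(cell1_val, 0) + 1
print(celldict)  # {12: {'BL1': 2}, 13: {'OUT3': 1}}
```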

Skipping Excel rows with Python
