I'm writing a Python script that parses an Excel file using the xlrd library.
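I want to do calculations on different columns if the cells contain certain values, and skip those values otherwise. Then I want to store the output in a dictionary.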
Here is what I tried:
import xlrd
workbook = xlrd.open_workbook('filter_data.xlsx')
worksheet = workbook.sheet_by_name('filter_data')
num_rows = worksheet.nrows - 1
num_cells = worksheet.ncols - 1
first_col = 0
scnd_col = 1
third_col = 2
# Read Data into double level dictionary
celldict = dict()
for curr_row in range(num_rows):
    cell0_val = int(worksheet.cell_value(curr_row+1, first_col))
    cell1_val = worksheet.cell_value(curr_row, scnd_col)
    cell2_val = worksheet.cell_value(curr_row, third_col)
    if cell1_val[:3] == 'BL1':
        if cell2_val == 'toSkip':
            continue
    elif cell1_val[:3] == 'OUT':
        if cell2_val == 'toSkip':
            continue
    if not cell0_val in celldict:
        celldict[cell0_val] = dict()
    # if the entry isn't in the second level dictionary then add it, with count 1
    if not cell1_val in celldict[cell0_val]:
        celldict[cell0_val][cell1_val] = 1
    # Otherwise increase the count
    else:
        celldict[cell0_val][cell1_val] += 1
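So here you can see that I count the number of cell1_val values for each cell0_val. But before summing and storing the counts in the dict, I want to skip the values whose adjacent cell (in the next column) contains "toSkip". I'm doing something wrong here, and I feel the solution is much simpler. Any help would be appreciated. Thanks.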
Here is a sample of my worksheet:
cell0 cell1 cell2
12 BL1 toSkip
12 BL1 doNotSkip
12 OUT3 doNotSkip
12 OUT3 toSkip
13 BL1 doNotSkip
13 BL1 toSkip
13 OUT3 doNotSkip
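One likely problem with the loop above (an inference from the code, not something stated in the question) is that column 0 is read from row curr_row+1 while columns 1 and 2 are read from row curr_row. Each cell1/cell2 pair is therefore matched with the cell0 value of the following row. As a result, the header labels get counted and the last data row's columns 1 and 2 are never read. The sketch below reproduces that indexing on the sample sheet, hard-coded as Python tuples (an assumption; no workbook is opened):

```python
# Sample sheet from the question, with the header row as row 0
# (hard-coded here as an assumption; no spreadsheet is read).
rows = [
    ("cell0", "cell1", "cell2"),
    (12, "BL1", "toSkip"),
    (12, "BL1", "doNotSkip"),
    (12, "OUT3", "doNotSkip"),
    (12, "OUT3", "toSkip"),
    (13, "BL1", "doNotSkip"),
    (13, "BL1", "toSkip"),
    (13, "OUT3", "doNotSkip"),
]
num_rows = len(rows) - 1  # plays the role of worksheet.nrows - 1

# Reproduce the question's indexing: column 0 comes from row curr_row + 1,
# columns 1 and 2 come from row curr_row.
paired = [
    (rows[curr_row + 1][0], rows[curr_row][1], rows[curr_row][2])
    for curr_row in range(num_rows)
]

for triple in paired:
    print(triple)
```

The first printed triple is (12, 'cell1', 'cell2'), which is the header matched with the first cell0 value. The final row (13, 'OUT3', 'doNotSkip') never appears.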
Answer 0 (score: 1)
Use collections.defaultdict together with collections.Counter for nested dictionaries.
Here it is in action:
>>> import pprint
>>> from collections import defaultdict, Counter
>>> d = defaultdict(Counter)
>>> d['red']['blue'] += 1
>>> d['green']['brown'] += 1
>>> d['red']['blue'] += 1
>>> pprint.pprint(dict(d))
{'green': Counter({'brown': 1}), 'red': Counter({'blue': 2})}
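The two levels handle missing keys differently, which matters when reading the counts back. A Counter returns 0 for a missing key without storing it. The outer defaultdict, by contrast, inserts an empty Counter whenever a missing key is looked up with square brackets. Using `in` or `.get()` avoids that side effect. The keys below are taken from the sample sheet, and key 14 is a hypothetical absent key:

```python
from collections import Counter, defaultdict

d = defaultdict(Counter)
d[12]["BL1"] += 1

# A missing key in the inner Counter reads as 0 and is NOT inserted.
print(d[12]["OUT3"])    # 0
print("OUT3" in d[12])  # False

# A missing key in the outer defaultdict IS inserted (as an empty Counter).
print(d[13]["BL1"])     # 0
print(13 in d)          # True

# .get() on the outer dict does not trigger the default factory.
print(d.get(14))        # None
print(14 in d)          # False
```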
Here it is integrated into your code:
from collections import defaultdict, Counter
import xlrd
workbook = xlrd.open_workbook('filter_data.xlsx')
worksheet = workbook.sheet_by_name('filter_data')
first_col = 0
scnd_col = 1
third_col = 2
celldict = defaultdict(Counter)
for curr_row in range(1, worksheet.nrows):  # starting at 1 skips the header row
    cell0_val = int(worksheet.cell_value(curr_row, first_col))
    cell1_val = worksheet.cell_value(curr_row, scnd_col)
    cell2_val = worksheet.cell_value(curr_row, third_col)
    if cell2_val == 'toSkip' and cell1_val[:3] in ('BL1', 'OUT'):
        continue
    celldict[cell0_val][cell1_val] += 1
I also merged your if-statements and simplified how curr_row is computed.
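To test the filtering without a workbook, the loop body can be moved into a function that takes an iterable of (cell0, cell1, cell2) rows. In the sketch below, `count_rows` is a hypothetical name. With xlrd, these rows could come from `worksheet.row_values(curr_row)[:3]` for `curr_row` in `range(1, worksheet.nrows)`. Note also that xlrd 2.0 and later only read legacy .xls files, so opening filter_data.xlsx requires either an xlrd release older than 2.0 or a different library such as openpyxl. The sample rows are the question's data, with cell0 written as floats because xlrd returns numeric cells as floats:

```python
from collections import Counter, defaultdict

def count_rows(rows):
    """Count cell1 values per cell0 value, skipping 'toSkip' rows whose
    cell1 starts with 'BL1' or 'OUT' (same rule as the loop above)."""
    celldict = defaultdict(Counter)
    for cell0, cell1, cell2 in rows:
        if cell2 == "toSkip" and cell1[:3] in ("BL1", "OUT"):
            continue
        # Spreadsheet numbers arrive as floats (e.g. 12.0), hence int().
        celldict[int(cell0)][cell1] += 1
    return celldict

# Data rows from the question's sample sheet (header row excluded).
sample = [
    (12.0, "BL1", "toSkip"),
    (12.0, "BL1", "doNotSkip"),
    (12.0, "OUT3", "doNotSkip"),
    (12.0, "OUT3", "toSkip"),
    (13.0, "BL1", "doNotSkip"),
    (13.0, "BL1", "toSkip"),
    (13.0, "OUT3", "doNotSkip"),
]

result = count_rows(sample)
print({k: dict(v) for k, v in result.items()})
# {12: {'BL1': 1, 'OUT3': 1}, 13: {'BL1': 1, 'OUT3': 1}}
```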
Answer 1 (score: 0)
It seems you want to skip the current row whenever cell2_val equals 'toSkip'. If so, adding if cell2_val == 'toSkip': continue directly after computing cell2_val would simplify the code.
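This early continue is slightly broader than the original check. It skips every 'toSkip' row, whereas the question's code (and Answer 0's merged condition) only skips 'toSkip' rows whose cell1 starts with 'BL1' or 'OUT'. On the sample sheet the two rules agree, because every row there has one of those prefixes. They differ only for other cell1 values. Here is a small comparison of the two predicates, where the function names and the 'XYZ' value are hypothetical:

```python
def skip_prefix_rule(cell1, cell2):
    """Question / Answer 0: skip only 'toSkip' rows with a BL1 or OUT prefix."""
    return cell2 == "toSkip" and cell1[:3] in ("BL1", "OUT")

def skip_any_rule(cell1, cell2):
    """Answer 1: skip every 'toSkip' row, whatever is in cell1."""
    return cell2 == "toSkip"

cases = [
    ("BL1", "toSkip"),
    ("OUT3", "toSkip"),
    ("OUT3", "doNotSkip"),
    ("XYZ", "toSkip"),  # hypothetical value, not in the sample sheet
]
for cell1, cell2 in cases:
    print(cell1, cell2, skip_prefix_rule(cell1, cell2), skip_any_rule(cell1, cell2))
```

Only the last case differs: the prefix rule keeps it, and the early continue drops it.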
Also, where you have
    # if the entry isn't in the second level dictionary then add it, with count 1
    if not cell1_val in celldict[cell0_val]:
        celldict[cell0_val][cell1_val] = 1
    # Otherwise increase the count
    else:
        celldict[cell0_val][cell1_val] += 1
the usual idiom is more like
celldict[cell0_val][cell1_val] = celldict[cell0_val].get(cell1_val, 0) + 1
That is, use a default value of 0, so that if the key cell1_val is not yet in celldict[cell0_val], get() returns 0.
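The same plain-dict approach can also cover the outer level. dict.setdefault creates the inner dict the first time a cell0 value appears, which replaces the "if not cell0_val in celldict" check. The sketch below combines it with get(..., 0) and Answer 1's early continue. The rows are made up for illustration and do not come from the question:

```python
# Illustrative rows (made up for this example): (cell0, cell1, cell2).
rows = [
    (12, "BL1", "doNotSkip"),
    (12, "BL1", "doNotSkip"),
    (12, "OUT3", "doNotSkip"),
    (13, "BL1", "toSkip"),
]

celldict = {}
for cell0_val, cell1_val, cell2_val in rows:
    if cell2_val == "toSkip":
        continue  # Answer 1's early skip
    # setdefault returns the existing inner dict, or inserts and returns {}.
    inner = celldict.setdefault(cell0_val, {})
    # get(..., 0) supplies the starting count for a new cell1 value.
    inner[cell1_val] = inner.get(cell1_val, 0) + 1

print(celldict)  # {12: {'BL1': 2, 'OUT3': 1}}
```

Because the only row for 13 is skipped, 13 never becomes a key, and no empty inner dict is left behind.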