Question

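Please see the code below. It works well, but I want to do two things. First, I would like to make the if statement shorter than it actually is: I have many columns like this, and they are not next to each other. Second, sometimes I may not know the exact column letter.

So I would like to know whether there is a way to refer to a column by its name or header, that is, the value in the top row. Then I could test whether a cell is in the specified column and, if so, always run the function on that cell. I can't find an openpyxl function for column names, and I'm not sure whether openpyxl treats the first row differently from the rest. I thought that, failing that, I could test against the first row, but I don't understand how to do that.

So is there a way to call a column by name? Or, if there isn't, can someone help me check the first row to see whether it has the value, and then change the matching cell in the row I'm on? Does that make sense?

So instead of the code saying: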

if cellObj.column == 'H' or ...

it would say:

if cellObj.column_header == 'NameOfField' or ...

If that is not possible, then:

if this cell has column where first row value is 'NameOfField' ...

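Please help me do this in the best way. I have looked on Stack Overflow and in books and blogs, but there doesn't seem to be a way to refer to a column by its name (as opposed to its letter).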

for row in sheet.iter_rows():
    for cellObj in row:
        if cellObj.column == 'H' or cellObj.column == 'I' or cellObj.column == 'L' or cellObj.column == 'M':
            print(cellObj.value),
            if cellObj.value.upper() == 'OldValue1':
                cellObj.value = 1
                print(cellObj.value)
            elif cellObj.value.upper() == 'OldValue2':
                cellObj.value = 2
                print(cellObj.value)

Answer 1

Edit

Assuming these are the header names you are looking for:

colnames = ['Header1', 'Header2', 'Header3']

Find the indices of these columns:

# sheet[1] is the first (header) row
col_indices = {n for n, cell in enumerate(sheet[1]) if cell.value in colnames}

Now iterate over the remaining rows:

for row in sheet.iter_rows(min_row=2):
    for index, cell in enumerate(row):
        if index in col_indices:
            # 'OldValue1' is a placeholder; after .upper() the real literal must be upper-case
            if cell.value.upper() == 'OldValue1':
                cell.value = 1
                print(cell.value)
            elif cell.value.upper() == 'OldValue2':
                cell.value = 2
                print(cell.value)

Use a dictionary instead of a set to keep the column names:

col_indices = {n: cell.value for n, cell in enumerate(sheet[1])
               if cell.value in colnames}
for row in sheet.iter_rows(min_row=2):
    for index, cell in enumerate(row):
        if index in col_indices:
            print('col: {}, row: {}, content: {}'.format(
                col_indices[index], cell.row, cell.value))
            if cell.value.upper() == 'OldValue1':
                cell.value = 1
            elif cell.value.upper() == 'OldValue2':
                cell.value = 2

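Old answer

This makes your if statement shorter:

if cellObj.column in 'HILM':
    print(cellObj.value),

For multi-letter column coordinates, you need to use a list:

if cellObj.column in ['H', 'AA', 'AB', 'AD']:
    print(cellObj.value),

Note that in openpyxl 2.6 and later, cell.column is a number and the letter is available as cell.column_letter, so test cellObj.column_letter there.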
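Because the column attribute holds a letter in older openpyxl releases and a number from 2.6 onward, it can help to normalise everything to numbers before comparing. The following is a minimal plain-Python sketch of that idea. The names column_letter_to_index, TARGET_COLUMNS and is_target are hypothetical and not part of openpyxl (openpyxl ships a comparable helper, openpyxl.utils.column_index_from_string).

```python
def column_letter_to_index(letters):
    """Convert a spreadsheet column label such as 'H' or 'AA' to a 1-based index."""
    if not letters:
        raise ValueError("empty column label")
    index = 0
    for char in letters.upper():
        if not 'A' <= char <= 'Z':
            raise ValueError(f"invalid column label: {letters!r}")
        # Treat the label as a base-26 number whose digits run from A=1 to Z=26.
        index = index * 26 + (ord(char) - ord('A') + 1)
    return index


# Hypothetical target columns, written as letters for readability.
TARGET_COLUMNS = {column_letter_to_index(c) for c in ['H', 'AA', 'AB', 'AD']}


def is_target(column):
    """Accept either a column letter or a 1-based column number."""
    if isinstance(column, str):
        column = column_letter_to_index(column)
    return column in TARGET_COLUMNS
```

Inside the loop, the test would then read if is_target(cellObj.column): regardless of which form the attribute takes.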

Answer 2

You can use the sheet.cell(row=#, column=#) syntax to access cells in the first row and the first column.
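As a rough sketch of how sheet.cell(row=..., column=...) could be used for this, the helpers below scan the header row for a name and then rewrite values in that column. The function names find_header_column and replace_in_column are hypothetical, and the code assumes only that the sheet object offers cell(row=, column=), max_row and max_column, as an openpyxl worksheet does.

```python
def find_header_column(sheet, name, header_row=1):
    """Return the 1-based column number whose header cell equals `name`, or None."""
    for column in range(1, sheet.max_column + 1):
        if sheet.cell(row=header_row, column=column).value == name:
            return column
    return None


def replace_in_column(sheet, name, mapping, header_row=1):
    """Replace values found in `mapping` within the column headed `name`.

    Returns the number of cells changed; 0 if the header is not found.
    """
    column = find_header_column(sheet, name, header_row)
    if column is None:
        return 0
    changed = 0
    # Data rows start directly below the header row.
    for row in range(header_row + 1, sheet.max_row + 1):
        cell = sheet.cell(row=row, column=column)
        if cell.value in mapping:
            cell.value = mapping[cell.value]
            changed += 1
    return changed
```

With a loaded worksheet, a call might look like replace_in_column(sheet, 'NameOfField', {'OldValue1': 1, 'OldValue2': 2}), repeated once per header of interest.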

Answer 3

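Since sheet.rows returns a generator, you can easily pull out the headers in the first iteration, process them as needed, and then carry on consuming the same generator. For example:

rows = sheet.rows  # keep one generator; each access to sheet.rows starts over
headers = [cell.value for cell in next(rows)]
# find indexes of targeted columns (header names, not column letters)
targets = ['NameOfField1', 'NameOfField2']
cols = [headers.index(header) for header in targets]

conv = {'OldValue1': 1, 'OldValue2': 2}

for row in rows:  # continues after the header row
    for col in cols:
        cell = row[col]
        # replace known values, leave everything else unchanged
        cell.value = conv.get(cell.value, cell.value)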
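The same "take the header from the first iteration, then keep going" pattern can be sketched without openpyxl at all, on plain rows of values. The function convert_rows below is a hypothetical helper; it assumes the rows come from something like sheet.iter_rows(values_only=True) in openpyxl 2.6 or later, or from any other iterable whose first item is the header row.

```python
def convert_rows(rows, target_headers, conv):
    """Yield data rows as lists, with values in the target columns mapped via `conv`.

    `rows` is any iterable of value sequences whose first item is the header row.
    """
    rows = iter(rows)                  # one shared iterator for header and data
    first = next(rows, None)           # consumes the header row exactly once
    if first is None:
        return                         # empty input: nothing to yield
    headers = list(first)
    cols = [headers.index(name) for name in target_headers]
    for row in rows:                   # resumes at the first data row
        values = list(row)
        for col in cols:
            # Unknown values pass through unchanged instead of raising KeyError.
            values[col] = conv.get(values[col], values[col])
        yield values
```

This variant returns new lists rather than modifying cells, so the results would still have to be written back to the worksheet if the file itself should change.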

Answer 4

There are many ways to do this. Here are some of the approaches I use:

1. Brute force

Assume that "sheet" and "workbook" are already defined.

header = [cell for cell in sheet['A1:XFD1'][0] if cell.value is not None and cell.value.strip() != ''] #you get all non-null columns
target_values = ['NameOfField', 'NameOfField1', 'NameOfField2'] #filter list
target_header = [cell.column for cell in header if cell.value in target_values] #get column index

data = {'OldValue1': 1, 'OldValue2': 2}

for row in sheet.iter_rows(max_row=sheet.max_row, max_col=sheet.max_column):
    for cell in row:
        if cell.column in target_header and cell.value in data:
            cell.value = data[cell.value]

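Here the brute force lies in sheet['A1:XFD1']: on the first pass we have to check every column. In return, you get the cell references for all the header columns. After that, we create target_values (our column names) and build a list of column indices (target_header). Finally, we iterate over the worksheet, checking whether the cell's column is in the list of column indices and whether the cell's value is in data, so that we can change the value.

Disadvantage: if a cell containing stray whitespace exists outside the "data area", max_row and max_column will take that cell into account, so the loop also iterates over blank cells.

2. Check the bounds

If the data is tabular (no gaps between columns, and an "id"-like column that is never null or blank), you can compute your own max row and max column.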

from openpyxl.utils import get_column_letter 

def find_limit_sheet(direction):
    max_limit_value = 1
    while (direction(max_limit_value).value is not None) and (direction(max_limit_value).value.strip() != ''):
        max_limit_value = max_limit_value + 1
    return (max_limit_value - 1) if max_limit_value != 1 else 1


max_qrow = find_limit_sheet(direction=lambda increment: sheet.cell(row=increment, column=1))
max_qcolumn = find_limit_sheet(direction=lambda increment: sheet.cell(column=increment, row=1))

header = [cell for cell in sheet[f'A1:{get_column_letter(max_qcolumn)}1']] #you get all non-null columns
target_values = ['NameOfField', 'NameOfField1', 'NameOfField2'] #filter list
target_header = [cell.column for cell in header[0] if cell.value in target_values] #get column names

data = {'OldValue1': 1, 'OldValue2': 2}

for row in sheet.iter_rows(max_row=max_qrow, max_col=max_qcolumn):
    for cell in row:
        if cell.column in target_header and cell.value in data:
            cell.value = data[cell.value]

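In this case, we stay within the "data area" only.

3. Optional: use pandas

If you need to do more complex operations on Excel data (I have to read a lot of Excel files as data sources at work :( ), I prefer to convert to a pandas DataFrame -> perform the operations -> save the result.

In this case, we use all of the data.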

from openpyxl.utils import get_column_letter 
import pandas as pd

def find_limit_sheet(direction):
    max_limit_value = 1
    while (direction(max_limit_value).value is not None) and (direction(max_limit_value).value.strip() != ''):
        max_limit_value = max_limit_value + 1
    return (max_limit_value - 1) if max_limit_value != 1 else 1


max_qrow = find_limit_sheet(direction=lambda increment: sheet.cell(row=increment, column=1))
max_qcolumn = find_limit_sheet(direction=lambda increment: sheet.cell(column=increment, row=1))

header = [cell.value for cell in sheet[f'A1:{get_column_letter(max_qcolumn)}1'][0]] #you get all non-null columns
raw_data = []
for row in sheet.iter_rows(max_row=max_qrow, max_col=max_qcolumn):
    row_data = [cell.value for cell in row]
    raw_data.append(dict(zip(header, row_data)))

df = pd.DataFrame(raw_data)
df.columns = df.iloc[0]
df = df[1:]

You can also use target_header to work with only a subset of the columns, for example:

...
target_header = [cell.column for cell in header[0] if cell.value in target_values] #get column names
...
raw_data = []
for row in sheet.iter_rows(max_row=max_qrow, max_col=max_qcolumn):
    row_data = [cell.value for cell in row if cell.column in target_header]
    raw_data.append(dict(zip(header, row_data)))

df = pd.DataFrame(raw_data)
df.columns = df.iloc[0]
df = df[1:]
...

Info

openpyxl: 2.6.2
pandas: 0.24.2
python: 3.7.3
Data structures: List Comprehensions doc
lambda expr: lambda expression

How do I use field names or column headers in openpyxl?
