Question

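I'm new to stackoverflow and recently learned some basic Python. This is my first time using openpyxl. Before that I used xlrd and xlsxwriter, and I did manage to make some useful programs with them. But now I need a reader and writer for .xlsx files.

I need to read and edit a file using data already stored in my code. Suppose the .xlsx has five columns of data: A, B, C, D, E. Column A has more than 1000 rows of data. Column D has 150 rows of data.

Basically, I want the program to find the last row that contains data in a given column (say D), and then write the stored variable data to the next available row of column D (last row + 1).

The problem is that I can't use ws.get_highest_row(), because it returns row 1000, the last row of column A.

This is all I have so far: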

data = 'xxx'
from openpyxl import load_workbook
wb = load_workbook('book.xlsx', use_iterators=True)
ws = wb.get_sheet_by_name('Sheet1')
last_row = ws.get_highest_row()

Obviously this doesn't work at all: last_row returns 1000.

Answer 1

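The problem is that get_highest_row() itself relies on RowDimension instances, which define the maximum row of the whole worksheet. A RowDimension carries no information about columns, so it cannot solve your problem and a different approach is needed.

Here is a somewhat "ugly" openpyxl-specific option. Note that it does not work with use_iterators=True: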

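from openpyxl.utils import coordinate_from_string

def get_maximum_row(ws, column):
    # Assumes ws._cells is keyed by coordinate strings such as "D150".
    # Compare the parsed column letters so that "A" does not match "AA1".
    return max(coordinate_from_string(cell)[-1]
               for cell in ws._cells
               if coordinate_from_string(cell)[0] == column)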

Usage:

print(get_maximum_row(ws, "A"))
print(get_maximum_row(ws, "B"))
print(get_maximum_row(ws, "C"))
print(get_maximum_row(ws, "D"))

Other than that, I would follow @LondonRob's suggestion: parse the contents with pandas and let it do the job.

Answer 2

Here is a way to do it with Pandas.

It's easy to get the last non-null row in Pandas using last_valid_index.
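As a minimal sketch of what last_valid_index returns, here is a small hand-made DataFrame (the data is hypothetical, not taken from the question's workbook). It assumes the default consecutive index that pd.read_excel produces and a header in Excel row 1:

```python
import pandas as pd

# Hypothetical sample data: column C stops two rows before column A does.
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "C": ["x", "y", None, None],
})

# Index label of the last non-null entry in column C.
last_valid = df["C"].last_valid_index()
print(last_valid)  # 1

# Assumption: Excel row 1 holds the header, so DataFrame row 0 is Excel row 2.
# The next free Excel row is therefore (last_valid + 1) + 2.
next_excel_row = last_valid + 1 + 2
print(next_excel_row)  # 4
```

If the column is entirely empty, last_valid_index returns None, so real code would need to handle that case before doing arithmetic on the result.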

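There may be a better way to write the resulting DataFrame to the xlsx file but, according to the docs, this rather dumb way is actually how it is done in openpyxl.

Suppose you start with this simple worksheet:

[Image: original worksheet]

Let's say we want to put xxx into column C: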

import openpyxl as xl
import pandas as pd

wb = xl.load_workbook('deleteme.xlsx')
ws = wb['Sheet1']
df = pd.read_excel('deleteme.xlsx')

def replace_first_null(df, col_name, value):
    """
    Replace the first null value in DataFrame df.`col_name`
    with `value`.
    """
    return_df = df.copy()
    idx = list(df.index)
    last_valid = df[col_name].last_valid_index()
    last_valid_row_number = idx.index(last_valid)
    # This next line has mixed number and string indexing
    # but it should be ok, since df is coming from an
    # Excel sheet and should have a consecutive index
    return_df.loc[last_valid_row_number + 1, col_name] = value
    return return_df

def write_df_to_worksheet(ws, df):
    """
    Write the values in df to the worksheet ws in place
    """
    for i, col in enumerate(df):
        for j, val in enumerate(df[col]):
            if not pd.isnull(val):
                # Python is zero indexed, so add one
                # (plus an extra one to take account
                #  of the header row!)
                ws.cell(row=j + 2, column=i + 1).value = val

# Here's the actual replacing happening
replaced = replace_first_null(df, 'C', 'xxx')
write_df_to_worksheet(ws, replaced)
wb.save('changed.xlsx')

This results in:

[Image: edited Excel file]

Answer 3

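If this is a limitation of openpyxl, then you could try one of the following approaches:

Convert the Excel file to csv and use the Python csv module.
Unzip the Excel file with zipfile, then navigate to the "xl/worksheets" subfolder of the unzipped file, where you will find the XML for each worksheet. From there, you can parse and update it with BeautifulSoup or lxml.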
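For the csv route, a minimal sketch could look like the following. The function name next_free_row and the sample text are made up for illustration, and it assumes the sheet has already been exported to CSV:

```python
import csv
import io

def next_free_row(csv_text, column_index):
    """Return the 1-based row number after the last non-empty cell
    in the given zero-based column of the CSV text."""
    last_filled = 0
    reader = csv.reader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        # Short rows simply have no cell in this column.
        if column_index < len(row) and row[column_index].strip():
            last_filled = row_number
    return last_filled + 1

# Hypothetical export: column A has four rows, column D only two.
SAMPLE = "a1,b1,c1,d1\na2,b2,c2,d2\na3,b3,c3,\na4,,,\n"

print(next_free_row(SAMPLE, 3))  # 3
print(next_free_row(SAMPLE, 0))  # 5
```

Keep in mind that a CSV round trip drops formatting and formulas, so this only fits when the plain values are all you need.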

The xlsx Excel format is a zipped tree of folders containing XML files. You can find the specification here.
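To illustrate the zipfile idea with only the standard library, here is a rough sketch. It builds a tiny in-memory archive that stands in for a real workbook (the XML is a stripped-down, hypothetical worksheet, not a complete xlsx), then reads the cell references back. It uses xml.etree.ElementTree rather than BeautifulSoup or lxml, and it only counts cells that have a <v> child, so inline strings would need extra handling:

```python
import io
import re
import xml.etree.ElementTree as ET
import zipfile

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
CELL_REF = re.compile(r"^([A-Z]+)(\d+)$")

# Hypothetical, stripped-down worksheet XML standing in for a real file.
SHEET_XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1"><c r="A1"><v>1</v></c><c r="D1"><v>10</v></c></row>
    <row r="2"><c r="A2"><v>2</v></c><c r="D2"><v>20</v></c></row>
    <row r="3"><c r="A3"><v>3</v></c></row>
  </sheetData>
</worksheet>"""

def last_row_in_column(archive_bytes, sheet_path, column):
    """Return the highest row number with a value in `column`, or 0."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        root = ET.fromstring(zf.read(sheet_path))
    rows = []
    for cell in root.iterfind(".//m:c", NS):
        match = CELL_REF.match(cell.get("r", ""))
        has_value = cell.find("m:v", NS) is not None
        if match and match.group(1) == column and has_value:
            rows.append(int(match.group(2)))
    return max(rows, default=0)

# Build the stand-in archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("xl/worksheets/sheet1.xml", SHEET_XML)

print(last_row_in_column(buffer.getvalue(), "xl/worksheets/sheet1.xml", "D"))  # 2
print(last_row_in_column(buffer.getvalue(), "xl/worksheets/sheet1.xml", "A"))  # 3
```

Writing a value back this way is considerably more work (shared strings, styles, and so on), which is why a library is usually the easier option.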

Answer 4

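Figured I'd start giving back to the stackoverflow community. Alecxe's solution didn't work for me, and I didn't want to use Pandas etc., so I did this instead. It checks from the end of the spreadsheet and gives you the next available/empty row in column D.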

def unassigned_row_in_column_D():
    # Start at the sheet's last used row and walk upward until column D
    # holds a value; stop at row 0 if the column is completely empty.
    ws_max_row = int(ws.max_row)
    while ws_max_row > 0 and ws['D' + str(ws_max_row)].value is None:
        ws_max_row -= 1
    return 'D' + str(ws_max_row + 1)

# then add variable data = 'xxx' to that cell

ws[unassigned_row_in_column_D()].value = data

Answer 5

Alecxe's solution didn't work for me. It may be an openpyxl version issue; I'm on 2.4.1, and this is what worked after a small tweak:

def get_max_row_in_col(ws, column):
    return max([cell[0] for cell in ws._cells if cell[1] == column])
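Since ws._cells is a private attribute whose key format seems to differ between openpyxl versions, a sketch that sticks to the public API may be more portable. The helper name last_filled_row and the sample data below are made up for illustration, and this assumes a normally loaded (not read-only) worksheet:

```python
from openpyxl import Workbook

def last_filled_row(ws, column_letter):
    """Return the last row with a value in the column, or 0 if none."""
    # ws["D"] yields the cells of column D up to the sheet's max_row.
    for cell in reversed(ws[column_letter]):
        if cell.value is not None:
            return cell.row
    return 0

# Hypothetical in-memory workbook: 10 rows in column A, 3 rows in column D.
wb = Workbook()
ws = wb.active
for row in range(1, 11):
    ws.cell(row=row, column=1, value=row)
for row in range(1, 4):
    ws.cell(row=row, column=4, value="d%d" % row)

print(ws.max_row)                # 10
print(last_filled_row(ws, "D"))  # 3

# Write into the next free row of column D (column index 4).
ws.cell(row=last_filled_row(ws, "D") + 1, column=4, value="xxx")
```

After the last line, D4 holds 'xxx' while ws.max_row is still 10.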

Python: find the highest row in a given column
