Question

我有一个带有the following pattern的Excel文件（data.xlsx），我需要从该Excel文件中的文本框中读取一个值。

我目前正在使用pandas库，我试图获取该值，但遗憾的是找不到该目标的任何API。

有谁知道如何做到这一点？

更多信息：

我的问题是来自Java的this姐妹问题的副本。

修改：

我为那些想知道如何在excel文件中手动搜索形状（可能还有其他所有项目）的人提供了一个解决方案（也就是说，没有来自pip的外部模块）。它实际上非常简单。看我的评论。

Answer 1

感谢所有帮助，但我自己解决了这个问题。

我使用 zipfile module 让它工作。显然，Excel is actually a suite that works on compressed XML files (changing the *.xlsx to *.zip reveals the contents of the file) when saving and reading from *.xlsx，所以我可以轻松地使用XML来搜索所需的文本。

这是我制作的模块。通过调用Sheet('path/to/sheet.xlsx').shapes.text，您现在可以轻松找到文本框中的文本：

import zipfile as z


class Sheet(str):
    @property
    def shapes(this):
        s = z.ZipFile(this)
        p='xl/drawings/drawing1.xml'  # shapes path, *.xlsx default
        p='drs/shapexml.xml'  # shapes path, *.xls default
        return XML(s.read(p))


class XML(object):
    def __init__(self, value):
        self.value = str(value)

    def __repr__(self):
        return repr(self.value)

    def __getitem__(self, i):
        return self.value[i]

    def tag_content(self, tag):
        return [XML(i) for i in self.value.split(tag)[1::2]]

    @property
    def text(self):
        t = self.tag_content('xdr:txBody')  # list of XML codes, each containing a seperate textboxes, messy (with extra xml that is)
        l = [i.tag_content('a:p>') for i in t]  # split into sublists by line breaks (inside the textbox), messy
        w = [[[h[1:-2] for h in i.tag_content('a:t')] if i else ['\n'] for i in j] for j in l]  # clean into sublists by cell-by-cell basis (and mind empty lines)
        l = [[''.join(i) for i in j] for j in w]  #  join lines overlapping multiple cells into one sublist
        return ['\n'.join(j) for j in l]  #  join sublists of lines into strings seperated by newline char

所以现在我的问题中提供的模式将输出为['comments extra']，而模式如下：

此   是

文本   在   一个   文本框   在

一   表
而且这个   是其他地方的另一个文本框

无论重叠的细胞如何

将输出为['This is\nText in a textbox on\na sheet','And this is another text box somewhere else\nRegardless of the overlapped cells']。

欢迎您。

Answer 2

您可以使用Dispatch：

from win32com.client import Dispatch

xl = Dispatch('Excel.Application')
wb = xl.Workbooks.Open(Filename = 'your file name/path')
ws = wb.Worksheets(sheet_index)

其中 sheet_index 是与工作簿中的关注工作表相对应的任何数字。 w.Shapes将在图纸上具有所有形状对象。您可以使用整数 index ，Shapes（index）访问形状（文本框），然后使用名称属性Shapes（index）.Name检查对象的名称。

ws.Shapes(index).Name

一旦确定了想要的形状，就可以像这样查看其文本：

ws.Shapes(index).Characters().Text

请注意，您必须调用（）Characters方法。要分配文本，只需分配它。或者，您也可以使用标准的替换方法替换其中的一部分（例如日期）。

ws.Shapes(index).Characters().Text = 'Beluga Whales'

ws.Shapes(index).TextFrame.Characters().Text = ws.Shapes(index).TextFrame.Characters().Text.replace('original text', 'new text')

Answer 3

使用openpyxl（版本2.4）

目前无法做到这一点

Answer 4

我试过这个从文本框中获取价值。

    xls = ExcelFile(request.FILES['yourFileName'])
    df = xls.parse(xls.sheet_names[0])
    for i in df.values:
        print(i[0])               #here you get the value from text box

感谢

在Python中从Excel获取文本框值

4 个答案: