Question

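I have been trying to write Python code that searches an Excel file for N words. Wherever any of the N words occurs, the code should output the entire row that contains it. I am searching for occurrences of multiple words in the Excel file.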

Suppose an Excel file like this one (call it File.xlsx):

ID    Date        Time      Comment
123   12/23/2017  11:10:02 Trouble with pin
98y   01/17/2016  12:45:01 Great web experience. But I had some issues.
76H   05/39/2017  09:55:59 Could not log into the portal.

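Given the data above, the question is:
If I search for the words 'pin' and 'log' and they are found in the Excel file above, I want my Python code to output line 1 and, below it, line 3.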

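Conceptually I can think of ways to solve this, but the Python implementation has me stumped. I have also searched Stack Overflow extensively but could not find a post that addresses this problem.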

Any and all help is greatly appreciated.

Answer 1

There are many ways to do this, since many Python packages can read Excel files (http://www.python-excel.org/), but xlrd is probably the most straightforward:
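
A minimal sketch of what such an xlrd-based search could look like; this is not the answerer's original code. It assumes xlrd is installed (note that xlrd 2.0 and later read only .xls files, so File.xlsx would need xlrd 1.x or a different reader), and the helper names load_rows_xlrd and matching_line_numbers are hypothetical. The matching logic is shown on in-memory rows that mirror the question's table, so it runs without an Excel file:

```python
def load_rows_xlrd(path):
    """Read every row of the first sheet as a list of cell values.

    Assumption: xlrd is installed and can open the file
    (xlrd 2.0+ supports only the legacy .xls format).
    """
    import xlrd  # imported lazily so the matching logic works without xlrd

    sheet = xlrd.open_workbook(path).sheet_by_index(0)
    return [sheet.row_values(i) for i in range(sheet.nrows)]


def matching_line_numbers(rows, words):
    """Return the 0-based indices of rows where any cell contains any word."""
    return [
        i
        for i, row in enumerate(rows)
        if any(word in str(value) for value in row for word in words)
    ]


# Sample rows mirroring the question's table; row 0 is the header,
# so the first data row is reported as "line 1".
sample_rows = [
    ["ID", "Date", "Time", "Comment"],
    ["123", "12/23/2017", "11:10:02", "Trouble with pin"],
    ["98y", "01/17/2016", "12:45:01", "Great web experience. But I had some issues."],
    ["76H", "05/39/2017", "09:55:59", "Could not log into the portal."],
]

for i in matching_line_numbers(sample_rows, ["pin", "log"]):
    print("line {}".format(i))

# With a real workbook: rows = load_rows_xlrd("File.xlsx")
```

Because the header occupies index 0, the 0-based row index doubles as the data line number, so no offset is needed here.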

Output:

line 1
line 3

Answer 2

Here is a solution using the openpyxl module, which I have used successfully in many projects.

Row indexing starts at the row containing the header, so if you do not want to count the header, subtract 1 from the index (row - 1).

from openpyxl import load_workbook

wb = load_workbook(filename='afile.xlsx')
ws = wb.active
search_words = ['pin', 'log']

# openpyxl rows and columns are 1-based; row 1 is the header
for row in range(1, ws.max_row + 1):
    for col in range(1, ws.max_column + 1):
        _cell = ws.cell(row=row, column=col)
        # substring test against the cell's text
        if any(word in str(_cell.value) for word in search_words):
            print("line {}".format(row - 1))
            break  # report each row only once
>>> 
line 1
line 3

To output the actual rows, just add the following print_row function:

def print_row(row):
    # join the non-empty cells of the given row into one string
    line = ''
    for col in range(1, ws.max_column + 1):
        _cell = ws.cell(row=row, column=col).value
        if _cell:
            line += ' ' + str(_cell)
    return line

Then replace print("line {}".format(row - 1)) with print(print_row(row)).

>>> 
 123 2017-12-23 00:00:00 11:10:02 Trouble with pin
 76H 05/39/2017 09:55:59 Could not log into the portal.
>>>
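
As a possible variation on the approach above, the row filter can be written as a generator that works on plain tuples of cell values, such as those returned by openpyxl's ws.iter_rows(values_only=True). This is only a sketch: the names iter_matching_rows and load_rows_openpyxl are hypothetical, the match is made case-insensitive (an assumption, not part of the answer), and only None cells are skipped, whereas print_row above also skips other falsy values such as 0.

```python
def iter_matching_rows(rows, words):
    """Yield each row, joined into one string, if any cell contains any word.

    The test is a case-insensitive substring match; None cells are skipped.
    """
    lowered = [w.lower() for w in words]
    for row in rows:
        cells = [str(v) for v in row if v is not None]
        if any(w in cell.lower() for cell in cells for w in lowered):
            yield " ".join(cells)


def load_rows_openpyxl(path):
    """Return all rows of the active sheet as tuples of cell values.

    Assumption: openpyxl is installed and the file is an .xlsx workbook.
    """
    from openpyxl import load_workbook  # lazy import; not needed for the demo

    ws = load_workbook(filename=path).active
    return list(ws.iter_rows(values_only=True))


# In-memory rows mirroring the question's table (header first).
demo_rows = [
    ("ID", "Date", "Time", "Comment"),
    ("123", "12/23/2017", "11:10:02", "Trouble with pin"),
    ("98y", "01/17/2016", "12:45:01", "Great web experience. But I had some issues."),
    ("76H", "05/39/2017", "09:55:59", "Could not log into the portal."),
]

for text in iter_matching_rows(demo_rows, ["pin", "log"]):
    print(text)

# With a real workbook: rows = load_rows_openpyxl("afile.xlsx")
```

Since this is a substring test, 'log' would also match words such as "login"; matching whole words only would need a different check.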

Filter rows in an Excel file by specific words
