This is my current code:
import os
import re
from openpyxl import load_workbook

folder_path1 = os.chdir("C:/Users/xx/Documents/xxx/Test python dict")
words = set()
extracted = set()
for file in os.listdir(folder_path1):
    if file.endswith(".xlsx"):
        wb = load_workbook(file, data_only=True)
        ws = wb.active
        words.add(str(ws['A1'].value))
        wordsextract = re.match(r"(.*)\((.*)\)", str(words))
        extracted.add(str(wordsextract))
print(extracted)
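I'm not sure how to extract only the word inside the parentheses, so I thought I could use re.match on the parentheses to pull it out. But it doesn't work. Does anyone here know how to do this? Thanks a bunch in advance.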
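A likely reason the attempt above prints something odd: `re.match` returns a `Match` object (or `None`), and `str()` of that object is only its repr, not the captured text. The word has to be read out with `.group(...)`. A minimal sketch, assuming a hypothetical cell value `"this is a (problem)"` in place of `ws['A1'].value`, and applying the pattern to the cell's own string instead of `str(words)`. With the greedy `(.*)` in front, only the last parenthesized part of a cell is captured.

```python
import re

# Hypothetical value standing in for ws['A1'].value in the question's code
cell_value = "this is a (problem)"

# The question's pattern, applied to the cell string itself rather than str(set)
m = re.match(r"(.*)\((.*)\)", cell_value)

# str(m) would only give the Match object's repr; the word is in capture group 2
word = m.group(2) if m else None
print(word)
```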
Answer 0 (score: 1)
Read the whole column into a set and extract the words from each cell value:
Excel source: [screenshot not included]
Program:
from openpyxl import load_workbook
import re
import os

folder_path1 = "C:/temp/"
os.chdir(folder_path1)  # os.chdir() returns None, so keep the path in its own variable
for file in os.listdir(folder_path1):
    if file.endswith("m1.xlsx"):
        wb = load_workbook(file, data_only=True)
        ws = wb.active
        # this is A1 through A5 - yours is only one cell though; change the
        # min/max values to include more columns or rows
        # a set makes no sense in your code - you read only one cell anyhow, so
        # anything in it is your single possible value string
        content = set(*ws.iter_cols(min_col=1, max_col=1, min_row=1, max_row=5,
                                    values_only=True)) - {None}  # remove empty cells
        # non-greedy capturing of things in parentheses; DOTALL lets a match span line breaks
        # str() guards against numeric cells, which ' '.join() would reject
        words = re.findall(r"\((.+?)\)", ' '.join(map(str, content)), re.DOTALL)
        print(words)
Output:
['problem', 'span \nlines', 'some'] # out of order due to set usage
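If the order matters, one option is to collect matches cell by cell into a list and de-duplicate with `dict.fromkeys`, which keeps first-seen order (Python 3.7+), instead of going through a set. Matching each cell separately also avoids a match accidentally spanning two cells after the `' '.join`. A sketch; `words_in_parens` and the sample column values are hypothetical, standing in for the values returned by `ws.iter_cols(..., values_only=True)`:

```python
import re


def words_in_parens(cell_values):
    """Return the text inside (...) for each cell, in first-seen order, without duplicates."""
    found = []
    for value in cell_values:
        if value is None:  # skip empty cells, like the `- {None}` in the answer
            continue
        # non-greedy, and DOTALL so a parenthesized part may span a line break in one cell
        found.extend(re.findall(r"\((.+?)\)", str(value), re.DOTALL))
    # dict.fromkeys removes duplicates while keeping insertion order
    return list(dict.fromkeys(found))


# Hypothetical column values (not the original spreadsheet)
column = ["a (problem) here", None, "(span \nlines) and (some)", "(problem) again", 42]
print(words_in_parens(column))
```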
To do it with string splitting instead of a regex:
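# same content as above
for cellvalue in content:
    # only string cells that contain both "(" and ")"
    if isinstance(cellvalue, str) and set("()").intersection(cellvalue) == {"(", ")"}:
        # text after the last "(" up to the next ")"
        print(cellvalue.split("(")[-1].split(")")[0])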
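The split one-liner only ever yields the text after the last `(`. Also, if a stray `)` comes before that `(` (for example `"x) (y"`), it still prints `y` even though no closing parenthesis follows. A sketch that uses `str.rfind`/`str.find` instead of `split` (a swapped-in technique, not the answer's code; `last_in_parens` is a hypothetical helper name) and returns `None` in that case and for non-string cells:

```python
def last_in_parens(value):
    """Text inside the last "(...)" pair of a cell value, or None if there is none."""
    if not isinstance(value, str):  # numbers, dates, None, ...
        return None
    start = value.rfind("(")
    if start == -1:
        return None
    end = value.find(")", start + 1)  # first ")" after the last "("
    if end == -1:
        return None
    return value[start + 1:end]


# Hypothetical cell values, standing in for `content` from the answer
for cellvalue in ["a (b) c (d)", "x) (y", 42, None]:
    word = last_in_parens(cellvalue)
    if word is not None:
        print(word)
```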
HTH
Documentation: