Question

I have some Python code that counts the words occurring in text (.txt) files:

import csv
import glob
import re

# Build a word finder, then load each word list as a lowercase set.
find_words = re.compile(r'(?:(?<=[^\w./-])|(?<=^))[A-Za-z]+(?:-[A-Za-z]+)*(?=\W|$)').findall
wanted1 = set(find_words(open('word_list_1.csv').read().lower()))
wanted2 = set(find_words(open('word_list_2.csv').read().lower()))
negators = set(find_words(open('negators.csv').read().lower()))
ignore = set(find_words(open('Ignore words.csv').read().lower()))

Then I do the following to process the text files:

with open(csvfile, "wb") as output:
    writer = csv.writer(output)
    for f in glob.glob("*.txt"):
        print "Processing file number : ", i, " out of :", len(glob.glob("*.txt"))
        i = i + 1
        with open(f) as inputfile:
            wordNumber = 0
            for line in inputfile:
                if find_words(line.lower()) != []:
                    lineWords = find_words(line.lower())

So the question is: how do I do this for an Excel file instead of .txt files? I tried the following:

for i in range(0, rows):
    for j in range(0, cols):
        write_sheet1.write(i, j, sheet.cell_value(i, j))
    if sheet.cell_value(i, 4) != 0:
        for line in sheet.cell_value(i, 4):
            print "Line is : ", line
            if find_words(line.lower()) != []:
                lineWords = find_words(line.lower())

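But it doesn't work: it returns only a single character at a time instead of the whole line and/or its words...

So how can I make this work for Excel cells instead of text files?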

Answer 1

I would use pandas to import the Excel file and then iterate over all the cells in the resulting pandas DataFrame.

import pandas as pd
df = pd.read_excel(...)
df_out = df.applymap(func)

Here func is a function that takes a cell's contents and returns a result. The result for each cell ends up in df_out.
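As a rough sketch of what func could look like for the word-counting task, the example below (Python 3) counts regex matches per cell. The pattern is the one from the question. The in-memory DataFrame is invented data standing in for pd.read_excel(...), and count_words and the column names are hypothetical. It maps over each column with Series.map, which for this purpose should give the same result as applymap (renamed DataFrame.map in pandas 2.1).

```python
import re

import pandas as pd

# Same pattern as in the question.
find_words = re.compile(
    r'(?:(?<=[^\w./-])|(?<=^))[A-Za-z]+(?:-[A-Za-z]+)*(?=\W|$)'
).findall


def count_words(cell):
    """Hypothetical func: number of words found in one cell."""
    # Non-string cells (numbers, empty cells) are treated as having no words.
    if not isinstance(cell, str):
        return 0
    return len(find_words(cell.lower()))


# Stand-in for df = pd.read_excel(...); the data is invented.
df = pd.DataFrame({
    "id": [1, 2],
    "comment": ["Not good at all", "well-made and cheap"],
})

# Apply count_words to every cell, column by column.
df_out = df.apply(lambda col: col.map(count_words))
print(df_out)
```

Returning a count keeps df_out the same shape as df, so each result lines up with the cell it came from. To keep the words themselves, func could return the list from find_words instead.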

Answer 2

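When you read a text file, Python lets you iterate over it as if it were a list of lines. A spreadsheet cell's value, by contrast, is (probably) just a string, so you can find the words in it directly:

for i in range(rows):
    for j in range(cols):
        write_sheet1.write(i, j, sheet.cell_value(i, j))
    if find_words(sheet.cell_value(i, 4)) != []:
        cell_words = find_words(sheet.cell_value(i, 4).lower())

If the cell might contain something other than a string (a number, for example), you will need to convert it to a string first, for example with str(). (I'm not sure which module you are using to read the Excel sheet.)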
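To make the difference concrete, here is a small self-contained sketch (Python 3). Iterating over a string yields one character per step, which is why the loop in the question printed single characters. Passing the whole cell value to find_words returns the words. The cells list is invented data standing in for whatever sheet.cell_value(i, 4) returns, and words_in_cell is a hypothetical helper, not part of any Excel library.

```python
import re

# Same pattern as in the question.
find_words = re.compile(
    r'(?:(?<=[^\w./-])|(?<=^))[A-Za-z]+(?:-[A-Za-z]+)*(?=\W|$)'
).findall

# Invented stand-ins for the values of column 4 in each row.
cells = ["Not a bad result", 42.0, ""]

# Iterating over a string walks it character by character.
chars = [piece for piece in cells[0]]
print(chars[:3])


def words_in_cell(value):
    """Hypothetical helper: words in one cell, whatever its type."""
    # Convert first so numeric cells do not break .lower().
    return find_words(str(value).lower())


for value in cells:
    print(repr(value), "->", words_in_cell(value))
```

Converting with str() before calling .lower() means a numeric cell simply produces an empty word list instead of raising an AttributeError.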

Python - Excel - counting the number of words in a cell using two CSV dictionaries
