Question

I am using the following code to score a piece of text:

import textstat
import pandas as pd

test_data = ("""Jonathan pushed back the big iron pot and stood up.
There were no bears. But up the path came his father, carrying his gun. And with
him were Jonathan's Uncle James and his Uncle Samuel, his Uncle John and his
Uncle Peter. Jonathan had never in all his life been so glad to see the uncles.
"Jonathan!'" said his father, "what a fright you have given us! Where have you
been all this time?" """)

textstat.flesch_reading_ease(test_data)

This returns a score of 100.48 (very easy to read).

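I have a CSV with "Title" and "Text" columns. I want to iterate over every row and call textstat.flesch_reading_ease on each cell in the "Text" column.

However, I can't seem to get this to work.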

import textstat 
import pandas as pd
csv = pd.read_csv('my_list_of_texts.csv')


for i, j in csv.iterrows():
     a = textstat.flesch_reading_ease(j)
     print(a)

This gives me the error TypeError: 'Series' objects are mutable, thus they cannot be hashed

Answer 1

apply is the way to do this with a pandas DataFrame or Series.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

You can find some examples here:

https://chrisalbon.com/python/data_wrangling/pandas_apply_operations_to_dataframes/
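
As a rough sketch of what this looks like, and of why the loop in the question fails: iterrows() yields each row as a whole Series, so the function receives a Series instead of a string. The sample DataFrame and the count_words stand-in scorer below are made up for illustration; with textstat installed, textstat.flesch_reading_ease could presumably be passed in its place.

```python
import pandas as pd

# Hypothetical sample data standing in for my_list_of_texts.csv
df = pd.DataFrame({
    "Title": ["First", "Second"],
    "Text": ["There were no bears.", "Jonathan stood up. He was glad."],
})

def count_words(text):
    # Stand-in scorer (an assumption for this sketch). Like
    # textstat.flesch_reading_ease, it expects a single string.
    return len(text.split())

# iterrows() yields (index, row) pairs where row is a Series holding every
# column, so pick out the "Text" cell before calling the function.
loop_scores = [count_words(row["Text"]) for _, row in df.iterrows()]

# DataFrame.apply with axis=1 also passes one row (a Series) at a time.
row_scores = df.apply(lambda row: count_words(row["Text"]), axis=1)

print(loop_scores)
print(row_scores.tolist())
```

Both variants produce the same values; the apply form returns a Series aligned with the DataFrame's index, which makes it easier to attach the result as a column.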

Answer 2

Use Series.apply:

csv['Text'].apply(textstat.flesch_reading_ease)
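
To keep the scores next to the texts, the result of Series.apply can be assigned to a new column. A small sketch, assuming a column named Text as in the question; the sample rows, the Score column name, and the average_word_length stand-in scorer are hypothetical and only there so the example runs without textstat.

```python
import pandas as pd

# Hypothetical sample data with the same column names as the question's CSV
df = pd.DataFrame({
    "Title": ["A", "B"],
    "Text": ["no bears", "big iron pot"],
})

def average_word_length(text):
    # Stand-in scorer (an assumption); swap in textstat.flesch_reading_ease
    # to get real readability scores.
    words = text.split()
    return sum(len(word) for word in words) / len(words)

# Series.apply calls the function once per cell and returns a new Series,
# which is stored here as an extra column.
df["Score"] = df["Text"].apply(average_word_length)

print(df[["Title", "Score"]])
```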

Answer 3

I solved this by wrapping textstat.flesch_reading_ease in its own function and dropping the NaN values from the Series.

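def readability_score(text):
    # Wrap the textstat call so it can be passed to Series.apply
    s = textstat.flesch_reading_ease(text)
    return s

# Drop rows with missing values (NaN) before scoring
csv = csv.dropna()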
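
One thing to watch with this approach: dropna() with no arguments removes every row that has a missing value in any column, not only in Text. A hedged sketch of a narrower variant follows; the sample data is invented, and the readability_score body here is a word-count stand-in so that the example runs without textstat.

```python
import pandas as pd

# Hypothetical data: one row lacks a title, another lacks its text
df = pd.DataFrame({
    "Title": ["A", None, "C"],
    "Text": ["There were no bears.", "He stood up.", None],
})

def readability_score(text):
    # Stand-in for textstat.flesch_reading_ease (an assumption for this sketch)
    return len(text.split())

# Plain dropna() would also discard the row whose Title is missing.
rows_after_plain_dropna = len(df.dropna())

# subset=["Text"] only discards rows where the text itself is missing.
cleaned = df.dropna(subset=["Text"])
scores = cleaned["Text"].apply(readability_score)

print(rows_after_plain_dropna)
print(scores.tolist())
```

Whether the narrower subset is preferable depends on whether rows without a title are still worth scoring.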

How to iterate over a Pandas DataFrame and call a function on each cell
