How do I apply df.loc[] to multiple rows and apply a transformation?

Asked: 2019-12-30 13:33:06

Tags: python pandas

I am trying to apply a transformation to every row of df["Text_str"] so that I can run my padding function on it.

So far I can test it manually with df.loc[i, "Text_str"], but I need to go through many rows of text and append the results to df.

How can I turn df.loc[i, "Text_str"] into a function or, better, apply the padding function to the whole column?

import re
import nltk

nltk.download('stopwords')  # one-time download, kept outside the function
STOPWORDS = set(nltk.corpus.stopwords.words('english'))

# fraction of words that are considered stopwords
def padding(text):
    words = re.findall('[A-Za-z]+', text)  # [A-z] would also match [ \ ] ^ _ `
    content = [w for w in words if w.lower() in STOPWORDS]  # membership test with "in"
    return round(len(content) / len(words), 2)

test_data = df.loc[1, "Text_str"]

print(padding(test_data))

Error:

TypeError: expected string or bytes-like object

Example of one row of df["Text_str"]:

0    Parker-Hannifin Corp. (NYSE: Q2 2016 Earnings Call January 26, 2016 11:00 am ET Executives Jon P. Marten - Executive Vice President-Finance & Adminstration and Chief Financial Officer Thomas L. Williams - Chairman & Chief Executive Officer Lee C. Banks - President and Chief Operating Officer Analysts James A. Picariello - KeyBanc Capital Markets, Inc. Nicole Deblase - Morgan Stanley & Co. LLC Eli Lustgarten - Longbow Research LLC Andrew M. Casey - Wells Fargo Securities LLC Ann P. Duignan - JPMorgan Securities LLC Jamie L. Cook - Credit Suisse Securities (NYSE: Joseph Alfred Ritchie - Goldman Sachs & Co. Nathan Jones - Stifel, Nicolaus & Co., Inc. Andrew Burris Obin - Bank of America Merrill Lynch Joel Gifford Tiss - BMO Capital Markets (United States) Operator Good day, ladies and gentlemen, and welcome to the Parker-Hannifin Corp. Fiscal 2016 Second Quarter Earnings Conference Call. At this time, all participants are in a listen-only mode. Later, we will conduct a question-and-an...

Type:

<class 'pandas.core.series.Series'>

1 Answer:

Answer 0 (score: 0)

I figured it out:

df['Padding'] = [padding(x) for x in list(df.loc[:,'Text_str'])]
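The same per-row mapping can also be written with `Series.apply`, which avoids building an intermediate list. The following is only a minimal sketch, not the original poster's code: `STOPWORDS` is a small hypothetical stand-in for the NLTK list so that the example runs without a download, the sample rows are made up, and the `isinstance` guard assumes the `TypeError` comes from cells that do not hold a plain string.

```python
import re

import pandas as pd

# Hypothetical stand-in for nltk.corpus.stopwords.words('english'),
# kept tiny so the example runs without downloading anything.
STOPWORDS = {"the", "and", "to", "of", "a", "in", "is", "are", "at", "this", "all"}


def padding(text):
    """Return the fraction of words in `text` that are stopwords."""
    if not isinstance(text, str):
        # Assumption: non-string cells (None, NaN, ...) are what trigger
        # "expected string or bytes-like object" inside re.findall.
        return float("nan")
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0  # avoid dividing by zero on text without letters
    hits = [w for w in words if w.lower() in STOPWORDS]
    return round(len(hits) / len(words), 2)


# Made-up sample data; the last row is deliberately not a string.
df = pd.DataFrame(
    {"Text_str": ["Welcome to the call", "All participants are in a listen-only mode", None]}
)

# apply() calls padding once per cell and returns a Series aligned with df's index.
df["Padding"] = df["Text_str"].apply(padding)
print(df)
```

If `df.loc[1, "Text_str"]` returns a Series rather than a string, as the type shown in the question suggests, the index probably contains duplicate labels. `df["Text_str"].apply(padding)` is not affected by that, since it works cell by cell.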