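I have imported a CSV file into Python using pandas. The file has 3 columns and 498 rows. I only need word counts for one column, named "Description". I have cleaned that column by converting it to lowercase, removing English stop words, and splitting it into words.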
IN :
import pandas as pd
df = pd.read_csv("capex_motscles.csv")
from nltk.corpus import stopwords
stop = stopwords.words('english')
Description3 = df['Description'].str.lower().apply(lambda x:
    ' '.join([word for word in str(x).split() if word not in stop]))
print(Description3)
OUT :
0 crazy mind california medical service data base...
1 california licensed producer recreational & medic...
2 silicon valley data clients live beyond status...
3 mycrazynotes inc. announces $144.6 million expans...
4 leading provider sustainable energy company prod ...
5 livefreecompany founded 2005, listed new york stock...
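I have shown only the first few rows of the output of print(Description3). There are 498 rows in total and, as mentioned above, I need to count word frequencies. Any help would be greatly appreciated. Thank you for your time!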
Answer 0 (score: 1)
Do you mean something like this?
df['Description3'] = df['Description'].str.lower().apply(lambda x:
    ' '.join([word for word in str(x).split() if word not in stop]))
df['Description3'].str.split(expand=True).stack().value_counts()
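If expanding every word into its own column feels heavy for longer texts, the same counts can probably be obtained with collections.Counter from the standard library. The snippet below is only a rough sketch: the STOP set is a small hypothetical stand-in for stopwords.words('english'), the helper name word_frequencies is made up, and the sample rows are invented for illustration. In the question's setup you would pass df['Description'] and the NLTK stop list instead.

```python
from collections import Counter

# Hypothetical stand-in for stopwords.words('english'),
# so the sketch runs without downloading NLTK data.
STOP = {"the", "a", "of", "and", "in"}

def word_frequencies(texts, stop=STOP):
    """Count how often each non-stop word occurs across all texts."""
    counts = Counter()
    for text in texts:
        # str() mirrors the question's lambda, so non-string cells do not raise
        words = str(text).lower().split()
        counts.update(w for w in words if w not in stop)
    return counts

# Invented sample rows; in the question this would be df['Description']
sample = [
    "The data of California",
    "California medical data and service",
]
freq = word_frequencies(sample)
print(freq.most_common(3))
```

A Counter also converts easily back to pandas, for example with pd.Series(freq).sort_values(ascending=False), if the result needs to stay in a Series.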