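I have a pandas DataFrame like the one shown below. I want to convert all of the text to lowercase. How can I do this in Python?
Sample of the DataFrame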
[Nah I don't think he goes to usf, he lives around here though]
[Even my brother is not like to speak with me., They treat me like aids patent.]
[I HAVE A DATE ON SUNDAY WITH WILL!, !]
[As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been set as your callertune for all Callers., Press *9 to copy your friends Callertune]
[WINNER!!, As a valued network customer you have been selected to receivea £900 prize reward!, To claim call 09061701461., Claim code KL341., Valid 12 hours only.]
What I tried
def toLowercase(fullCorpus):
    lowerCased = [sentences.lower() for sentences in fullCorpus['sentTokenized']]
    return lowerCased
I get this error
lowerCased = [sentences.lower()for sentences in fullCorpus['sentTokenized']]
AttributeError: 'list' object has no attribute 'lower'
Answer 0 (score: 1)
It's simple:
df.applymap(str.lower)
or
df['col'].apply(str.lower)
df['col'].map(str.lower)
OK, so your rows contain lists. In that case:
df['col'].map(lambda x: list(map(str.lower, x)))
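The error in the question suggests that each cell of the column holds a list of strings, so `.lower()` has to be applied to the items inside each list rather than to the list itself. A minimal sketch of that idea follows, assuming a small made-up frame that mirrors the question's layout (the two rows are illustrative, not the asker's real data):

```python
import pandas as pd

# Hypothetical two-row frame: every cell of 'sentTokenized'
# is a list of sentence strings, as in the question.
fullCorpus = pd.DataFrame({
    "sentTokenized": [
        ["I HAVE A DATE ON SUNDAY WITH WILL!", "!"],
        ["WINNER!!", "Claim code KL341."],
    ]
})

# Lowercase each sentence inside each list, keeping the list structure.
lowered = fullCorpus["sentTokenized"].map(
    lambda sents: [s.lower() for s in sents]
)
```

The result is a new Series; assign it back to `fullCorpus["sentTokenized"]` if the column should be overwritten.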
Answer 1 (score: 1)
You can try using apply and map:
def toLowercase(fullCorpus):
    lowerCased = fullCorpus['sentTokenized'].apply(lambda row: list(map(str.lower, row)))
    return lowerCased
Answer 2 (score: 1)
You can also convert the column to string, apply str.lower, and then convert the result back to lists.
import ast
df.sentTokenized.astype(str).str.lower().transform(ast.literal_eval)
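To make the round trip easier to follow, here is a small sketch that splits it into steps. The frame and its two rows are made up for illustration. The approach relies on the string form of each list being a valid Python literal, so it is likely to break on cells that are not lists (a missing value becomes the text `nan`, which `ast.literal_eval` rejects).

```python
import ast
import pandas as pd

# Hypothetical sample data with list-valued cells.
df = pd.DataFrame({
    "sentTokenized": [
        ["WINNER!!", "Claim code KL341."],
        ["Nah I don't think he goes to usf"],
    ]
})

# Step 1: each list becomes its text form, e.g. "['WINNER!!', 'Claim code KL341.']".
as_text = df["sentTokenized"].astype(str)

# Step 2: lowercase the whole text, then parse it back into a Python list.
round_tripped = as_text.str.lower().apply(ast.literal_eval)
```

Parsing every row with `ast.literal_eval` is extra work compared with lowercasing the list items directly, so this is mostly useful when the column is already stored as text.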
Answer 3 (score: 0)
There is also a nice way to do this with numpy:
import numpy as np
fullCorpus['sentTokenized'] = [np.char.lower(x) for x in fullCorpus['sentTokenized']]
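One thing to be aware of with this approach: `np.char.lower` returns a NumPy array rather than a Python list, so the cells change type. A short sketch with a made-up list of sentences shows the difference and how to get a list back with `tolist()`:

```python
import numpy as np

# Hypothetical cell contents: one list of sentences.
sentences = ["WINNER!!", "Claim code KL341."]

# np.char.lower works element-wise and returns a numpy.ndarray of strings.
lowered = np.char.lower(sentences)

# Convert back to a plain Python list if later code expects lists.
as_list = lowered.tolist()
```

If the rest of the pipeline expects lists, `[np.char.lower(x).tolist() for x in fullCorpus['sentTokenized']]` would keep the original cell type.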