Question

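I'm doing some NLP on text data from doctors and am just trying to do some basic text cleaning as preprocessing: removing stop words and punctuation. I have already given the program a list of punctuation marks and a list of stop words.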

My text data looks like this:

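"Cyclin-dependent kinases (CDKs) regulate a variety of fundamental cellular processes. CDK10 is one of the last orphan CDKs, for which no activating cyclin has been identified and no kinase activity revealed. Previous work has shown that CDK10 silencing increases ETS2 (v-ets erythroblastosis virus E26 oncogene homolog 2)-driven activation of the MAPK pathway, which confers tamoxifen resistance to breast cancer cells"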

My code is as follows:

import string

# Create a function to remove punctuations
def remove_punctuation(sentence: str) -> str:
    return sentence.translate(str.maketrans('', '', string.punctuation))

# Create a function to remove stop words
def remove_stop_words(x):
    x = ' '.join([i for i in x.split(' ') if i not in stop])
    return x

# Create a function to lowercase the words
def to_lower(x):
    return x.lower()
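For reference, here is a minimal sketch of the three helpers running on a single plain string. The `stop` set and the `sample` sentence are made up for illustration (my real stop-word list is longer), and lowercasing is done before stop-word removal here so that a capitalised "The" still matches the list.

```python
import string

# Hypothetical stop-word list, for illustration only.
stop = {"the", "a", "of", "and"}

def remove_punctuation(sentence: str) -> str:
    # string.punctuation covers ASCII punctuation only.
    return sentence.translate(str.maketrans('', '', string.punctuation))

def remove_stop_words(x: str) -> str:
    return ' '.join(w for w in x.split(' ') if w not in stop)

def to_lower(x: str) -> str:
    return x.lower()

# Made-up sample sentence.
sample = "The activation of the MAPK pathway, and CDK10 (a kinase)."
cleaned = remove_stop_words(to_lower(remove_punctuation(sample)))
print(cleaned)
```

On an ordinary `str` input like this the functions run without errors, so the problem seems to depend on what the column actually contains.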

I then tried to apply these functions to the 'Text' column:

train['Text'] = train['Text'].apply(remove_punctuation)
train['Text'] = train['Text'].apply(remove_stop_words)
train['Text'] = train['Text'].apply(lower)

Then I get this error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
----> 1 train['Text'] = train['Text'].apply(remove_punctuation)
      2 train['Text'] = train['Text'].apply(remove_stop_words)
      3 train['Text'] = train['Text'].apply(lower)

/opt/conda/lib/python3.6/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

in remove_punctuation(sentence)
      3 # Create a function to remove punctuations
      4 def remove_punctuation(sentence: str) -> str:
----> 5     return sentence.translate(str.maketrans('', '', string.punctuation))
      6
      7 # Create a function to remove stop words

AttributeError: 'float' object has no attribute 'translate'

Why am I getting this error? My guess is that it's because numbers appear in the text?
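One likely explanation, assuming the 'Text' column has missing entries: pandas represents a missing value as NaN, which is a Python `float`, so `apply` hands a float to `remove_punctuation`. Digits inside a string would not cause this, because the value is still a `str`. The sketch below reproduces the message without pandas and adds a type guard. The `values` list is a hypothetical stand-in for the column, and returning an empty string for non-strings is just one possible policy.

```python
import string

def remove_punctuation(sentence) -> str:
    # Guard: NaN (a float) or any other non-string becomes an empty string.
    if not isinstance(sentence, str):
        return ''
    return sentence.translate(str.maketrans('', '', string.punctuation))

# Hypothetical stand-in for the column; float('nan') mimics a missing cell.
values = ["CDK10 silencing, in breast cancer.", float('nan')]

# Calling .translate directly on the missing value reproduces the error.
try:
    values[1].translate(str.maketrans('', '', string.punctuation))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# With the guard in place, every value can be cleaned.
cleaned = [remove_punctuation(v) for v in values]
print(cleaned)
```

If that is the cause, `train['Text'].isna().sum()` should report a non-zero count, and either `train['Text'].fillna('')` or `train.dropna(subset=['Text'])` before the `apply` calls would be an alternative to the guard.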


0 answers: