Question

For the following dataframe:

index      sentences                                            category
1          the side effects are terrible !                         SSRI
2          They are killing me,,, I want to stop                   SNRI
3          I need to contact my physicians ?                        SSRI
4          How to stop it.. I am surprised because of its effect.   SSRI
5                                                                   SSRI
6                    NAN                                            SNRI

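I am trying to tokenize the text in the sentences column. The sentences column contains some null values. Here is my code, but it doesn't work: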

df["sentences"] = df.sentences.replace (r'[^a-zA-Z]', '', regex= True, inplace = True)

df["tokenized_sents"] = df["sentences"].apply(nltk.word_tokenize)

I also tried this:

df["sentences"] = df.sentences.replace (r'[^a-zA-Z]', 'null', regex= True, inplace = True)

It produces the following error:

expected string or bytes-like object

Any suggestions?

Answer 1

   index                                          sentences category
0      1                    the side effects are terrible !     SSRI
1      2              They are killing me,,, I want to stop     SNRI
2      3                  I need to contact my physicians ?     SSRI
3      4  How to stop it.. I am surprised because of its...     SSRI
4      5                                                NaN     SNRI
5      5                                               None     None
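Rows 4 and 5 hold NaN and None rather than strings. The message "expected string or bytes-like object" is what Python's re module raises when a pattern is applied to a non-string; NLTK's tokenizers are built on regular expressions, which is presumably where the error in the question comes from. A minimal reproduction, using a hypothetical regex tokenizer simple_tokenize as a stand-in for nltk.word_tokenize (its pattern is an assumption and does not follow NLTK's rules):

```python
import re

# Hypothetical stand-in for nltk.word_tokenize: runs of word characters or
# single punctuation marks. This pattern is an assumption, not NLTK's.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def try_tokenize(cell):
    """Tokenize a cell, returning the error message if it is not a string."""
    try:
        return simple_tokenize(cell)
    except TypeError as err:
        # re raises TypeError for NaN (a float) or None
        return f"TypeError: {err}"


for cell in ["the side effects are terrible !", float("nan"), None]:
    print(repr(cell), "->", try_tokenize(cell))
```

The string row tokenizes normally, while the NaN and None rows both report the same TypeError seen in the question (Python 3.12 and later append the offending type to the message).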

Output of the first print:

   index                                          sentences category  \
0      1                    the side effects are terrible !     SSRI   
1      2              They are killing me,,, I want to stop     SNRI   
2      3                  I need to contact my physicians ?     SSRI   
3      4  How to stop it.. I am surprised because of its...     SSRI   
4      5                                                NaN     SNRI   
5      5                                               None     None   

                                     tokenized_sents  
0             [the, side, effects, are, terrible, !]  
1  [They, are, killing, me, ,, ,, ,, I, want, to,...  
2          [I, need, to, contact, my, physicians, ?]  
3  [How, to, stop, it.., I, am, surprised, becaus...  
4                                                 []  
5                                                 []
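The code that produced this printout is not shown. The empty lists in rows 4 and 5 are consistent with filling missing cells with empty strings before tokenizing; a minimal sketch under that assumption follows. The frame mirrors the one above, and simple_tokenize is a hypothetical regex stand-in so the sketch runs without NLTK; in real use you would pass nltk.word_tokenize instead (which needs NLTK's Punkt tokenizer data downloaded), and its tokens will differ in places (for example "it.." stays one token in the printout above).

```python
import re

import pandas as pd

# Hypothetical stand-in for nltk.word_tokenize (pattern is an assumption).
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text):
    return TOKEN_PATTERN.findall(text)


df = pd.DataFrame({
    "sentences": [
        "the side effects are terrible !",
        "They are killing me,,, I want to stop",
        "I need to contact my physicians ?",
        "How to stop it.. I am surprised because of its effect.",
        float("nan"),
        None,
    ],
    "category": ["SSRI", "SNRI", "SSRI", "SSRI", "SNRI", None],
})

# fillna("") turns NaN/None into empty strings, which tokenize to [].
# It returns a new Series, so the original 'sentences' column is unchanged.
df["tokenized_sents"] = df["sentences"].fillna("").apply(simple_tokenize)
print(df)
```

Because fillna is applied to a temporary Series rather than with inplace=True, the sentences column still shows its missing values, matching the printout above.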

Second, regarding inplace=True:

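With inplace=True, replace modifies the object in place and returns None. So this line from the question:

df["sentences"] = df.sentences.replace(r'[^a-zA-Z]', '', regex=True, inplace=True)

overwrites the whole sentences column with None, and nltk.word_tokenize then receives None instead of a string, which raises "expected string or bytes-like object". If you use inplace=True, call replace on its own and don't assign the result back to the original df:

df.sentences.replace(r'[^a-zA-Z]', '', regex=True, inplace=True)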
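A short check of the point above, using a standalone Series (the sample strings are illustrative). In recent pandas versions, calling an inplace method through df.sentences counts as chained assignment and is deprecated, so assigning the returned Series without inplace is likely the safer pattern. As an aside, the question's pattern [^a-zA-Z] also deletes spaces, gluing words together; adding \s to the character class, as assumed below, keeps them.

```python
import pandas as pd

sentences = pd.Series(["the side effects are terrible !", "How to stop it..", None])

# Bug pattern from the question: with inplace=True, replace() returns None,
# so assigning its result to a column would fill that column with None.
tmp = sentences.copy()
returned = tmp.replace(r"[^a-zA-Z]", "", regex=True, inplace=True)
print(returned)  # None
print(tmp[0])    # spaces were removed too: 'thesideeffectsareterrible'

# Safer pattern: no inplace, assign the returned Series.
# [^a-zA-Z\s] keeps whitespace; missing values stay missing.
cleaned = sentences.str.replace(r"[^a-zA-Z\s]", "", regex=True)
print(cleaned.tolist())
```

In a DataFrame this becomes df["sentences"] = df["sentences"].str.replace(r"[^a-zA-Z\s]", "", regex=True), followed by fillna("") before tokenizing.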


Using nltk.word_tokenize on a pandas dataframe produces the error "expected string or bytes-like object"
