Question

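The problem:

I am following a tutorial and trying to run re.search over a csv file of tweets (date, username, the tweet itself, the tweet ID, and whether it is true or false).

This is my original code: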

import pandas as pd
import re

filename = 'sample.csv'
data = pd.read_csv(filename, encoding='utf-8')

print(data.info())

def word_in_text(word,text):
     match = re.search(word,text)

     if match:
         return True
     return False

[kai, hatsu] = [0, 0]

for index, row in data.iterrows():
    kai += word_in_text('会', row['text'])
    hatsu += word_in_text('初', row['text'])

This is the error it raises:

Traceback (most recent call last):
  File "C:\Python\enkousaiTF.py", line 28, in <module>
    kai += word_in_text('会', row['text'])
  File "C:\Python\enkousaiTF.py", line 19, in word_in_text
    match = re.search(word,text)
  File "C:\Python\Python36-32\lib\re.py", line 182, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

What I have tried:

When I checked the column types of the DataFrame, I got:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1001 entries, 0 to 1000
Data columns (total 5 columns):
date        1000 non-null object
username    1000 non-null object
text        1000 non-null object
id          1000 non-null float64
enko        1000 non-null object
dtypes: float64(1), object(4)
memory usage: 23.5+ KB

So I thought the problem might be the float64 type, and I tried adding str here:

match = re.search(str(word,text))

But that only raises a different error:

TypeError: decoding str is not supported

Then I tried changing the data types with:

dtype_dic= {'date': str, 
            'username' : str,
            'text': str,
            'id': str,
            'enko': str}

However, even after I checked the data types, it still throws TypeError: expected string or bytes-like object

How can I fix this?

Answer 1

Most likely your text file is not Unicode-encoded. Check this link and verify the file's format.
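
Another possible cause is visible in the info() output from the question: RangeIndex reports 1001 entries, but every column has only 1000 non-null values, so one row appears to be empty. pandas reads a missing field as NaN, which is a float, and re.search raises "expected string or bytes-like object" when given a float. Passing dtype=str to read_csv does not change this, because missing fields still come back as NaN. The sketch below is only an illustration under that assumption: the inline CSV is a made-up stand-in for sample.csv, and the isinstance guard and the str.contains variant are two suggested ways to skip the missing row, not something taken from the tutorial.

```python
import io
import re

import pandas as pd

# Hypothetical stand-in for sample.csv: the second data row has no values at all.
csv_text = (
    "date,username,text,id,enko\n"
    "2018-01-01,a,会いたい,1,T\n"
    ",,,,\n"
    "2018-01-02,b,初めまして,2,F\n"
)
data = pd.read_csv(io.StringIO(csv_text))

# Rows whose text is missing; these hold NaN (a float), not a str.
missing = data[data['text'].isna()]

def word_in_text(word, text):
    # Treat non-string values such as NaN as "no match" instead of
    # passing them to re.search, which would raise TypeError.
    if not isinstance(text, str):
        return False
    return re.search(word, text) is not None

# Same counting as in the question, now safe for missing values.
kai = sum(word_in_text('会', t) for t in data['text'])
hatsu = sum(word_in_text('初', t) for t in data['text'])

# Vectorized alternative: na=False counts missing values as non-matches.
kai_vec = int(data['text'].str.contains('会', na=False).sum())
hatsu_vec = int(data['text'].str.contains('初', na=False).sum())
```

If the empty row should not be in the data at all, calling data.dropna(subset=['text']) right after reading the file would be another option.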
