I have to read this JSON file as UTF-8 and use a generator to get its content. When I try to run the script, I get an AttributeError:
Traceback (most recent call last):
File "F:\Files\python\yiyouhome\WordSeg\json_load.py", line 25, in <module>
tags = jieba.analyse.extract_tags(content_seg,topK = top_K, withWeight = False, allowPOS = allow_pos)
File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\jieba\analyse\tfidf.py", line 94, in extract_tags
for w in words:
File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\jieba\posseg\__init__.py", line 249, in cut
for w in self.__cut_internal(sentence, HMM=HMM):
File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\jieba\posseg\__init__.py", line 217, in __cut_internal
sentence = strdecode(sentence)
File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\jieba\_compat.py", line 37, in strdecode
sentence = sentence.decode('utf-8')
AttributeError: 'generator' object has no attribute 'decode'
Why does this happen?
At first, I got a different error:
Traceback (most recent call last):
File "F:\Files\python\yiyouhome\WordSeg\json_load.py", line 10, in <module>
json_data = open('spider_raw.json',encoding = 'gbk').read() #,encoding = 'utf-8'
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa3 in position 74: illegal multibyte sequence
So I added encoding = 'utf-8' to fix it.
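For reference, the encoding change can be reproduced in isolation. This is only a minimal sketch, assuming the file really was saved as UTF-8: the sample text, the temporary file, and the helper `try_read` are hypothetical and are not part of the script in this question.

```python
import os
import tempfile

# Hypothetical sample text; the real spider_raw.json is not shown here.
original = '{"name": "中"}'

# Write the sample to a temporary file as UTF-8 bytes.
fd, path = tempfile.mkstemp(suffix='.json')
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    f.write(original)


def try_read(path, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        with open(path, encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError:
        return None


as_utf8 = try_read(path, 'utf-8')  # decodes cleanly
as_gbk = try_read(path, 'gbk')     # these UTF-8 bytes are not a valid gbk sequence
os.remove(path)
```

Passing the encoding explicitly to `open` avoids depending on the platform default, which is often gbk on a Chinese-language Windows installation.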
Here is my code:
import json
import jieba.analyse
import jieba.posseg as pseg
json_data = open('spider_raw.json',encoding = 'utf-8').read()
data = json.loads(json_data)
top_K = 20
allow_pos = ('nr',)
def getcontent(spiderlist):
    for k,v in spiderlist.items():
        for item in v['talk_mutidetails']:
            yield(item['cotent'])
#def getcontenttopic(spiderlist):
item = getcontent(data)
content_seg = pseg.cut(item)
tags = jieba.analyse.extract_tags(content_seg,topK = top_K, withWeight = False, allowPOS = allow_pos)
for t in tags:
    print(t)
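One possible reading of the traceback: `pseg.cut` is lazy, so `content_seg` is itself a generator, and `extract_tags` then tries to segment it as if it were a string, which is where `.decode` fails. A minimal sketch of one way around this, assuming the goal is keywords over all the text combined, is to join the generator's output into a single `str` and pass that straight to `extract_tags`. The helpers `collect_text` and `top_tags` and the sample data are hypothetical; only `jieba.analyse.extract_tags` and its `topK`, `withWeight` and `allowPOS` parameters come from the script above.

```python
def getcontent(spiderlist):
    """Yield each 'cotent' string, mirroring the generator in the question."""
    for k, v in spiderlist.items():
        for item in v['talk_mutidetails']:
            yield item['cotent']


def collect_text(spiderlist):
    """Join every yielded string into one str (hypothetical helper)."""
    return '\n'.join(getcontent(spiderlist))


def top_tags(text, top_k=20, allow_pos=('nr',)):
    """Run jieba's TF-IDF keyword extraction on a plain str (hypothetical helper)."""
    # Imported here so the helpers above can be used without jieba installed.
    import jieba.analyse
    return jieba.analyse.extract_tags(
        text, topK=top_k, withWeight=False, allowPOS=allow_pos)


# Hypothetical sample shaped like the question's data; the real file may differ.
sample = {
    'page1': {
        'talk_mutidetails': [
            {'cotent': '张三去了北京'},
            {'cotent': '李四在上海'},
        ],
    },
}

# A str, not a generator, so it is safe to hand to extract_tags.
text = collect_text(sample)
```

With jieba installed, `for t in top_tags(text): print(t)` would replace the `pseg.cut` and `extract_tags` lines. The separate `pseg.cut` call should not be needed, since `extract_tags` does its own part-of-speech segmentation when `allowPOS` is given.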