Question

I have to read this JSON file as UTF-8 and use a generator to get the content. When I try to run it, I get an AttributeError:

Traceback (most recent call last):
  File "F:\Files\python\yiyouhome\WordSeg\json_load.py", line 25, in <module>
    tags = jieba.analyse.extract_tags(content_seg,topK = top_K, withWeight = False, allowPOS = allow_pos)
  File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\jieba\analyse\tfidf.py", line 94, in extract_tags
    for w in words:
  File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\jieba\posseg\__init__.py", line 249, in cut
    for w in self.__cut_internal(sentence, HMM=HMM):
  File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\jieba\posseg\__init__.py", line 217, in __cut_internal
    sentence = strdecode(sentence)
  File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\jieba\_compat.py", line 37, in strdecode
    sentence = sentence.decode('utf-8')
AttributeError: 'generator' object has no attribute 'decode'

Why does this happen?

Originally, opening the file with encoding = 'gbk' failed with this error:

Traceback (most recent call last):
  File "F:\Files\python\yiyouhome\WordSeg\json_load.py", line 10, in <module>
    json_data = open('spider_raw.json',encoding = 'gbk').read() #,encoding = 'utf-8'
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa3 in position 74: illegal multibyte sequence

So I added encoding = 'utf-8' to fix it.

Here is my code:
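
For reference, a minimal sketch of reading a UTF-8 JSON file with an explicit encoding. The helper name load_json_utf8 and the demo file are hypothetical and only there for illustration; the real file in this question is spider_raw.json, and its contents are not shown here.

```python
import json
import os
import tempfile


def load_json_utf8(path):
    """Parse a JSON file, decoding its bytes as UTF-8 (hypothetical helper)."""
    # The with-block closes the file even if parsing fails.
    with open(path, encoding='utf-8') as fh:
        return json.load(fh)


# Demo data (an assumption, not the asker's file): write a small UTF-8 JSON
# file to a temporary directory, then read it back with the helper.
sample = {"greeting": "你好，世界"}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.json')
    with open(demo_path, 'w', encoding='utf-8') as fh:
        json.dump(sample, fh, ensure_ascii=False)
    loaded = load_json_utf8(demo_path)

print(loaded == sample)
```

Passing the encoding explicitly avoids depending on the platform default, which is probably why the 'gbk' attempt could not decode the file.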

import json

import jieba.analyse
import jieba.posseg as pseg

# Read the whole file as UTF-8 text, then parse it as JSON
json_data = open('spider_raw.json', encoding='utf-8').read()
data = json.loads(json_data)

top_K = 20
allow_pos = ('nr',)


def getcontent(spiderlist):
    # Yield every 'cotent' value nested under 'talk_mutidetails'
    for k, v in spiderlist.items():
        for item in v['talk_mutidetails']:
            yield item['cotent']

# def getcontenttopic(spiderlist):

item = getcontent(data)        # a generator object, not a string
content_seg = pseg.cut(item)   # also a generator; nothing has run yet
tags = jieba.analyse.extract_tags(content_seg, topK=top_K, withWeight=False, allowPOS=allow_pos)

for t in tags:
    print(t)
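
One likely reading of the traceback: jieba.analyse.extract_tags expects a single string as its first argument. When allowPOS is set it runs the part-of-speech tokenizer itself, and strdecode() calls .decode('utf-8') on anything that is not already text, which fails for a generator. A minimal sketch of one way around this, assuming the JSON has the shape implied by the code above: join the yielded strings into one string and pass that to extract_tags, skipping the separate pseg.cut call. The names get_content, join_content, extract_person_names, and the sample data are hypothetical.

```python
def get_content(spider_dict):
    """Yield each 'cotent' string from the nested records."""
    for record in spider_dict.values():
        for item in record['talk_mutidetails']:
            yield item['cotent']


def join_content(spider_dict, sep='\n'):
    """Consume the generator and return one plain str."""
    return sep.join(get_content(spider_dict))


def extract_person_names(text, top_k=20):
    """Return the top keywords tagged 'nr' for one string of text."""
    # Imported lazily so the rest of the sketch runs without jieba installed.
    import jieba.analyse
    return jieba.analyse.extract_tags(
        text, topK=top_k, withWeight=False, allowPOS=('nr',))


# Made-up sample standing in for the parsed spider_raw.json (an assumption).
sample = {
    'page1': {
        'talk_mutidetails': [
            {'cotent': '张三今天去了北京。'},
            {'cotent': '李四在上海工作。'},
        ]
    }
}

text = join_content(sample)
print(type(get_content(sample)).__name__)  # generator
print(type(text).__name__)                 # str
```

With jieba installed, extract_person_names(text) can then be called on the joined string. Calling it once per yielded string inside a loop would be an alternative if per-item keywords are wanted.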

jieba.analyse: 'generator' object has no attribute 'decode'

0 answers: