Multithreading in NLTK WordNetLemmatizer?

Asked: 2018-05-30 18:17:18

Tags: python multithreading python-3.x nltk wordnet

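I am trying to use multithreading to speed up processing. I use WordNetLemmatizer to lemmatize words, and SentiWordNet then uses those lemmas to compute the sentiment of the text. My sentiment analysis function, which uses WordNetLemmatizer, looks like this: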

import nltk
from nltk.corpus import sentiwordnet as swn

def SentimentA(doc, file_path):
    sentences = nltk.sent_tokenize(doc)
    # print(sentences)
    stokens = [nltk.word_tokenize(sent) for sent in sentences]
    taggedlist = []
    for stoken in stokens:
        taggedlist.append(nltk.pos_tag(stoken))
    wnl = nltk.WordNetLemmatizer()
    score_list = []
    for idx, taggedsent in enumerate(taggedlist):
        score_list.append([])
        for idx2, t in enumerate(taggedsent):
            newtag = ''
            lemmatized = wnl.lemmatize(t[0])
            if t[1].startswith('NN'):
                newtag = 'n'
            elif t[1].startswith('JJ'):
                newtag = 'a'
            elif t[1].startswith('V'):
                newtag = 'v'
            elif t[1].startswith('R'):
                newtag = 'r'
            else:
                newtag = ''
            if (newtag != ''):
                synsets = list(swn.senti_synsets(lemmatized, newtag))

                score = 0
                if (len(synsets) > 0):
                    for syn in synsets:
                        score += syn.pos_score() - syn.neg_score()
                    score_list[idx].append(score / len(synsets))
    return SentiCal(score_list)

When I run it with 4 threads, the first 3 threads fail with the following error, and only the last thread works.

AttributeError: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'

I have tried importing the NLTK package locally, as suggested in this NLTK issue, and I have also tried the solution given on this page.

1 answer:

Answer 0 (score: 1)

Quick hack:
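A minimal sketch of one common workaround, assuming the error comes from several threads triggering NLTK's lazy corpus loading at the same moment: load the corpora once in the main thread, and only then start the workers. The helper names `run_after_preload` and `lemmatize_in_threads` are hypothetical and used only for illustration; `ensure_loaded()` is the corpus reader method NLTK provides for making sure a lazily loaded corpus is actually loaded. This is not necessarily the code the answer originally contained.

```python
from concurrent.futures import ThreadPoolExecutor


def run_after_preload(preloaders, worker, items, max_workers=4):
    """Call each preloader in the calling thread, then map worker over items in a thread pool."""
    for preload in preloaders:
        preload()  # runs before any worker thread exists
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as items
        return list(pool.map(worker, items))


def lemmatize_in_threads(words, max_workers=4):
    """Hypothetical usage with NLTK; requires nltk plus the wordnet and sentiwordnet corpora."""
    import nltk
    from nltk.corpus import sentiwordnet as swn
    from nltk.corpus import wordnet as wn

    wnl = nltk.WordNetLemmatizer()
    # ensure_loaded() forces the lazy corpus loaders to finish loading here,
    # in the main thread, so the worker threads never race on the first access.
    return run_after_preload(
        [wn.ensure_loaded, swn.ensure_loaded],
        wnl.lemmatize,
        words,
        max_workers=max_workers,
    )
```

With this shape, a function such as `SentimentA` from the question would be passed as `worker` and the documents as `items`, so the preload step stays in exactly one place.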

More details later... still typing