Question

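I am trying to do sentiment analysis with the Naive Bayes algorithm and have been reading some articles about it. As almost every article mentions, I need to train my Naive Bayes classifier on data whose sentiment has already been labeled.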

Now, I have a piece of code that uses the movie_reviews corpus that ships with NLTK. The code is:

class CMarginDelegate : public QItemDelegate
{
public:
    CMarginDelegate(int margin, QObject* parent)
        :   QItemDelegate(parent),
            m_margin(margin)
    {}

    void paint(QPainter *painter, const QStyleOptionViewItem &option, const QModelIndex &index) const override
    {
        QStyleOptionViewItem itemOption(option);

        // Make the 'drawing rectangle' smaller.
        itemOption.rect.adjust(m_margin, m_margin, -m_margin, -m_margin);

        QItemDelegate::paint(painter, itemOption, index);
    }

private:
    int m_margin;
};

So, in the code above I have a training_set and a testing_set. I looked inside the movie_reviews corpus, and it contains many small text files, each holding one review.

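So my question is: here we imported the movie_reviews corpus and used it for both training and testing, but what should I do when I want to use an external training dataset and an external testing dataset?
Also, how does NLTK parse the movie_reviews directory, which contains so many text files? Since I will be using this as my training dataset, I need to understand how it is done.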
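
To make the question concrete, here is a minimal sketch of what I imagine doing with my own data. As far as I can tell, NLTK reads movie_reviews through its CategorizedPlaintextCorpusReader and takes each file's category from the name of the subdirectory it sits in (pos or neg). The sketch mimics that layout with the standard library only. The directory name my_reviews and the helpers load_labeled_docs, bag_of_words, split_featuresets and train_and_score are hypothetical names of my own, not part of NLTK, and the whitespace tokenization is a simplifying assumption.

```python
import random
from pathlib import Path


def load_labeled_docs(root):
    """Return (words, label) pairs, one per .txt file found in root/<label>/.

    Assumes a layout like my_reviews/pos/*.txt and my_reviews/neg/*.txt,
    where the subdirectory name is the sentiment label.
    """
    docs = []
    for label_dir in sorted(Path(root).iterdir()):
        if not label_dir.is_dir():
            continue
        for path in sorted(label_dir.glob("*.txt")):
            # Simplifying assumption: lowercase and split on whitespace.
            words = path.read_text(encoding="utf-8").lower().split()
            docs.append((words, label_dir.name))
    return docs


def bag_of_words(words):
    """Turn a token list into a dict-style feature set (word -> True)."""
    return {word: True for word in words}


def split_featuresets(docs, test_fraction=0.25, seed=0):
    """Shuffle the labeled feature sets and split them into train and test."""
    featuresets = [(bag_of_words(words), label) for words, label in docs]
    random.Random(seed).shuffle(featuresets)
    cut = int(len(featuresets) * (1 - test_fraction))
    return featuresets[:cut], featuresets[cut:]


def train_and_score(training_set, testing_set):
    """Train NLTK's Naive Bayes classifier and report accuracy on the test set."""
    import nltk  # imported lazily so the loader above works without NLTK

    classifier = nltk.NaiveBayesClassifier.train(training_set)
    return classifier, nltk.classify.accuracy(classifier, testing_set)
```

With this, I would call load_labeled_docs("my_reviews"), pass the result to split_featuresets, and hand the two lists to train_and_score. If the training and testing data come from separate external sources, I assume I could instead load each directory on its own and skip the split.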

Training and testing NLTK with a dataset

0 answers: