Question

I trained a gensim.models.doc2vec.Doc2Vec model:

d2v_model = Doc2Vec(sentences, size=100, window=8, min_count=5, workers=4)

I can get a document vector with:

docvec = d2v_model.docvecs[0]

How can I get word vectors from the trained model?

Answer 1

Doc2Vec inherits from Word2Vec, so you can access word vectors the same way as in Word2Vec, by indexing the model directly:

wv = d2v_model['apple']
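
The lookup described above depends on the gensim version. More recent releases expose word vectors on the model's `wv` attribute (`d2v_model.wv['apple']`), and gensim 4.x no longer supports indexing the model directly. The following is a minimal sketch, not gensim's own code, of a helper that tries `wv` first and falls back to direct indexing. The helper name `get_word_vector` and the two stand-in objects are hypothetical; they exist only so the snippet runs without gensim installed.

```python
from types import SimpleNamespace


def get_word_vector(model, word):
    """Look up the vector for `word` on a trained Doc2Vec- or Word2Vec-like model.

    Prefers `model.wv[word]`; falls back to indexing the model directly
    (the older style) when the object has no `wv` attribute.
    """
    keyed_vectors = getattr(model, "wv", None)
    if keyed_vectors is not None:
        return keyed_vectors[word]
    return model[word]


# Hypothetical stand-ins (NOT real gensim models), used only for demonstration:
# one object exposing .wv, and one that is indexed directly.
newer_style_model = SimpleNamespace(wv={"apple": [0.1, 0.2, 0.3]})
older_style_model = {"apple": [0.4, 0.5, 0.6]}

vec_new = get_word_vector(newer_style_model, "apple")
vec_old = get_word_vector(older_style_model, "apple")
```

With a real trained model, the call would be `get_word_vector(d2v_model, 'apple')`.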

Note, however, that a Doc2Vec training mode like pure DBOW (dm=0) neither needs nor creates word vectors. (Pure DBOW still works well for many purposes!) If you access word vectors from such a model, they will just be the automatically, randomly initialized vectors, with no meaning.

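Word vectors and doc vectors are learned simultaneously only when the Doc2Vec mode itself co-trains word vectors, as in DM mode (the default, dm=1) or when optional word training is added to DBOW (dm=0, dbow_words=1).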
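
The rule above can be written down as a small lookup. This is only an illustrative sketch: `dm` and `dbow_words` are real Doc2Vec constructor parameters, but the mode labels (`"dm"`, `"dbow"`, `"dbow_with_words"`) and both function names are hypothetical, chosen here for readability.

```python
def doc2vec_mode_kwargs(mode):
    """Return Doc2Vec constructor keyword arguments for a named training mode."""
    modes = {
        "dm": {"dm": 1},                                # DM, the default mode
        "dbow": {"dm": 0, "dbow_words": 0},             # pure DBOW
        "dbow_with_words": {"dm": 0, "dbow_words": 1},  # DBOW plus word training
    }
    return dict(modes[mode])


def trains_word_vectors(dm=1, dbow_words=0):
    """True when the mode co-trains word vectors alongside doc vectors."""
    return dm == 1 or dbow_words == 1


# Which modes yield meaningful (trained) word vectors?
summary = {
    mode: trains_word_vectors(**doc2vec_mode_kwargs(mode))
    for mode in ("dm", "dbow", "dbow_with_words")
}
```

The returned dictionary could be passed to the constructor alongside the other settings, for example `Doc2Vec(sentences, **doc2vec_mode_kwargs("dbow_with_words"))`. Only the modes where `summary` is true produce word vectors worth reading back.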

Answer 2

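If you want all of the trained doc vectors, you can simply use model.docvecs.doctag_syn0. To get the vector for a document by index, use model.docvecs[i]. If you trained a Word2Vec model, you can get the word vectors with model.wv.syn0. For more detail, see this GitHub issue: https://github.com/RaRe-Technologies/gensim/issues/1513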
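
The attribute names in this answer belong to older gensim releases. In gensim 4.x the doc-vector container is `model.dv` and the full arrays are `model.dv.vectors` and `model.wv.vectors`. Below is a hedged sketch of version-tolerant accessors; the function names and the stand-in objects are hypothetical and are used only so the snippet runs without gensim installed.

```python
from types import SimpleNamespace


def all_doc_vectors(model):
    """Return the full array of doc vectors, trying the newer attribute first."""
    dv = getattr(model, "dv", None)
    if dv is not None:
        return dv.vectors               # gensim 4.x naming
    return model.docvecs.doctag_syn0    # older naming, as used in the answer


def all_word_vectors(model):
    """Return the full array of word vectors, trying the newer attribute first."""
    vectors = getattr(model.wv, "vectors", None)
    if vectors is not None:
        return vectors                  # newer naming
    return model.wv.syn0                # older naming, as used in the answer


# Hypothetical stand-ins (NOT real gensim models), for demonstration only.
old_model = SimpleNamespace(
    docvecs=SimpleNamespace(doctag_syn0=[[0.1, 0.2], [0.3, 0.4]]),
    wv=SimpleNamespace(syn0=[[1.0, 2.0]]),
)
new_model = SimpleNamespace(
    dv=SimpleNamespace(vectors=[[0.5, 0.6]]),
    wv=SimpleNamespace(vectors=[[3.0, 4.0]]),
)

old_docs = all_doc_vectors(old_model)
new_docs = all_doc_vectors(new_model)
old_words = all_word_vectors(old_model)
new_words = all_word_vectors(new_model)
```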
