Question

I'm new to fastText and NLP. I have a French corpus in a CSV file with the following structure:

| value | sentence                       | pivot    |
|-------|--------------------------------|----------|
| 1     | My first [sentence]            | sentence |
| 0     | My second [word] in a sentence | word     |
| ..    | ...                            | ...      |

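How do I tell fastText to use the pivot word between brackets ([pivot]) when building my model? Or does fastText have a built-in way of knowing which word to process? I'd really like to understand how fastText works, and the documentation I've found is limited. Thanks.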

Answer 1

You can use fastText to get the word vectors for the pivot column like this:

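!git clone https://github.com/facebookresearch/fastText.git
!pip install ./fastText
import pandas as pd
import fasttext
import fasttext.util

# Download the pre-trained French vectors (cc.fr.300.bin) unless already present
fasttext.util.download_model('fr', if_exists='ignore')  # French
model = fasttext.load_model('cc.fr.300.bin')

# Adjust sep to match the delimiter actually used in your file
dataset = pd.read_csv('path to csv file', sep='\t')

# Use dataset['pivot'], because dataset.pivot is the DataFrame.pivot method
vectors = []
for data in dataset['pivot']:
    vectors.append(model[data])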
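
As far as I know, fastText gives square brackets no special meaning: it splits text on whitespace, so a token like [sentence] would simply be a different word from sentence. If the pivot column is ever missing, or you want to check it against the sentence, the pivot can be recovered with a regular expression. This is only a minimal sketch; the helper names extract_pivot and strip_brackets are made up for illustration, and it assumes one bracketed word per sentence as in the sample rows.

```python
import re

# Matches the first run of text wrapped in square brackets, e.g. "[sentence]".
PIVOT_RE = re.compile(r"\[([^\[\]]+)\]")


def extract_pivot(sentence):
    """Return the bracketed pivot word, or None if the sentence has none."""
    match = PIVOT_RE.search(sentence)
    return match.group(1) if match else None


def strip_brackets(sentence):
    """Remove the bracket markers but keep the pivot word in place."""
    return PIVOT_RE.sub(r"\1", sentence)


# Sample rows taken from the table in the question.
rows = [
    "My first [sentence]",
    "My second [word] in a sentence",
]
pivots = [extract_pivot(s) for s in rows]
cleaned = [strip_brackets(s) for s in rows]
```

The extracted pivots can then be passed to model[...] exactly as in the loop above, and the cleaned sentences contain only ordinary words.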

https://fasttext.cc/docs/en/crawl-vectors.html
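
If the goal is a classifier rather than just vectors, fastText's supervised mode expects one example per line, with the label written as a __label__ prefix. The sketch below assumes (this is not stated in the question) that the value column is the class label; to_fasttext_line is a hypothetical helper name, and dropping the brackets is just one possible choice.

```python
def to_fasttext_line(value, sentence):
    """Format one row as '__label__<value> <text>' on a single line."""
    # Replace the bracket markers with spaces, then collapse repeated whitespace.
    text = " ".join(sentence.replace("[", " ").replace("]", " ").split())
    return f"__label__{value} {text}"


# (value, sentence) pairs taken from the table in the question.
rows = [
    (1, "My first [sentence]"),
    (0, "My second [word] in a sentence"),
]
lines = [to_fasttext_line(value, sentence) for value, sentence in rows]
```

Writing these lines to a text file and passing its path to fasttext.train_supervised(input=...) would train a classifier on them. Note that stripping the brackets also removes the information about which word is the pivot, so whether to keep a marker is a modelling decision.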

Fasttext: How do I process a corpus with fastText?
