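In the paper on fastText for supervised classification, the authors specify different numbers of hidden units by varying some parameter (h, as it is called on pages 3 and 4; in Table 1 you see "It has 10 hidden units and we evaluate it with and without bigrams."). But after reading the documentation, there does not seem to be a "hidden units" parameter to change. Is there a way to specify the number of hidden units? Or is this the same as specifying the -dim option?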
Answer 0 (score: 0)
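k is the number of classes.
From section 2.1 of https://arxiv.org/pdf/1607.01759v3.pdf:
"More precisely, the computational complexity is O(kh) where k is the number of classes and h the dimension of the text representation."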
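To make the O(kh) figure concrete, here is a minimal sketch of scoring k classes from an h-dimensional text representation with a plain linear layer. It is an illustration only, not fastText's implementation: the function name `score_classes` and the values of `h` and `k` are hypothetical. The documentation describes -dim as the size of the word vectors, which appears to play the role of h here.

```python
# Illustrative sketch only: a linear classifier over a text representation.
# This is NOT fastText's actual code; all names and sizes are hypothetical.
import random


def score_classes(text_repr, output_weights):
    """Return one score per class and the number of multiply-adds used."""
    ops = 0
    scores = []
    for class_weights in output_weights:  # k rows, one per class
        s = 0.0
        for w, x in zip(class_weights, text_repr):  # h entries per row
            s += w * x
            ops += 1
        scores.append(s)
    return scores, ops


h = 10  # dimension of the text representation (hypothetical value)
k = 4   # number of classes (hypothetical value)

random.seed(0)
text_repr = [random.random() for _ in range(h)]
output_weights = [[random.random() for _ in range(h)] for _ in range(k)]

scores, ops = score_classes(text_repr, output_weights)
print(len(scores), ops)  # k scores, computed with k * h multiply-adds
```

Doubling either h or k doubles the work in this sketch, which is what the O(kh) bound expresses.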
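When predicting the classes in text classification, from the docs:
"The argument k is optional, and equal to 1 by default. In order to obtain the k most likely labels for a piece of text, use:"
$ ./fasttext predict model.bin test.txt k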
When training the model, k is implicitly specified in the training data by the __label__* tags used for supervised training.
$ wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/cooking.stackexchange.tar.gz && tar xvzf cooking.stackexchange.tar.gz
--2017-05-23 09:03:26-- https://s3-us-west-1.amazonaws.com/fasttext-vectors/cooking.stackexchange.tar.gz
Resolving s3-us-west-1.amazonaws.com... 54.231.236.45
Connecting to s3-us-west-1.amazonaws.com|54.231.236.45|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 457609 (447K) [application/x-gzip]
Saving to: ‘cooking.stackexchange.tar.gz.1’
cooking.stackexchange.tar.gz.1 100%[================================================================>] 446.88K 385KB/s in 1.2s
2017-05-23 09:03:28 (385 KB/s) - ‘cooking.stackexchange.tar.gz.1’ saved [457609/457609]
x cooking.stackexchange.id
x cooking.stackexchange.txt
x readme.txt
$ cat readme.txt
The data in this archive is derived from the user-contributed content on the
Cooking Stack Exchange website (https://cooking.stackexchange.com/), used under
CC-BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0/).
The original data dump can be downloaded from:
https://archive.org/download/stackexchange/cooking.stackexchange.com.7z
and details about the dump obtained from:
https://archive.org/details/stackexchange
We distribute two files, under CC-BY-SA 3.0:
- cooking.stackexchange.txt, which contains all question titles and
their associated tags (one question per line, tags are prefixed by
the string "__label__") ;
- cooking.stackexchange.id, which contains the corresponding row IDs,
from the original data dump.
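
Since the number of classes is implied by the __label__ tags, one way to see k for a given training file is to count the distinct tags. The following is a minimal sketch under that assumption; the helper name `distinct_labels` and the sample lines are made up for illustration and are not taken from the dataset.

```python
# Sketch: count the distinct "__label__" tags in fastText-style training lines.
# The helper name and the sample lines below are hypothetical.


def distinct_labels(lines, prefix="__label__"):
    """Collect every whitespace-separated token that starts with the prefix."""
    labels = set()
    for line in lines:
        for token in line.split():
            if token.startswith(prefix):
                labels.add(token)
    return labels


sample = [
    "__label__baking __label__bread how long should dough rest",
    "__label__sauce how do i thicken a sauce",
    "__label__bread why is my loaf so dense",
]

labels = distinct_labels(sample)
print(len(labels))  # number of classes implied by the sample lines

# To run it on the downloaded file instead of the sample:
# with open("cooking.stackexchange.txt", encoding="utf-8") as f:
#     print(len(distinct_labels(f)))
```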