Document vectors for TF-IDF

Time: 2019-10-09 22:11:30

Tags: information-retrieval tf-idf

I am reading the Information Retrieval book by David Grossman and Ophir Frieder, and I am having trouble understanding document vectors.

Taking an example from the book, I have 3 documents:

d1 = "Shipment of gold damaged in a fire"

d2 = "Delivery of silver arrived in a silver truck"

d3 = "Shipment of gold arrived in a truck"

I have computed the TF-IDF for the documents. For d1, my TF values are:

{'a': 0.14286, 'arrived': 0.0, 'damaged': 0.14286, 'delivery': 0.0, 'fire': 0.14286, 'gold': 0.14286, 'in': 0.14286, 'of': 0.14286, 'shipment': 0.14286, 'silver': 0.0, 'truck': 0.0}

And my TF-IDF values are:

{'a': 0.0, 'arrived': 0.0, 'damaged': 0.06816, 'delivery': 0.0, 'fire': 0.06816, 'gold': 0.02516, 'in': 0.0, 'of': 0.0, 'shipment': 0.02516, 'silver': 0.0, 'truck': 0.0}
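For reference, here is a minimal Python sketch that reproduces the d1 numbers and lays them out as a vector. It rests on several assumptions that are not stated in the post: tokens come from lowercasing and splitting on whitespace, TF is the term count divided by the document length, IDF is log10(N / df), and vector positions follow the alphabetically sorted vocabulary. The names `tfidf_vector` and `normalize_tf` are made up for illustration.

```python
import math

docs = {
    "d1": "Shipment of gold damaged in a fire",
    "d2": "Delivery of silver arrived in a silver truck",
    "d3": "Shipment of gold arrived in a truck",
}

# Assumed tokenization: lowercase, split on whitespace, no stemming or stop words.
tokens = {name: text.lower().split() for name, text in docs.items()}

# Fixed term order: every term owns one position in every document vector.
vocab = sorted({t for toks in tokens.values() for t in toks})

# Document frequency and (assumed) idf = log10(N / df).
n_docs = len(docs)
df = {t: sum(t in toks for toks in tokens.values()) for t in vocab}
idf = {t: math.log10(n_docs / df[t]) for t in vocab}


def tfidf_vector(toks, normalize_tf=True):
    """Return the TF-IDF weights of one document, ordered like `vocab`."""
    vec = []
    for t in vocab:
        tf = toks.count(t)
        if normalize_tf:
            # Length-normalized TF, which matches the 0.14286 values above.
            tf = tf / len(toks)
        vec.append(round(tf * idf[t], 5))
    return vec


d1_vector = tfidf_vector(tokens["d1"])
print(vocab)
print(d1_vector)
```

Under these assumptions the document vector is just the TF-IDF values written out in the fixed vocabulary order, so d1, d2 and d3 all get vectors of length 11 that can be compared position by position. Passing `normalize_tf=False` uses raw term counts instead, which is another common weighting and gives larger values for the same terms.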

How do I construct the document vectors? I can't seem to find a way.

[Image: Document Vectors Table (Book)]

0 answers:

No answers