Question

I am analyzing some files that were extracted using LDA. I learned some of the basics of LDA from here: http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/

I have three files:

Topic IDs (5x600)

 59673453648    64345309472 1.23984E+11 52934539940 1.06263E+14
 1.05643+14     1.44524E+11 1.09535E+14 1.06368E+14 62248718804
 1.12535E+14    1.13771E+14 1.70701E+14 1.86305E+14 1.9114E+14


Topic names (5x600)
  TBBT            The_Mummy    Spider-Man  Inception           Shrek
  Outfitters      Cheerleading  Chanel     Victoria's Secret   LV 
  Pia Mia         Ciara         Usher      Jay-z               Akon


data.df  (600X1100000)

 id 
 1  0.000111111  0.000111111 0.000111111 0.000111111 0.000111111
 2  9.883309999  9.883309999 9.883309999 9.883309999 9.883309999
 3  6.772454300  6.772454300 6.772454300 6.772454300 6.772454300

I assume the topic IDs match the topic names, but how should I interpret the scores in data.df (600 cols)?

How to interpret data extracted with Latent Dirichlet Allocation

0 answers: