
I am using Gensim's LdaMulticore to perform LDA. I have around 28M small documents (around 100 characters each).

I set the workers argument to 20, but top shows only 4 processes in use. Some discussions suggest that slow corpus reading might be the cause, for example "gensim LdaMulticore not multiprocessing?" and https://github.com/piskvorky/gensim/issues/288

However, both of those cases use MmCorpus, whereas my corpus is held entirely in memory. My machine has a large amount of RAM (250 GB), and the loaded corpus takes around 40 GB. Even so, LdaMulticore uses just 4 processes. I created the corpus as:

    public SkillDTO(Skill skill) {
        idSkill = skill.getIdSkill();
        name = skill.getName();
        levelBezeichnung = skill.getLevelBezeichnung().getBezeichnung();
        checked = skill.isChecked();
        if (skill.getSkills().size() > 0) {
            Iterator<Skill> iteratorSkill = skill.getSkills().iterator();
            while (iteratorSkill.hasNext()) {
                Skill tempSkill = iteratorSkill.next();
                skills.add(convertSkillsToProfileDTO(tempSkill));
            }
        }
    }

    private SkillDTO convertSkillsToProfileDTO(Skill skill) {
        return new SkillDTO(skill);
    }

I cannot work out what the limiting factor is here.
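To make the hypothesis from the linked discussions concrete (workers sitting idle because chunks are not handed out fast enough), here is a toy model. It is only a sketch and not Gensim's actual code: the function name `peak_busy_workers` and the parameters `dispatch_time` and `process_time` are made up for illustration, and the time units are arbitrary. It assumes a single dispatcher that prepares one chunk at a time and a pool of identical workers.

```python
import heapq


def peak_busy_workers(workers, dispatch_time, process_time, num_chunks):
    """Toy model (hypothetical, not Gensim internals).

    One dispatcher prepares chunks one after another, taking dispatch_time
    per chunk. Each chunk then occupies one worker for process_time.
    Returns the largest number of workers that were busy at the same moment.
    """
    busy_until = []  # min-heap of finish times for chunks currently in flight
    clock = 0
    peak = 0
    for _ in range(num_chunks):
        clock += dispatch_time  # time spent preparing the next chunk
        # Retire workers that finished by the time this chunk is ready.
        while busy_until and busy_until[0] <= clock:
            heapq.heappop(busy_until)
        if len(busy_until) == workers:
            # Every worker is busy: wait for the earliest one to finish.
            clock = heapq.heappop(busy_until)
        heapq.heappush(busy_until, clock + process_time)
        peak = max(peak, len(busy_until))
    return peak


# Chunk preparation is only 4x faster than training on a chunk:
# no more than 4 workers are ever busy, even though 20 were requested.
print(peak_busy_workers(workers=20, dispatch_time=1, process_time=4, num_chunks=1000))

# Chunk preparation is 100x faster than training: all 20 workers stay busy.
print(peak_busy_workers(workers=20, dispatch_time=1, process_time=100, num_chunks=1000))
```

If something like this is what happens, the number of busy processes would be set by the ratio of per-chunk training time to per-chunk dispatch time rather than by the workers argument. Whether that applies to an in-memory corpus is exactly what I am unsure about.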

Gensim LdaMulticore is not multiprocessing properly (using just 4 workers)

1 answer: