Question

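Here is the scenario:

We have a website that lets students create an e-portfolio, which is like a profile page combined with projects you can add to it.

For each student portfolio, we will have educators review the portfolio and give a set of scores based on its content. So a set of scores, which sum to a total score, will be associated with each student portfolio.

So we have score data associated with portfolio data, and we want to use it as supervised training data for a machine learning algorithm. That way the computer can examine thousands of these cases, look for patterns, offer insights, and predict the scores of other portfolios.

Here is the data we are collecting for each person: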

**Portfolio data:**

About: 'Text paragraph data written by the student about themselves'
Skills: 'Text Bullet list of skills'
Career Interests: 'Text Bullet list of career interests'
Work Experience: 'Text paragraph'
Education History: 'Student fills out universities, majors, GPA, and dates attended'
Courses: 'Text bullet list of courses'
Interests: 'Text paragraph data written by student about interests'
Works: 'Each student adds works to their portfolio and enters the following data'
   Work Title: 'Text title'
   Attachments: 'Files and documents attached to the portfolio (jpg, doc, pdf, youtube, dropbox, etc.)'
   Work description: 'Text description of work'
   Category of works: 'Selected from list of categories'
   Tags: 'List of text tags student adds to work'
   My contribution: 'Text description of student's contribution to project'


**Score data we are collecting for each portfolio, each key area rated from 1-100:**

Content completeness:
Selection of Works:
Reflection:
Academic Concepts:
Presentation and Appearance:
Layout and Readability:
Use of Multimedia:
Audience:
Organization of content:
Written Communication:
TOTAL SCORE:

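We plan to collect thousands of student portfolios and scores over time. What kind of algorithm could we use to analyze this data and find correlations between portfolios that receive similar scores, and then use that to predict how successful a portfolio will be once the student has finished filling it out? If any of this is confusing or you need more information, please let me know. Thanks so much!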

Answer 1

There are a lot of problems to tackle here.

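The first thing that comes to mind is to do feature extraction and then apply regression to predict the scores. Since you are using more than just the textual information in the portfolio, you will need more than text features. I don't know which features would help you relate a portfolio's "Presentation and Appearance" to its score. One approach would be to capture color, font, and font-size information and represent them as features. To get insight from the text, you could represent it with a vector space model.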
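To make the "vector space model plus regression" idea a bit more concrete, here is a minimal sketch using only the Python standard library. It turns each text into a TF-IDF vector and predicts a score as the similarity-weighted average of the most similar training portfolios (k-nearest-neighbour regression, which I picked only because it is easy to show; it is one of many possible regressors). The function names, the toy texts, and the scores are all made up for illustration, and in practice you would probably reach for a library such as scikit-learn instead of hand-rolling this.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(documents):
    """Compute a smoothed inverse document frequency for every training term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Represent a text as a sparse, L2-normalised TF-IDF vector (term -> weight)."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def predict_score(text, train_texts, train_scores, k=2):
    """Predict a score as the similarity-weighted mean of the k nearest training texts."""
    idf = fit_idf(train_texts)
    query = tfidf_vector(text, idf)
    sims = [(cosine(query, tfidf_vector(t, idf)), s)
            for t, s in zip(train_texts, train_scores)]
    top = sorted(sims, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        # Nothing in common with any training text: fall back to the mean score.
        return sum(train_scores) / len(train_scores)
    return sum(sim * s for sim, s in top) / total


# Hypothetical toy data: "About" texts paired with invented reviewer total scores.
train_texts = [
    "Built a robotics project and reflected on control theory concepts",
    "Designed a web application with detailed reflection on usability",
    "I like sports",
    "Photos from my holiday",
]
train_scores = [88, 82, 35, 30]

print(predict_score("A robotics project with reflection on concepts",
                    train_texts, train_scores))
```

The same pattern extends to the non-text side: anything you can turn into a number (count of attached works, whether a video is attached, and so on) can be appended as extra features, and you could fit one regressor per rubric area rather than only the total score.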

I will come back soon and write a more detailed answer. Sorry if all of this sounds too vague.
