Question

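I am trying to read data from a text file. I can do this by importing it, and that works fine. My data is imported as: UserID | SportID | Rating

There are many users, and each user can give any sport a rating, for example: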

User|SportID|Rating
1      2       10
1      3        5
2      1       10
2      3        2

I am trying to create a new matrix like the following:

UserID  Sport1  Sport2  Sport3
 1      (null)    10      5
 2        10    (null)    2

I tried this with "for" loops, but there are nearly 2000 users and 1000 sports, and almost 100000 data rows. How can I do this?

Answer 1

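To do this quickly, you can use a sparse matrix with UserID along one dimension and the sports along the other. A sparse matrix behaves like an ordinary matrix for most purposes. Construct it like this: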

out = sparse(User, SportID, Rating)

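where User, SportID and Rating are vectors corresponding to the columns of the text file.

Note 1: for repeated (User, SportID) pairs, the Rating values are summed.

Note 2: the empty entries written as (null) in the question are not stored in the sparse matrix; only the nonzero entries are stored (which is the main point of a sparse matrix). They read back as 0.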
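
As a minimal sketch of the construction above, using the four sample rows from the question (the variable names `dense` and the NaN conversion step are illustrative additions, and the conversion assumes that 0 is never a legitimate rating):

% Sample data from the question, one column vector per text-file column
User    = [1; 1; 2; 2];
SportID = [2; 3; 1; 3];
Rating  = [10; 5; 10; 2];

% Rows are users, columns are sports; only the four ratings are stored
out = sparse(User, SportID, Rating);

% Convert to an ordinary matrix to inspect it; missing entries read back as 0
dense = full(out);

% Optional: mark missing entries as NaN (assumes no real rating equals 0)
dense(dense == 0) = NaN;

With the full data set (about 2000 users by 1000 sports and 100000 ratings), roughly 5% of the entries are filled, so the sparse form stores far less than the dense one.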

Answer 2

You can do the following:

% Test Input
inputVar = [1 2 10; 1 3 5; 2 1 10; 2 3 2]; 

% Determine the number of users and sports to size the new table
numSports = max(inputVar(:,2));
numUsers = max(inputVar(:,1));
newTable = NaN(numUsers, numSports);

% Iterate over each row of the new table (one row per user)
for ii = 1:numUsers
    % Find the input rows for this user, which sports he/she rated, and the ratings
    userRating = find(inputVar(:,1) == ii);
    sportIndex = inputVar(userRating, 2)';
    sportRating = inputVar(userRating, 3)';
    newTable(ii, sportIndex) = sportRating; % Fill in this user's row of the new table
end

newTable

which produces the following:

newTable =

   NaN    10     5
    10   NaN     2

The loop only needs to run once per user in the input table, not once per rating.

Answer 3

I assume that, for simplicity, you have defined null as a number.

Null = -1; % or any other value which could not be a rating.

Given:

nSports = 1000; % Number of sports
nUsers = 2000; % Number of users

Pre-allocate the result:

Rating_Mat = ones(nUsers, nSports) * Null; % Pre-allocation

Then use sub2ind (similar to this answer):

Rating_Mat(sub2ind([nUsers nSports], User, SportID)) = Rating;

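Or accumarray, which builds the matrix directly (no pre-allocation needed) and sums the ratings of repeated (User, SportID) pairs:

Rating_Mat = accumarray([User, SportID], Rating, [nUsers nSports], [], Null); % [] keeps the default sum; Null fills unrated entries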

This assumes User and SportID are N-by-1 column vectors.

Hope it helps.
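
A small sketch comparing the two approaches on the sample data from the question. The names `A` and `B` are illustrative, and the sizes are taken from the data rather than fixed at 2000 by 1000:

% Sample data from the question as N-by-1 column vectors
User    = [1; 1; 2; 2];
SportID = [2; 3; 1; 3];
Rating  = [10; 5; 10; 2];

Null    = -1;              % placeholder for "no rating"
nUsers  = max(User);
nSports = max(SportID);

% Approach 1: pre-allocate with Null, then assign through linear indices
A = Null * ones(nUsers, nSports);
A(sub2ind([nUsers nSports], User, SportID)) = Rating;

% Approach 2: accumarray with an explicit size and fill value
B = accumarray([User, SportID], Rating, [nUsers nSports], [], Null);

% Both give [-1 10 5; 10 -1 2] for this input
same = isequal(A, B);

The two differ only when a (User, SportID) pair appears more than once: the indexed assignment keeps one of the duplicate values, while accumarray sums them.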

Matlab: processing data from a text file

3 answers: