关于推荐系统的准确性

时间:2018-04-17 10:36:37

标签: algorithm machine-learning recommender-systems

嗨,我在这个领域很新鲜。我对推荐算法中的准确度方法感到困惑。据我所知,用户项的评级矩阵很稀疏。我知道我可以使用一些算法来预测缺失值和模型,如RMSE或MAE,但如何计算精度?

对于某些项目,我不知道他们的真实评分。如果我使用0,会增加错误,不是吗?

感谢您的帮助!

3 个答案:

答案 0 :(得分:0)

您只检查您拥有数据的用户的准确性。最好测试不属于培训的验证集数据的准确性。 如果这是回归问题或AUC,log_loss,混淆矩阵等,如果这是分类问题,您可以使用RMSE检查您的准确性。 因此,您只需要获取数据集中标记项的平均误差。

答案 1 :(得分:0)

与其他机器学习模型相比,实际上很难衡量推荐系统的准确性,因为我们没有明确的方法来判断推荐是否准确。但有一种方法可以计算准确度。

您将拥有培训推荐系统的10矩阵。

如果您认为自己拥有20个用户和10 * 20个项目。
您的矩阵将具有维度4 * 4

  

您可以做的是将矩阵划分为训练和测试   数据集。

例如,您可以从10 * 20矩阵的右下端剪切4 * 4子矩阵。在剩余的矩阵上训练推荐系统,然后针对precision切割进行测试。 您将获得预期的输出和系统的输出。使用它们,您可以轻松计算recallF1 scoreCREATE TABLE `users` ( `id` int(11) NOT NULL, `user_type` enum('admin','landlord','renter') NOT NULL, `first_name` varchar(255) NOT NULL, `last_name` varchar(255) NOT NULL, `email_id` varchar(255) NOT NULL, `password` varchar(255) NOT NULL, `company_name` varchar(255) NOT NULL, `street_address` varchar(255) NOT NULL, `city` varchar(100) NOT NULL, `state` varchar(100) NOT NULL, `zip` varchar(100) NOT NULL, `phone_no` varchar(50) NOT NULL, `bank_name` varchar(100) NOT NULL, `billing_address` varchar(255) NOT NULL, `name_on_card` varchar(255) NOT NULL, `bank_account_number` varchar(100) NOT NULL, `bank_routing_no` varchar(100) NOT NULL, `has_payzang_account` int(11) NOT NULL DEFAULT '0', `payzang_api` varchar(255) DEFAULT '', `payzang_gateway_url` varchar(255) NOT NULL, `pending_amount` decimal(18,2) DEFAULT NULL COMMENT 'pending key bank(rent deduct) amount for landlord', `is_active` int(11) NOT NULL COMMENT '0-inactive/1-active', `is_assigned` int(11) NOT NULL DEFAULT '0' COMMENT 'Is this user assigned to any property', `assigned_to` int(11) NOT NULL COMMENT 'assigned to which landlord id', `is_set_up_new_password` enum('0','1') NOT NULL DEFAULT '0' COMMENT 'When admin/landlord adds renter or admin adds landlord an email with id+password is sent to the user. The first time they sign in, it should prompt them to change password page. So this field indicating whether user set up their own password', `created_at` datetime NOT NULL, `modified_at` datetime NOT NULL, `created_by` int(11) NOT NULL, `modified_by` int(11) NOT NULL, `is_deleted` int(11) NOT NULL DEFAULT '0' ) ENGINE=InnoDB DEFAULT CHARSET=latin1; INSERT INTO `users` (`id`, `user_type`, `first_name`, `last_name`, `email_id`, `password`, `company_name`, `street_address`, `city`, `state`, `zip`, `phone_no`, `bank_name`, `billing_address`, `name_on_card`, `bank_account_number`, `bank_routing_no`, `has_payzang_account`, `payzang_api`, `payzang_gateway_url`, `pending_amount`, `is_active`, `is_assigned`, `assigned_to`, `is_set_up_new_password`, `created_at`, `modified_at`, `created_by`, `modified_by`, `is_deleted`) VALUES (1, 'admin', 'Super', 'Admin', 'email@email.com', '$2y$10$NwcR0ObYeUM.OfjgMOaajeH6bjjuMTpU2GnF0h623SER37AFk8K9i', 'UIGJ', 'DJ Road', 'Kolkatr', '20', '522236', '(987) 963-9658', '', '', '', '', '', 1, 'xxxxxxxxxxxxxxxxxxxxx', 'https://payzang.transactiongateway.com/api/v2/three-step', '0.00', 1, 0, 0, '1', '2017-06-02 13:56:34', '2017-06-02 13:56:34', 0, 0, 0), (2, 'landlord', 'Denise', 'Supplee', 'email1@email.com', '$2y$10$PATxnED1/lqr1t7epLJiwu0uVztuXJB97q/XQv7xLc/iCv/m4NK2G', '', '', '', '', '', '2156755615', '', '', '', '', '', 1, 'xxxxxxxxxxxxxxxxxxxxx', 'https://payzang.transactiongateway.com/api/v2/three-step', '230.77', 1, 0, 0, '0', '2017-09-06 16:31:23', '2017-09-10 05:48:45', 0, 1, 0), (3, 'renter', 'Amanda', 'Scott', 'email2@email.com', '$2y$10$oBU3tbcVT63Asw6CDLCGB.SNAvSpJaOcLBx4NqodFNMwvlSRLc.tO', '', '', '', '', '', '(215) 672-3778', '', '', '', '', '', 0, '', '', NULL, 0, 0, 2, '1', '2017-09-06 17:54:02', '2017-09-06 17:54:02', 2, 0, 1), (4, 'renter', 'Megan', 'Ridgell', 'email3@email.com', '$2y$10$mPR7U/ri/rpW2n5f6X4mV.6jCycvLTJ/dvttR8seMFSGgRR/tQh1.', '', '', '', '', '', '2154852222', '', '', '', '', '', 0, '', '', NULL, 0, 0, 11, '1', '2017-09-06 18:30:01', '2017-09-12 06:22:23', 0, 2, 1), (5, 'landlord', 'Brian', 'Davis', 'email4@email.com', '$2y$10$NwcR0ObYeUM.OfjgMOaajeH6bjjuMTpU2GnF0h623SER37AFk8K9i', '', 'PO Box 123', 'Baltimore', '21', '21231', '4104999026', '', '', '', '', '', 0, '', '', '0.00', 1, 0, 0, '0', '2017-09-07 05:19:46', '2017-09-07 05:19:46', 0, 0, 0), (6, 'renter', 'Greg', 'Davis', 'email5@email.com', '$2y$10$Bd6S4KMGec2NzqKcvsJ1huwmbfACu8oOn56JLFSVQFt6ANg0Vr.ZW', '', '', '', '', '', '4104999026', '', '', '', '', '', 0, '', '', NULL, 1, 0, 5, '1', '2017-09-07 05:30:58', '2017-09-17 04:22:30', 0, 2, 0), (7, 'renter', 'Maddie', 'Evans', 'email6@email.com', '$2y$10$ry9DBBFJHRhCsV7wpU9s9OI9WQ8BWqCTdk0K9KyYw/VD10mb/yC3O', '', '', '', '', '', '4104999026', '', '', '', '', '', 0, '', '', NULL, 1, 0, 0, '0', '2017-09-07 06:07:10', '2017-09-17 04:14:48', 0, 5, 0), (8, 'renter', 'Arijita', 'Dey', 'email7@email.com', '$2y$10$NwcR0ObYeUM.OfjgMOaajeH6bjjuMTpU2GnF0h623SER37AFk8K9i', '', '', '', '', '', '3433453535', '', '', '', '', '', 0, '', '', NULL, 0, 0, 2, '0', '2017-09-07 08:08:21', '2017-09-10 05:42:58', 0, 2, 1), (9, 'renter', 'Arijita', 'Dey', 'email8@email.com', '$2y$10$NwcR0ObYeUM.OfjgMOaajeH6bjjuMTpU2GnF0h623SER37AFk8K9i', '', '', '', '', '', '(564) 564-5645', '', '', '', '', '', 0, '', '', NULL, 0, 0, 5, '0', '2017-09-07 11:05:33', '2017-09-07 11:05:33', 5, 0, 1), (10, 'renter', 'Rick', 'Roy', 'email9@email.com', '$2y$10$PG/8SkAwOZsSkD.px9FGveFIbriavOTX6vdBWHHmdjQeKgbwr0/wu', '', '', '', '', '', '(533) 453-4534', '', '', '', '', '', 0, '', '', NULL, 0, 0, 5, '0', '2017-09-07 11:11:49', '2017-09-07 11:11:49', 5, 0, 1), (11, 'landlord', 'Denise', 'Supplee', 'email0@email.com', '$2y$10$52H0QZMFK9ApK7JwaDV9qOlqyOxcyTH3QikQqXgAcLD.zrb.ggoQG', '', '', '', '', '', '2156755615', '', '', '', '', '', 1, 'xxxxxxxxxxxxxxxxxxxxx', 'https://payzang.transactiongateway.com/api/v2/three-step', '0.00', 1, 0, 0, '0', '2017-09-11 14:09:03', '2017-09-18 23:22:57', 0, 1, 0); CREATE TABLE `lease_with_max_id` ( `id` int(11) ,`lease_id` int(11) ,`renter_id` int(11) ,`landlord_id` int(11) ,`property_id` int(11) ,`is_rent_deducted_form_filled_up` enum('0','1') ,`status` int(11) ,`is_deleted` enum('0','1') ,`is_lease_ended` enum('0','1') ); CREATE TABLE `properties` ( `id` int(11) NOT NULL, `user_id` int(11) NOT NULL COMMENT 'landlord''s id', `property_name` varchar(255) NOT NULL, `street_address` varchar(255) NOT NULL, `street_address_2` varchar(255) DEFAULT NULL, `city` varchar(100) NOT NULL, `state` varchar(100) NOT NULL, `zip` varchar(100) NOT NULL, `rent_amount` decimal(18,2) DEFAULT '0.00', `is_active` enum('0','1') NOT NULL, `is_assigned` enum('0','1') NOT NULL DEFAULT '0' COMMENT 'Is assigned to some renter or not', `created_at` datetime NOT NULL, `modified_at` datetime NOT NULL, `is_deleted` int(11) NOT NULL DEFAULT '0' ) ENGINE=InnoDB DEFAULT CHARSET=latin1; INSERT INTO `properties` (`id`, `user_id`, `property_name`, `street_address`, `street_address_2`, `city`, `state`, `zip`, `rent_amount`, `is_active`, `is_assigned`, `created_at`, `modified_at`, `is_deleted`) VALUES (1, 2, 'Test Property', '330 Crooked Billet Rd', '', 'Hatboro', '39', '19040', '0.00', '0', '0', '2017-09-10 05:38:12', '2017-09-10 05:38:12', 0), (2, 2, '', '515 Main St', 'Apt. 1809', 'Horsham', '39', '19044', '0.00', '1', '1', '2017-10-09 04:24:50', '2017-10-09 04:24:50', 1), (3, 5, 'Test 1', '123 Brian\'s Test St.', 'Apt. 2', 'Baltimore', '21', '21231', '0.00', '1', '0', '2017-09-07 05:21:10', '2017-09-07 05:21:10', 1), (4, 5, 'Test 1', '123 Brian\'s Test St.', 'Apt. 2', 'Baltimore', '21', '21231', '0.00', '1', '0', '2017-09-07 05:21:18', '2017-09-07 05:21:18', 1), (5, 5, 'Test 1', '123 Brian\'s Test St.', 'Apt. 1503', 'Baltimore', '21', '21231', '0.00', '0', '0', '2017-09-11 13:49:11', '2017-09-11 13:49:11', 1), (6, 5, 'Test 2', '123 Brian\'s Test St.', 'Apt. 1', 'Baltimore', '21', '21231', '0.00', '1', '1', '2017-10-03 04:31:18', '2017-10-03 04:31:18', 0); CREATE TABLE `geo_states` ( `id` int(11) NOT NULL, `name` varchar(50) NOT NULL DEFAULT '', `abv` char(2) NOT NULL DEFAULT '', `country` char(2) NOT NULL, `is_state` char(1) DEFAULT NULL, `is_lower48` char(1) DEFAULT NULL, `slug` varchar(50) NOT NULL, `latitude` float(9,6) DEFAULT NULL, `longitude` float(9,6) DEFAULT NULL, `population` bigint(20) UNSIGNED DEFAULT NULL, `area` float(8,2) UNSIGNED DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE VIEW `test` AS SELECT `users`.`id` AS `id`, `users`.`user_type` AS `user_type`, `users`.`first_name` AS `first_name`, `users`.`last_name` AS `last_name`, CONCAT( `users`.`first_name`, ' ', `users`.`last_name` ) AS `landlord_name`, `users`.`email_id` AS `email_id`, `users`.`password` AS `password`, `users`.`company_name` AS `company_name`, `users`.`street_address` AS `street_address`, `users`.`city` AS `city`, `users`.`state` AS `state`, `users`.`zip` AS `zip`, `users`.`phone_no` AS `phone_no`, `users`.`bank_name` AS `bank_name`, `users`.`billing_address` AS `billing_address`, `users`.`name_on_card` AS `name_on_card`, `users`.`bank_account_number` AS `bank_account_number`, `users`.`bank_routing_no` AS `bank_routing_no`, `users`.`has_payzang_account` AS `has_payzang_account`, ( CASE WHEN(`users`.`has_payzang_account` = 0) THEN 'No' WHEN(`users`.`has_payzang_account` = 1) THEN 'Yes' END ) AS `has_payzang_account_custom`, `users`.`payzang_api` AS `payzang_api`, `users`.`payzang_gateway_url` AS `payzang_gateway_url`, `users`.`pending_amount` AS `pending_amount`, `users`.`is_active` AS `is_active`, ( CASE WHEN(`users`.`is_active` = '0') THEN 'Inactive' WHEN(`users`.`is_active` = '1') THEN 'Active' END ) AS `is_active_custom`, `users`.`is_assigned` AS `is_assigned`, `users`.`assigned_to` AS `assigned_to`, `users`.`is_set_up_new_password` AS `is_set_up_new_password`, `users`.`created_at` AS `created_at`, `users`.`modified_at` AS `modified_at`, `users`.`created_by` AS `created_by`, `users`.`modified_by` AS `modified_by`, `users`.`is_deleted` AS `is_deleted`, `lease_with_max_id`.`id` AS `lease_with_max_id_id`, `lease_with_max_id`.`lease_id` AS `lease_with_max_id_lease_id`, `lease_with_max_id`.`renter_id` AS `lease_with_max_id_renter_id`, `lease_with_max_id`.`landlord_id` AS `lease_with_max_id_landlord_id`, `lease_with_max_id`.`property_id` AS `lease_with_max_id_property_id`, `lease_with_max_id`.`is_rent_deducted_form_filled_up` AS `lease_with_max_id_is_rent_deducted_form_filled_up`, `lease_with_max_id`.`status` AS `lease_with_max_id_status`, `lease_with_max_id`.`is_deleted` AS `lease_with_max_id_is_deleted`, `properties`.`id` AS `property_id`, `properties`.`user_id` AS `property_user_id`, `properties`.`property_name` AS `property_property_name`, `properties`.`street_address` AS `property_street_address`, `properties`.`street_address_2` AS `property_street_address_2`, `properties`.`city` AS `property_city`, `properties`.`state` AS `property_state`, `properties`.`zip` AS `property_zip`, `properties`.`rent_amount` AS `property_rent_amount`, `properties`.`is_active` AS `property_is_active`, `properties`.`is_assigned` AS `property_is_assigned`, `properties`.`created_at` AS `property_created_at`, `properties`.`modified_at` AS `property_modified_at`, `properties`.`is_deleted` AS `property_is_deleted`, `geo_states`.`name` AS `geo_states_name` FROM ( ( ( `users` LEFT JOIN `lease_with_max_id` ON ( ( `lease_with_max_id`.`renter_id` = `users`.`id` ) ) ) LEFT JOIN `properties` ON ( ( `properties`.`id` = `lease_with_max_id`.`property_id` ) ) ) LEFT JOIN `geo_states` ON ( ( `geo_states`.`id` = `properties`.`state` ) ) ) WHERE ( (`users`.`user_type` = 'landlord') AND(`users`.`is_deleted` = 0) ) ORDER BY `users`.`id` DESC ;

您应该从矩阵的各个部分剪切各种维度的子矩阵,然后计算得分,以了解您的系统整体表现如何。

答案 2 :(得分:0)

由于在真实数据集中我们有许多缺失值(有时甚至是90%),为了获得更好的预测,我们通常用NA替换缺失值。然后我们用用户评级或项目评级的平均值甚至两者的平均值替换NA。如果将缺失值替换为0,则会低估影响结果的数据。 例如,如果我们将矩阵命名为ds:

#compute the user means
user.mean = matrix (rowMeans (ds, na.rm = T), nrow(ds) ,ncol(ds))
# replace the missing values (NA) with the user means
ds[is.na(ds)] = user.mean[is.na(ds)]