I have loaded a dataset into R. The dataset has 2 columns:
user_id merchant_id
514729 14852,16695
1240327 23590
7457 211
359027 2483
463149 5802
514730 5460,1896
41953 7183,147105
927805 304,3909,4151,32,3,39171
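As you can see, some user IDs are associated with multiple merchants. What I want to do is transform the data so that it has the following schema: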
User Id MerchantId1 MerchantId2 MerchantId3 MerchantId4
123445 0 1 0 1
123453 1 0 0 0
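Basically, I want to build a user-by-merchant matrix where each cell is 1 if that user_id is associated with that merchant_id and 0 otherwise.
Any suggestions on how to achieve this?
I want to use it to build a recommender system. Any help would be great.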
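To make the sample above reproducible, here is a minimal sketch that recreates it as a data frame. The name `mydf` is an assumption chosen to match the first answer; the second answer calls the same data `df1`. `merchant_id` is kept as a character column because it holds comma-separated IDs.

```r
# Recreate the sample data from the question as a data frame.
# merchant_id holds comma-separated IDs, so it is kept as character, not numeric.
mydf <- data.frame(
  user_id = c(514729, 1240327, 7457, 359027, 463149, 514730, 41953, 927805),
  merchant_id = c("14852,16695", "23590", "211", "2483", "5802",
                  "5460,1896", "7183,147105", "304,3909,4151,32,3,39171"),
  stringsAsFactors = FALSE
)
# The second answer refers to the same data as df1.
df1 <- mydf
str(mydf)
```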
Answer 0 (score: 1)
My interpretation is that you are after the following:
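library(splitstackshape)
# Split merchant_id on "," and add one 0/1 indicator column per merchant ID;
# type = "character" treats the IDs as labels, fill = 0 marks absent merchants.
cSplit_e(mydf, "merchant_id", ",", type = "character", fill = 0)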
## user_id merchant_id merchant_id_147105 merchant_id_14852 merchant_id_16695
## 1 514729 14852,16695 0 1 1
## 2 1240327 23590 0 0 0
## 3 7457 211 0 0 0
## 4 359027 2483 0 0 0
## 5 463149 5802 0 0 0
## 6 514730 5460,1896 0 0 0
## 7 41953 7183,147105 1 0 0
## merchant_id_1896 merchant_id_211 merchant_id_23590 merchant_id_2483
## 1 0 0 0 0
## 2 0 0 1 0
## 3 0 1 0 0
## 4 0 0 0 1
## 5 0 0 0 0
## 6 1 0 0 0
## 7 0 0 0 0
## merchant_id_5460 merchant_id_5802 merchant_id_7183
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 1 0
## 6 1 0 0
## 7 0 0 1
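If you would rather not depend on splitstackshape, a minimal base-R sketch of the same idea (assuming `merchant_id` is a comma-separated character column, as in the question) splits the IDs, stacks them into long (user, merchant) pairs, and cross-tabulates them with `table()`. The intermediate names `ids`, `long` and `m` are illustrative only.

```r
# Sample data from the question (merchant_id as comma-separated strings).
mydf <- data.frame(
  user_id = c(514729, 1240327, 7457, 359027, 463149, 514730, 41953, 927805),
  merchant_id = c("14852,16695", "23590", "211", "2483", "5802",
                  "5460,1896", "7183,147105", "304,3909,4151,32,3,39171"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a character vector of merchant IDs.
ids <- strsplit(mydf$merchant_id, ",", fixed = TRUE)

# Long format: one row per (user, merchant) pair.
long <- data.frame(
  user_id = rep(mydf$user_id, lengths(ids)),
  merchant_id = unlist(ids),
  stringsAsFactors = FALSE
)

# Cross-tabulate into a user x merchant count matrix, then cap counts at 1
# so that a pair listed twice still yields a binary indicator.
m <- unclass(table(long$user_id, long$merchant_id))
m[m > 1L] <- 1L
m
```

Unlike `cSplit_e()`, this returns a plain matrix with user IDs as row names instead of appending indicator columns to the original data frame.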
Answer 1 (score: 1)
You can also use mtabulate from qdapTools:
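library(qdapTools)
# strsplit() needs a character vector; as.character() guards against merchant_id
# having been read in as a factor (the read.csv() default before R 4.0.0).
cbind(df1, mtabulate(strsplit(as.character(df1$merchant_id), ',')))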
# user_id merchant_id 147105 14852 16695 1896 211 23590 2483 3 304
#1 514729 14852,16695 0 1 1 0 0 0 0 0 0
#2 1240327 23590 0 0 0 0 0 1 0 0 0
#3 7457 211 0 0 0 0 1 0 0 0 0
#4 359027 2483 0 0 0 0 0 0 1 0 0
#5 463149 5802 0 0 0 0 0 0 0 0 0
#6 514730 5460,1896 0 0 0 1 0 0 0 0 0
#7 41953 7183,147105 1 0 0 0 0 0 0 0 0
#8 927805 304,3909,4151,32,3,39171 0 0 0 0 0 0 0 1 1
# 32 3909 39171 4151 5460 5802 7183
#1 0 0 0 0 0 0 0
#2 0 0 0 0 0 0 0
#3 0 0 0 0 0 0 0
#4 0 0 0 0 0 0 0
#5 0 0 0 0 0 1 0
#6 0 0 0 0 1 0 0
#7 0 0 0 0 0 0 1
#8 1 1 1 1 0 0 0
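Since the goal is a recommender system, note that a real user x merchant matrix is usually very sparse, so a dense data frame with one column per merchant can get large. As a sketch (not part of either answer above), the same pairs can be stored with `sparseMatrix()` from the Matrix package, which ships with R; the names `users`, `merchants` and `um` are hypothetical.

```r
library(Matrix)

# Sample data from the question (merchant_id as comma-separated strings).
mydf <- data.frame(
  user_id = c(514729, 1240327, 7457, 359027, 463149, 514730, 41953, 927805),
  merchant_id = c("14852,16695", "23590", "211", "2483", "5802",
                  "5460,1896", "7183,147105", "304,3909,4151,32,3,39171"),
  stringsAsFactors = FALSE
)

ids <- strsplit(mydf$merchant_id, ",", fixed = TRUE)

# Factors map each user / merchant ID to an integer row / column index.
users <- factor(rep(mydf$user_id, lengths(ids)))
merchants <- factor(unlist(ids))

# Binary sparse user x merchant matrix (class dgCMatrix).
# Note: sparseMatrix() sums repeated (i, j) pairs; none occur in this sample.
um <- sparseMatrix(
  i = as.integer(users),
  j = as.integer(merchants),
  x = 1,
  dims = c(nlevels(users), nlevels(merchants)),
  dimnames = list(levels(users), levels(merchants))
)
um
```

Row and column names come from the factor levels, so `um["927805", "3"]` looks up a single user/merchant cell, and only the nonzero entries are stored.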