I have a data table like the following:
library(data.table)
DF <- as.data.table(list(ID = c(125534,"122-343",312343,"12343-343FGV", 1234, 713827), Product = c('Y', NA, NA, 'Z', NA, NA), Type = c(NA, 'D', 'G', NA, NA, NA)))
ID Product Type
1: 125534 Y NA
2: 122-343 NA D
3: 312343 NA G
4: 12343-343FGV Z NA
5: 1234 NA NA
6: 713827 NA NA
I want to create a new column named CATEGORY based on how each ID is classified. My incorrect code is shown below:
DF$CATEGORY <- ifelse(grepl("^12[0-9]|^31[0-9]|", DF$ID), 'IN', 'OUT')
Expected result:
ID Product Type CATEGORY
1: 125534 Y NA IN
2: 122-343 NA D OUT
3: 312343 NA G IN
4: 12343-343FGV Z NA OUT
5: 1234 NA NA OUT
6: 713827 NA NA OUT
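I want to code it so that any ID that contains letters or symbols, is shorter than 6 characters, or does not start with 12 or 31 is marked OUT. All the rest are IN.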
Answer 0 (score: 2)
I think you mean:
DF[, CATEGORY := ifelse(grepl("[^0-9]", ID) |
nchar(ID) < 6 |
!grepl("^12|^31", ID),
"OUT", "IN")]
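To see which rule sends each ID to OUT, the three conditions can be computed separately before they are combined. This is only a sketch on the sample IDs from the question; the names `ids`, `has_non_digit`, `too_short` and `bad_prefix` are illustrative and not part of the original answer. As far as I can tell, the attempt in the question fails because its pattern ends with `|`, which leaves an empty alternative that matches any string, so every row comes out as IN.

```r
library(data.table)

# Same IDs as in the question; c() coerces the mixed values to character.
ids <- c(125534, "122-343", 312343, "12343-343FGV", 1234, 713827)

has_non_digit <- grepl("[^0-9]", ids)     # contains a letter or symbol
too_short     <- nchar(ids) < 6           # fewer than 6 characters
bad_prefix    <- !grepl("^(12|31)", ids)  # does not start with 12 or 31

# An ID is OUT as soon as any one of the three conditions holds.
category <- ifelse(has_non_digit | too_short | bad_prefix, "OUT", "IN")

# Show the individual flags next to the final category.
data.table(ID = ids, has_non_digit, too_short, bad_prefix, CATEGORY = category)
```

Keeping the flags as separate columns makes it easy to spot that, for example, 1234 is OUT only because of its length, while 713827 is OUT only because of its prefix.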
Answer 1 (score: 1)
We can also do this by first creating the 'CATEGORY' column with the value "OUT", then specifying a logical index in 'i' that matches only the "IN" cases, and assigning (:=) "IN" to 'CATEGORY' for those rows.
DF[, CATEGORY := "OUT"][grepl("^(12|31)[0-9]{4,}$", ID), CATEGORY := "IN"]
DF
# ID Product Type CATEGORY
#1: 125534 Y NA IN
#2: 122-343 NA D OUT
#3: 312343 NA G IN
#4: 12343-343FGV Z NA OUT
#5: 1234 NA NA OUT
#6: 713827 NA NA OUT
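As a quick sanity check, the two approaches in this thread can be run side by side on the sample IDs to confirm that they agree. This is a sketch only: the column names `A0` and `A1` are hypothetical and used just for the comparison, and agreement on six rows does not prove the two rules are equivalent for every possible ID.

```r
library(data.table)

# Sample IDs from the question, written directly as character strings.
DF <- data.table(ID = c("125534", "122-343", "312343",
                        "12343-343FGV", "1234", "713827"))

# Answer 0: three explicit conditions combined with OR.
DF[, A0 := ifelse(grepl("[^0-9]", ID) | nchar(ID) < 6 | !grepl("^12|^31", ID),
                  "OUT", "IN")]

# Answer 1: default every row to "OUT", then overwrite the matching rows.
DF[, A1 := "OUT"][grepl("^(12|31)[0-9]{4,}$", ID), A1 := "IN"]

# TRUE when both columns hold exactly the same values.
same <- DF[, identical(A0, A1)]
same
```

The single anchored pattern in the second approach folds all three rules into one: digits only, at least 6 characters in total, and a 12 or 31 prefix.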