My df is as follows:
DF:
Name               |Code
-------------------+-----
aman               |abc
akash              |bcd
rudra              |cde
Tushar             |def
Kartik             |efg
aman,akash         |fgh
akash,rudra        |ghi
akash,rudra,aman   |ijk
aman,Tushar        |jkl
Kartik,Tushar      |klm
rudra,Kartik,akash |lmn
I want to look up the codes for the names in the following df:
Name            |
----------------+
aman,akash,rudra|
Tushar,aman     |
Kartik          |
rudra,akash     |
and get the following result:
Name            |code
----------------+-----
aman,akash,rudra|ijk
Tushar,aman     |jkl
Kartik          |efg
rudra,akash     |ghi
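Note that the combination "rudra,akash" appears three times in DF (in the rows coded ghi, ijk and lmn). In that case the result should be the code that comes first in alphabetical order.
Let me know if there is a way to achieve this.
Answer 0 (score: 1)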
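We can use cSplit from splitstackshape to split the 'Name' column and reshape it to 'long' format. Grouped by 'Code' and the run-length id of 'Code', we paste the 'Name' values back together after sorting them, which gives 'dfN'. We do the same with the 'Name' column of the second dataset 'df2', match the result against the 'Name' column of 'dfN', and use the corresponding 'Code' from 'dfN' to create a new column 'code' in 'df2'.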
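library(splitstackshape)
library(data.table)  # rleid() and the [.data.table syntax used below
# Lookup table: one row per Code, with its names sorted and re-joined
dfN <- cSplit(df, "Name", ",", "long")[, .(Name = paste(sort(Name), collapse=",")),
                by = .(grp = rleid(Code), Code)]
# Row id so the names of each df2 row can be put back together
df2$grp <- seq_len(nrow(df2))
# Normalize df2 the same way, then look up the matching Code in dfN
df2$code <- cSplit(df2, "Name", ",", "long")[, .(Name = paste(sort(Name),
          collapse=",")), .(grp)][, dfN$Code[match(Name, dfN$Name)]]
df2$grp <- NULL
df2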
#              Name code
#1 aman,akash,rudra  ijk
#2      Tushar,aman  jkl
#3           Kartik  efg
#4      rudra,akash  ghi
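
For comparison, the same idea (sort the names within each cell so the order no longer matters, then match on the sorted string) can be sketched in base R without extra packages. This is only a sketch: the post does not show how 'df' and 'df2' were built, so the construction below assumes plain character columns, and `normalize_key` and `lookup` are names made up for this example.

```r
# Rebuild the two data frames from the question (assumption: character columns).
df <- data.frame(
  Name = c("aman", "akash", "rudra", "Tushar", "Kartik",
           "aman,akash", "akash,rudra", "akash,rudra,aman",
           "aman,Tushar", "Kartik,Tushar", "rudra,Kartik,akash"),
  Code = c("abc", "bcd", "cde", "def", "efg",
           "fgh", "ghi", "ijk", "jkl", "klm", "lmn"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Name = c("aman,akash,rudra", "Tushar,aman", "Kartik", "rudra,akash"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split on commas, sort the parts, and join them again,
# so that "b,a" and "a,b" produce the same key. The sort order depends on the
# locale, but it is applied identically to both data frames.
normalize_key <- function(x) {
  vapply(strsplit(x, ",", fixed = TRUE),
         function(parts) paste(sort(trimws(parts)), collapse = ","),
         character(1))
}

# Order the lookup table by Code first: match() returns the first hit, so if
# two rows ever shared a key, the alphabetically first code would be chosen.
lookup <- df[order(df$Code), ]
df2$code <- lookup$Code[match(normalize_key(df2$Name), normalize_key(lookup$Name))]
df2
```

With the sample data every key in 'df' is unique, so ordering by 'Code' changes nothing here; it only matters if the same set of names occurs in more than one row.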