I am new to R and could not find an answer to the specific problem I am facing.
Suppose my data frame looks like this:
d <- data.frame(Name = c("Jon", "Jon", "Jon", "Kel", "Kel", "Kel", "Don", "Don", "Don"),
                No1 = c(1,2,3,1,1,1,3,3,3),
                No2 = c(1,1,1,2,2,2,3,3,3))
Name No1 No2
Jon 1 1
Jon 2 1
Jon 3 1
Kel 1 2
Kel 1 2
Kel 1 2
Don 3 3
Don 3 3
Don 3 3
...
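How can I add new columns to the data frame that list, for each name, the unique values found in columns No1 and No2? These would be (1,2,3), (1,2) and (3) for Jon, Kel and Don respectively.
So, if the new columns are named ID#, the desired result would be: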
d2 <- data.frame(Name = c("Jon", "Jon", "Jon", "Kel", "Kel", "Kel", "Don", "Don", "Don"),
                 No1 = c(1,2,3,1,1,1,3,3,3),
                 No2 = c(1,1,1,2,2,2,3,3,3),
                 ID1 = c(1,1,1,1,1,1,3,3,3),
                 ID2 = c(2,2,2,2,2,2,NA,NA,NA),
                 ID3 = c(3,3,3,NA,NA,NA,NA,NA,NA))
Name No1 No2 ID1 ID2 ID3
Jon 1 1 1 2 3
Jon 2 1 1 2 3
Jon 3 1 1 2 3
Kel 1 2 1 2 NA
Kel 1 2 1 2 NA
Kel 1 2 1 2 NA
Don 3 3 3 NA NA
Don 3 3 3 NA NA
Don 3 3 3 NA NA
Answer 0 (score: 4)
Answer 1 (score: 3)
A tidyverse approach:
library(dplyr)
library(tidyr)
# evaluate separately for each name
d %>% group_by(Name) %>%
    # add a column of the unique values pasted together into a string
    mutate(ID = paste(unique(c(No1, No2)), collapse = ' ')) %>%
    # separate the string into individual columns, filling with NA and converting to numbers
    separate(ID, into = paste0('ID', 1:3), fill = 'right', convert = TRUE)
## Source: local data frame [9 x 6]
## Groups: Name [3]
##
## Name No1 No2 ID1 ID2 ID3
## * <fctr> <dbl> <dbl> <int> <int> <int>
## 1 Jon 1 1 1 2 3
## 2 Jon 2 1 1 2 3
## 3 Jon 3 1 1 2 3
## 4 Kel 1 2 1 2 NA
## 5 Kel 1 2 1 2 NA
## 6 Kel 1 2 1 2 NA
## 7 Don 3 3 3 NA NA
## 8 Don 3 3 3 NA NA
## 9 Don 3 3 3 NA NA
Here is a nice base version, using a basic split-apply-combine approach:
# store distinct values in No1 and No2
cols <- unique(unlist(d[,-1]))
# split No1 and No2 by Name,
ids <- data.frame(t(sapply(split(d[,-1], d$Name),
                           # find unique values for each split,
                           function(x){y <- unique(unlist(x))
                               # pad with NAs,
                               c(y, rep(NA, length(cols) - length(y)))
                           # and return a data.frame
                           })))
# fix column names
names(ids) <- paste0('ID', seq_along(cols))  # name by position, so values other than 1, 2, 3 still give ID1, ID2, ...
# turn rownames into column
ids$Name <- rownames(ids)
# join two data.frames on Name columns
merge(d, ids, sort = FALSE)
## Name No1 No2 ID1 ID2 ID3
## 1 Jon 1 1 1 2 3
## 2 Jon 2 1 1 2 3
## 3 Jon 3 1 1 2 3
## 4 Kel 1 2 1 2 NA
## 5 Kel 1 2 1 2 NA
## 6 Kel 1 2 1 2 NA
## 7 Don 3 3 3 NA NA
## 8 Don 3 3 3 NA NA
## 9 Don 3 3 3 NA NA
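To make the split and pad steps easier to follow, here is a minimal sketch of the intermediate objects, using the example data from the question. The names `per_name`, `width` and `padded` are introduced only for illustration and are not part of the answer above; padding is done with `length<-` here instead of `rep(NA, ...)`.

```r
# Same example data as in the question
d <- data.frame(Name = c("Jon", "Jon", "Jon", "Kel", "Kel", "Kel", "Don", "Don", "Don"),
                No1 = c(1, 2, 3, 1, 1, 1, 3, 3, 3),
                No2 = c(1, 1, 1, 2, 2, 2, 3, 3, 3))

# split the No* columns by Name, then collect the distinct values of each piece
per_name <- lapply(split(d[, -1], d$Name), function(x) unique(unlist(x)))

# pad every vector to the length of the longest one;
# `length<-` fills the added positions with NA
width <- max(lengths(per_name))
padded <- t(sapply(per_name, `length<-`, width))

# one row per Name, one column per position
print(padded)
```

Padding to the longest group rather than to the number of distinct values overall avoids trailing all-NA columns when no single name uses every value.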
Just for kicks, here is a creative alternative base version that uses table instead of splitting/grouping:
# copy d so as not to distort original with factor columns
d_f <- d
# make No* columns factors to ensure similar table structure
d_f[, -1] <- lapply(d[,-1], factor, levels = unique(unlist(d[, -1])))
# make tables of cols, sum to aggregate occurrences, and set as boolean mask for > 0
tab <- Reduce(`+`, lapply(d_f[, -1], table, d_f$Name)) > 0
# replace all TRUE values with values they tabulated
tab <- tab * matrix(as.integer(rownames(tab)), nrow = nrow(tab), ncol = ncol(tab))
# replace 0s with NAs
tab[tab == 0] <- NA
# store column names
cols <- paste0('ID', seq_len(nrow(tab)))  # name by position rather than by tabulated value
# sort each row, keeping NAs
tab <- data.frame(t(apply(tab, 2, sort, na.last = TRUE)))
# apply stored column names
names(tab) <- cols
# turn rownames into column
tab$Name <- rownames(tab)
# join two data.frames on Name columns
merge(d, tab, sort = FALSE)
The result is exactly the same.
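The boolean mask is the core of the table-based version, so a small sketch of just that step may help. It assumes the example data from the question; `lv` and `mask` are illustrative names that do not appear in the answer.

```r
# Same example data as in the question
d <- data.frame(Name = c("Jon", "Jon", "Jon", "Kel", "Kel", "Kel", "Don", "Don", "Don"),
                No1 = c(1, 2, 3, 1, 1, 1, 3, 3, 3),
                No2 = c(1, 1, 1, 2, 2, 2, 3, 3, 3))

# shared factor levels guarantee that both tables have the same rows
lv <- sort(unique(unlist(d[, -1])))

# rows are values, columns are names; TRUE where a value occurs
# for that name in either No1 or No2
mask <- (table(factor(d$No1, levels = lv), d$Name) +
         table(factor(d$No2, levels = lv), d$Name)) > 0

# the distinct values for one name are the levels flagged TRUE
lv[mask[, "Kel"]]
```

Because the mask is indexed by value rather than by row of `d`, duplicates within a name collapse automatically, which is why no explicit call to `unique` per group is needed.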
Answer 2 (score: 2)
We can use a single external package, data.table, to get the output. Convert the 'data.frame' to a 'data.table' (setDT(d)), group by 'Name', unlist the columns specified in .SDcols, get the unique values, dcast from 'long' to 'wide' format, and join with the original dataset on the "Name" column.
library(data.table)
dcast(setDT(d)[, unique(unlist(.SD)) , Name, .SDcols = No1:No2],
Name~paste0("ID", rowid(Name)), value.var="V1")[d, on = "Name"]
# Name ID1 ID2 ID3 No1 No2
#1: Jon 1 2 3 1 1
#2: Jon 1 2 3 2 1
#3: Jon 1 2 3 3 1
#4: Kel 1 2 NA 1 2
#5: Kel 1 2 NA 1 2
#6: Kel 1 2 NA 1 2
#7: Don 3 NA NA 3 3
#8: Don 3 NA NA 3 3
#9: Don 3 NA NA 3 3
Or this can be done in a single line by pasting together the unique elements of 'No1' and 'No2', grouped by 'Name', and then splitting the result into three columns with cSplit from splitstackshape.
library(splitstackshape)
cSplit(setDT(d)[, ID:= paste(unique(c(No1, No2)), collapse=" ") , Name], "ID", " ")
# Name No1 No2 ID_1 ID_2 ID_3
#1: Jon 1 1 1 2 3
#2: Jon 2 1 1 2 3
#3: Jon 3 1 1 2 3
#4: Kel 1 2 1 2 NA
#5: Kel 1 2 1 2 NA
#6: Kel 1 2 1 2 NA
#7: Don 3 3 3 NA NA
#8: Don 3 3 3 NA NA
#9: Don 3 3 3 NA NA
Or, just for kicks, this can be done using only baseVerse.
NOTE: No packages are used, and not much effort is spent on the splitting.
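The code for the base-only variant is not reproduced in this copy of the answer. As a rough sketch of what a package-free version might look like (an assumption for illustration, not the answer author's code), the per-name values can be pasted into a string with tapply and then split with read.table, which pads short rows with NA when fill = TRUE. The names `id_str`, `n_ids`, `ids` and `d3` are hypothetical.

```r
# Same example data as in the question
d <- data.frame(Name = c("Jon", "Jon", "Jon", "Kel", "Kel", "Kel", "Don", "Don", "Don"),
                No1 = c(1, 2, 3, 1, 1, 1, 3, 3, 3),
                No2 = c(1, 1, 1, 2, 2, 2, 3, 3, 3))

# one string of distinct values per Name, e.g. "1 2 3" for Jon
id_str <- tapply(seq_len(nrow(d)), d$Name, function(i) {
    paste(unique(c(d$No1[i], d$No2[i])), collapse = " ")
})

# number of ID columns needed: the longest set of distinct values
n_ids <- max(lengths(strsplit(id_str, " ")))

# look up each row's string by Name, then let read.table do the splitting;
# fill = TRUE pads rows that have fewer fields with NA
ids <- read.table(text = unname(id_str[as.character(d$Name)]),
                  fill = TRUE, col.names = paste0("ID", seq_len(n_ids)))

# bind the new columns onto the original data, keeping its row order
d3 <- cbind(d, ids)
print(d3)
```

Looking the strings up by name instead of merging keeps the rows in their original order, so no sort argument is needed.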