I have a data frame like this:
No. Alphabet
1. A
2. B
3. A
4. A
5. C
6. B
7. C
Now I want to add a new column, Outcome, that assigns a number to each unique element. So the final table would be:
No. Alphabet Outcome
1. A 1
2. B 2
3. A 1
4. A 1
5. C 3
6. B 2
7. C 3
How can I achieve this in R?
Answer 0 (score: 5)
You can use as.numeric(factor(.)), as shown below:
> Letter <- c("A", "A", "B", "C", "B", "A")
> as.numeric(factor(Letter))
[1] 1 1 2 3 2 1
You can assign the result to a column with the standard mydf$outcome <- etc,
or with whichever method you prefer.
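Applied to the data frame from the question, the assignment might look like the sketch below. The name mydf comes from the answer above; the column name No (without the trailing dot) and the explicit stringsAsFactors argument are assumptions made for illustration.

```r
# Rebuild the example data frame from the question
mydf <- data.frame(
  No = 1:7,
  Alphabet = c("A", "B", "A", "A", "C", "B", "C"),
  stringsAsFactors = FALSE
)

# factor() assigns an integer code to each sorted unique value;
# as.numeric() extracts those codes as a plain numeric vector
mydf$Outcome <- as.numeric(factor(mydf$Alphabet))

print(mydf)
```

Because factor() sorts its levels, the codes follow alphabetical order here: A is 1, B is 2, C is 3.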
Answer 1 (score: 4)
You can also use data.table:
library(data.table)
setDT(df1)[, Outcome:= .GRP, Alphabet][]
#    No. Alphabet Outcome
# 1:   1        A       1
# 2:   2        B       2
# 3:   3        A       1
# 4:   4        A       1
# 5:   5        C       3
# 6:   6        B       2
# 7:   7        C       3
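One thing to keep in mind: .GRP numbers the groups in order of first appearance, whereas as.numeric(factor(.)) numbers them by sorted level. The two agree on the question's data only because A, B and C happen to first appear in alphabetical order. A base R sketch of the difference, using a made-up vector x that is not from the original post:

```r
# Hypothetical input where first-appearance order differs from sorted order
x <- c("B", "A", "B", "C")

# Codes by sorted level, as produced by as.numeric(factor(.))
by_sorted <- as.numeric(factor(x))

# Codes by first appearance, the same ordering .GRP yields with by =
by_first_seen <- match(x, unique(x))

print(by_sorted)      # 2 1 2 3
print(by_first_seen)  # 1 2 1 3
```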
library(fastmatch)
set.seed(24)
df2 <- data.frame(No = 1:1e7, Alphabet = sample(LETTERS, 1e7,
    replace = TRUE), stringsAsFactors = FALSE)
df3 <- copy(df2)
Ananda <- function() {transform(df2,
outcome = as.numeric(factor(df2$Alphabet)))}
Brodie <- function() {transform(df2, outcome=match(Alphabet, Alphabet))}
Brodie2 <- function(){transform(df2, outcome=fmatch(Alphabet, Alphabet))}
akrun <- function() {setDT(df3)[, Outcome:= .GRP, Alphabet][]}
library(microbenchmark)
microbenchmark(Ananda(), Brodie(), Brodie2(), akrun(),
unit='relative', times=20L)
# Unit: relative
#       expr      min       lq     mean   median       uq      max neval cld
#   Ananda() 4.957064 5.150724 4.427514 4.971581 3.336064 4.622502    20   c
#   Brodie() 4.473689 5.074105 4.838985 5.383722 4.641304 4.383919    20   c
#  Brodie2() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    20 a
#    akrun() 1.609863 2.047646 1.665557 1.949590 1.331554 1.290921    20  b
system.time(akrun())
# user system elapsed
# 0.197 0.005 0.202
system.time(Brodie2())
# user system elapsed
# 0.081 0.014 0.095
Answer 2 (score: 2)
Suppose your data frame is named dat. Then you can do:
dat$Outcome <- as.numeric(as.factor(dat$Alphabet))
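One caveat if Alphabet is already a factor: as.factor() returns it unchanged, unused levels included, so the codes can skip numbers, while factor() drops unused levels. A small sketch with a hypothetical factor f (not from the original post):

```r
# Hypothetical factor with an unused level "B"
f <- factor(c("A", "C", "A"), levels = c("A", "B", "C"))

# as.factor() keeps the existing levels, so "C" is still code 3
keeps_gap <- as.numeric(as.factor(f))

# factor() rebuilds the levels from the values present, so "C" becomes code 2
no_gap <- as.numeric(factor(f))

print(keeps_gap)  # 1 3 1
print(no_gap)     # 1 2 1
```

For a plain character column, as in the question, both forms give the same result.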