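I use the tidyverse heavily, but for some projects I need the speed of data.table. By now I understand most of the DT syntax, but I want to drop unused factor levels in a data.table without using mutate_if.
With dplyr, I can use mutate_if(dataframe, is.factor, droplevels) and that's it. However, I can't find a way to do the same with data.table.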
I tried dataframe[, (.SD) := droplevels(.SD), .SDcols = sapply(dataframe, is.factor)], adapting this answer, but it throws the following error:
Error in `[.data.table`(DT_, , `:=`((.SD), droplevels(.SD)), .SDcols = sapply(DT_, :
LHS of := isn't column names ('character') or positions ('integer' or 'numeric')
I would like to get the same result as with mutate_if, but without using the tidyverse.
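The error message says that the left-hand side of `:=` must be a character vector of column names or a numeric vector of positions, whereas `(.SD)` is a table. A minimal sketch of one way to satisfy that requirement is below; the table `dt` and its columns `id` and `g` are made up for illustration and are not from the post.

```r
library(data.table)

# Hypothetical example table: `g` has an unused level "c"
dt <- data.table(
  id = 1:3,
  g  = factor(c("a", "b", "a"), levels = c("a", "b", "c"))
)

# Character vector of factor column names: a valid LHS for `:=`
fcols <- names(dt)[vapply(dt, is.factor, logical(1))]

# Apply droplevels() column by column and assign by reference
dt[, (fcols) := lapply(.SD, droplevels), .SDcols = fcols]

levels(dt$g)
#> [1] "a" "b"
```

Wrapping `fcols` in parentheses tells data.table to evaluate the variable rather than create a column literally named `fcols`.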
Update
I accepted G. Grothendieck's answer because the code is closer to what I expected.
The example he used is this:
library(data.table)
DT <- data.table(a = 1:5,
b = factor(1:5, levels = 1:10),
c = factor(6:10, levels = 1:10))
The data I used for this example is as follows:
set.seed(42)
DT1 = data.table(
A = LETTERS[1:10],
B = c(1:10),
C = factor(sample(LETTERS, 10), levels = LETTERS),
D = factor(sample(LETTERS, 10), levels = LETTERS)
)
The columns of interest are:
> DT1[, C]
[1] Q E A J D R Z O G V
Levels: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
> DT1[, D]
[1] Y E N T R O C I D Z
Levels: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
The solution is:
# with base
DT1 = droplevels(DT1)
# or by reference
DT1[, (names(DT1)) := droplevels(.SD)]
which gives the following output:
> DT1[, C]
[1] Q E A J D R Z O G V
Levels: A D E G J O Q R V Z
> DT1[, D]
[1] Y E N T R O C I D Z
Levels: C D E I N O R T Y Z
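The two variants above differ in how the table is updated: `:=` modifies the existing table, while `DT1 = droplevels(DT1)` rebinds the name to a new object. The sketch below is one way to check the in-place behaviour, assuming data.table's `address()` helper is an acceptable proxy for object identity; the variable names `addr_before` and `addr_after` are my own.

```r
library(data.table)

# Rebuild the example table from the question
set.seed(42)
DT1 <- data.table(
  A = LETTERS[1:10],
  B = 1:10,
  C = factor(sample(LETTERS, 10), levels = LETTERS),
  D = factor(sample(LETTERS, 10), levels = LETTERS)
)

addr_before <- address(DT1)

# By reference: columns are replaced inside the existing table
DT1[, (names(DT1)) := droplevels(.SD)]
addr_after <- address(DT1)

# Compare the memory address of the table before and after the update
identical(addr_before, addr_after)

# Each factor column keeps only the levels that actually occur
c(nlevels(DT1$C), nlevels(DT1$D))
#> [1] 10 10
```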
Thanks, everyone, for the quick answers!
Answer 0 (score: 5)
Using the data in the Note at the end:
DT[, (names(DT)) := droplevels(.SD)]
or
DT <- droplevels(DT)
Check:
levels(DT$b)
## [1] "1" "2" "3" "4" "5"
levels(DT$c)
## [1] "6" "7" "8" "9" "10"
If droplevels in the question was only an example, and the actual function you are using has no data.frame method, then use the corresponding code:
wx <- which(sapply(DT, is.factor))
DT[, (wx) := lapply(.SD, droplevels), .SDcols = wx]
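As an illustration of that more general pattern, here is a minimal sketch that swaps in `as.character` as the per-column function. The choice of `as.character` is my own assumption for the example; it is a case where the function is meant to be applied to each column rather than to the whole table.

```r
library(data.table)

DT <- data.table(a = 1:5,
                 b = factor(1:5, levels = 1:10),
                 c = factor(6:10, levels = 1:10))

# Integer positions of the factor columns
wx <- which(sapply(DT, is.factor))

# Apply the function to each selected column and assign by reference
DT[, (wx) := lapply(.SD, as.character), .SDcols = wx]

sapply(DT, class)
#>           a           b           c
#>   "integer" "character" "character"
```

Because `wx` holds positions, it works both as the left-hand side of `:=` and as `.SDcols`.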
Note
library(data.table)
DT <- data.table(a = 1:5,
b = factor(1:5, levels = 1:10),
c = factor(6:10, levels = 1:10))
Answer 1 (score: 4)
This is not a data.table solution, but it can be done with base R's rapply:
## data
data("iris")
## add dummy level
levels(iris$Species) <- c(levels(iris$Species), "dummy")
str(iris)
#> 'data.frame': 150 obs. of 5 variables:
#> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> $ Species : Factor w/ 4 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
iris2 <- rapply(iris, f = droplevels, classes = "factor", how = "replace")
str(iris2)
#> 'data.frame': 150 obs. of 5 variables:
#> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
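For comparison, a similar base R result can be sketched with `lapply` and the `df[] <-` idiom, which keeps the data.frame structure while replacing its columns. The copy `df` is a name I introduce here so that the built-in `iris` is left untouched.

```r
# Work on a copy of the built-in dataset
df <- iris

# Add an unused level, as in the example above
levels(df$Species) <- c(levels(df$Species), "dummy")
nlevels(df$Species)
#> [1] 4

# Drop unused levels on factor columns only; other columns pass through
df[] <- lapply(df, function(x) if (is.factor(x)) droplevels(x) else x)
nlevels(df$Species)
#> [1] 3
```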
Answer 2 (score: 3)
Another approach, using set()
Input data
library(data.table)
DT <- as.data.table(iris)
DT[, Species := as.factor(Species)]
DT <- DT[Species == "setosa"]
DT[, levels(Species)]
#[1] "setosa" "versicolor" "virginica"
Get the names of the factor columns and replace them by reference
cols <- DT[, names(Filter(is.factor, .SD))]
for(j in cols) {
set(DT, j = j, value = droplevels(DT[[j]]))
}
# could also be written as a one-liner - thanks to @MattSummersgill
# for(j in cols) set(DT, j = j, value = droplevels(DT[[j]]))
which gives
DT[, levels(Species)]
#[1] "setosa"
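If this is needed in several places, the loop could be wrapped in a small helper. The sketch below is only one possible shape; the function name `drop_unused_levels` is hypothetical and not part of data.table.

```r
library(data.table)

# Hypothetical helper: drops unused levels in all factor columns, by reference
drop_unused_levels <- function(dt) {
  stopifnot(is.data.table(dt))
  cols <- names(dt)[vapply(dt, is.factor, logical(1))]
  for (j in cols) {
    set(dt, j = j, value = droplevels(dt[[j]]))
  }
  invisible(dt)
}

DT <- as.data.table(iris)[Species == "setosa"]
drop_unused_levels(DT)

levels(DT$Species)
#> [1] "setosa"
```

Returning the table invisibly follows the convention of data.table's own `set*` functions, so the call can be chained without printing.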
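Answer 3 (score: 2)
To add to my comment, you could try table.express, although the examples should be updated because they can be simplified.
Here is an example equivalent to mutate_if: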
library(data.table)
library(table.express)
data("iris")
DT <- as.data.table(iris)
DT %>%
start_expr %>%
mutate(Species = as.factor(Species)) %>%
mutate_sd(is.factor(.COL), droplevels) %>%
end_expr
But do check the whole vignette; some verbs are eager and some are lazy.
Answer 4 (score: 1)
How about this?
#include <cmath>
#include <cstdio>
/* Decode the IEEE-754 binary16 encoding into a floating-point value.
Details of NaNs are not handled.
*/
static float InterpretAsBinary16(unsigned Bits)
{
// Extract the fields from the binary16 encoding.
unsigned SignCode = Bits >> 15;
unsigned ExponentCode = Bits >> 10 & 0x1f;
unsigned SignificandCode = Bits & 0x3ff;
// Interpret the sign bit.
float Sign = SignCode ? -1 : +1;
// Partition into cases based on exponent code.
float Significand, Exponent;
// An exponent code of all ones denotes infinity or a NaN.
if (ExponentCode == 0x1f)
return Sign * (SignificandCode == 0 ? INFINITY : NAN);
// An exponent code of all zeros denotes zero or a subnormal.
else if (ExponentCode == 0)
{
/* Subnormal significands have a leading zero, and the exponent is the
same as if the exponent code were 1.
*/
Significand = 0 + SignificandCode * 0x1p-10;
Exponent = 1 - 0xf;
}
// Other exponent codes denote normal numbers.
else
{
/* Normal significands have a leading one, and the exponent is biased
by 0xf.
*/
Significand = 1 + SignificandCode * 0x1p-10;
Exponent = ExponentCode - 0xf;
}
// Combine the sign, significand, and exponent, and return the result.
return Sign * std::ldexp(Significand, Exponent);
}
int main(void)
{
unsigned Bits = 0x7bff;
std::printf(
"Interpreting the bits 0x%x as an IEEE-754 binary16 yields %.99g.\n",
Bits,
InterpretAsBinary16(Bits));
}