Question

For the example data frame:

df <- structure(list(area = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
                                        2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
                                        3L, 3L, 3L, 3L, 3L,
                                        4L, 4L, 4L, 4L, 4L),
                                      .Label = c("a1", "a2", "a3", "a4"), class = "factor"),
                     result = c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L,
                                1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L),
                     weight = c(0.5, 0.8, 1, 3, 3.4, 1.6, 4, 1.6, 2.3, 2.1, 2,
                                1, 0.1, 6, 2.3, 1.6, 1.4, 1.2, 1.5, 2, 0.6, 0.4, 0.3, 0.6,
                                1.6, 1.8)),
                .Names = c("area", "result", "weight"),
                class = "data.frame", row.names = c(NA, -26L))

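I am trying to isolate the areas with the highest and lowest weighted percentage of result == 1, then produce a weighted cross-tab, which will then be used to calculate a risk difference.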
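As a rough sketch of the end goal, the weighted percentage of result == 1 in an area is sum(weight where result == 1) / sum(weight) * 100, and the risk difference between two areas can be taken as the difference of those percentages. The block below uses base R tapply() instead of the data.table code in the question (a swapped-in technique, chosen only so the block has no dependencies), re-enters the same data as plain vectors, and assumes "risk difference" means highest minus lowest weighted percentage.

```r
# Same data as df above, re-entered as plain vectors so the block runs on its own
area   <- factor(rep(c("a1", "a2", "a3", "a4"), times = c(7, 9, 5, 5)))
result <- c(0, 1, 0, 1, 1, 0, 0,  1, 0, 0, 1, 0, 1, 0, 1, 0,
            1, 1, 1, 0, 1,  0, 1, 0, 0, 1)
weight <- c(0.5, 0.8, 1, 3, 3.4, 1.6, 4,  1.6, 2.3, 2.1, 2, 1, 0.1, 6, 2.3, 1.6,
            1.4, 1.2, 1.5, 2, 0.6,  0.4, 0.3, 0.6, 1.6, 1.8)

# Weighted percentage of result == 1 per area
num <- tapply(weight * (result == 1), area, sum)  # weight of "positive" rows
den <- tapply(weight, area, sum)                  # total weight per area
pct <- setNames(as.vector(num / den) * 100, levels(area))

# Areas with the lowest and highest weighted percentage
lowest  <- names(which.min(pct))
highest <- names(which.max(pct))

# Assumed definition: risk difference in percentage points, highest minus lowest
risk_diff <- pct[[highest]] - pct[[lowest]]

round(pct, 2)
```

With this data the lowest area is a2 and the highest is a3, matching the incl output shown below.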

library(data.table)

df.summary <- setDT(df)[, .(.N, freq.1 = sum(result == 1),
                            result = weighted.mean(result == 1, w = weight) * 100),
                        by = area]

# Include only regions with highest or lowest percentage
df.summary <- data.table(df.summary)  # redundant: setDT() already returns a data.table
incl <- df.summary[c(which.min(result), which.max(result)), area]
df.new <- df[df$area %in% incl, ]
incl

'incl' contains the two areas I want, but it still has four levels:

[1] a2 a3
Levels: a1 a2 a3 a4

How do I get rid of these unused levels? The subsequent analysis I want to do needs only the two levels as well as the areas. Any ideas?
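A small illustration of why this happens: subsetting a factor keeps the full set of levels of the original factor, even levels that no longer occur. Besides re-creating the factor, base R's droplevels() removes levels with no observations. The vector below is a stand-in for df$area, and the subset a2/a3 is hard-coded for illustration.

```r
# Stand-in for df$area: 7 x a1, 9 x a2, 5 x a3, 5 x a4
area <- factor(rep(c("a1", "a2", "a3", "a4"), times = c(7, 9, 5, 5)))
sub  <- area[area %in% c("a2", "a3")]

levels(sub)   # "a1" "a2" "a3" "a4"  (unused levels are kept)
table(sub)    # a1 and a4 appear with a count of 0

# droplevels() keeps only the levels that actually occur
sub2 <- droplevels(sub)
levels(sub2)  # "a2" "a3"
```

droplevels() also has a data frame method, so droplevels(df.new) would drop unused levels from every factor column at once.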

Answer 1

I found this elsewhere on the web (e.g. Problems with levels in a xtab in R):

df.new$area <- factor(df.new$area)

It works!

Hope it is useful to others.
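A possible continuation toward the weighted cross-tab and risk difference described in the question, sketched with base R xtabs() and prop.table(). It assumes incl is c("a2", "a3") (hard-coded here rather than computed with data.table) and that the risk difference is the weighted proportion of result == 1 in a3 minus that in a2; the data are re-entered with the same values as df.

```r
# Same values as df in the question, re-entered so the block is self-contained
df <- data.frame(
  area   = factor(rep(c("a1", "a2", "a3", "a4"), times = c(7, 9, 5, 5))),
  result = c(0, 1, 0, 1, 1, 0, 0,  1, 0, 0, 1, 0, 1, 0, 1, 0,
             1, 1, 1, 0, 1,  0, 1, 0, 0, 1),
  weight = c(0.5, 0.8, 1, 3, 3.4, 1.6, 4,  1.6, 2.3, 2.1, 2, 1, 0.1, 6, 2.3, 1.6,
             1.4, 1.2, 1.5, 2, 0.6,  0.4, 0.3, 0.6, 1.6, 1.8)
)

incl   <- c("a2", "a3")            # assumed: lowest and highest areas
df.new <- df[df$area %in% incl, ]
df.new$area <- factor(df.new$area)  # re-create the factor so only a2 and a3 remain

# Weighted cross-tab: each cell is the sum of weights for that area x result
tab <- xtabs(weight ~ area + result, data = df.new)

# Row proportions: weighted share of result 0 and 1 within each area
props <- prop.table(tab, margin = 1)

# Assumed definition of the risk difference: P(result = 1 | a3) - P(result = 1 | a2)
risk_diff <- props["a3", "1"] - props["a2", "1"]
```

Without the factor() step, tab would have four rows, and the a1 and a4 rows would be all zero (and NaN in props, since their row totals are 0).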
