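I have a large dataset with about 1 million rows and 8 columns (variables). One of the variables, ORDER, has categories from 1 to 90. I want to create a new data.frame in which ORDER is reduced to four categories: 1, 2, 3+ and ALL, where ALL is the sum over all categories (1 to 90) and the frequency for 3+ is the sum of the frequencies of the categories >= 3 (that is, 3 to 90).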
YEAR  PROVINCE  ZONA91OK  AGE5  ORDER  NATIONALITY_MOTHER  NATIONALITY_FATHER  FREQUENCY
1979  1         101       15    1      No computable       No computable       10
1989  3         102       20    1      No computable       No computable       50
I am very new to data management in R, so any help with this problem is much appreciated!
Here is a sample of the data.frame:
mydata <- structure(list(YEAR = c(1981, 1981, 1981, 1981, 1981, 1981, 1981,
1981, 1981, 1981, 1981, 1981, 1981, 1981, 1981, 1981, 1981, 1981,
1981, 1981, 1981), PROVINCE = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), ZONA91OK = c(101, 101, 101,
101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101,
101, 101, 101, 101, 101), AGE5 = c(15, 20, 20, 25, 25, 25, 25,
30, 30, 30, 30, 30, 35, 35, 35, 35, 35, 35, 40, 40, 40), ORDER = c(1,
1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 12, 1, 3, 5),
NATIONALITY_MOTHER = structure(c(9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L
), .Label = c("España", "UE-15 y PD", "Resto Europa", "Magreb",
"África Sub-sahariana", "Latinoamérica", "Asia", "Resto del Mundo",
"No computable"), class = "factor"), NATIONALITY_FATHER = structure(c(9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L), .Label = c("España", "UE-15 y PD", "Resto Europa",
"Magreb", "África Sub-sahariana", "Latinoamérica", "Asia",
"Resto del Mundo", "No computable"), class = "factor"), FREQUENCY = c(10,
40, 20, 50, 30, 10, 1, 10, 15, 10, 1, 1, 5, 5, 5, 1, 1, 1,
1, 1, 1)), .Names = c("YEAR", "PROVINCE", "ZONA91OK", "AGE5",
"ORDER", "NATIONALITY_MOTHER", "NATIONALITY_FATHER", "FREQUENCY"
), row.names = 60175:60195, class = "data.frame")
Answer 0 (score: 0)
If your data has 1M rows, you will probably want to use data.table:
library(data.table)
# Keying on ORDER allows the not-join used below
myDT <- data.table(mydata, key="ORDER")
# Categories kept on their own; every other ORDER (3 to 90) is pooled into "3+"
specialCats <- c(1, 2)
rbind(
  myDT[, list(SUM_FOR="ALL", FREQ_SUM=sum(FREQUENCY))]
, myDT[!.(specialCats), list(SUM_FOR="3+", FREQ_SUM=sum(FREQUENCY))]
)
## RESULTS:
   SUM_FOR FREQ_SUM
1:     ALL      219
2:      3+       33
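The question asks for four categories (1, 2, 3+ and ALL), so the same idea can be extended to one row per category. The following is only a minimal sketch: it rebuilds the ORDER and FREQUENCY columns of the question's sample by hand so that it runs on its own, uses plain logical filters instead of a keyed not-join, and takes "3+" to mean ORDER >= 3 as the question defines it. The names `dt` and `totals` are illustrative, not part of the original answer.

```r
library(data.table)

# ORDER and FREQUENCY columns copied from the question's sample data.
dt <- data.table(
  ORDER     = c(1, 1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 12, 1, 3, 5),
  FREQUENCY = c(10, 40, 20, 50, 30, 10, 1, 10, 15, 10, 1, 1, 5, 5, 5, 1, 1, 1, 1, 1, 1)
)

# One row per requested category; "3+" pools every ORDER of 3 or more.
totals <- rbind(
  dt[ORDER == 1, .(SUM_FOR = "1",   FREQ_SUM = sum(FREQUENCY))],
  dt[ORDER == 2, .(SUM_FOR = "2",   FREQ_SUM = sum(FREQUENCY))],
  dt[ORDER >= 3, .(SUM_FOR = "3+",  FREQ_SUM = sum(FREQUENCY))],
  dt[,           .(SUM_FOR = "ALL", FREQ_SUM = sum(FREQUENCY))]
)
print(totals)
```

The filters do not need a key, which may be convenient if the table is keyed on other columns for different reasons.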
To recode the ORDER column as you require (this adds a new column, order, and leaves ORDER itself untouched), use:
myDT[, order := ifelse(ORDER %in% specialCats, as.character(ORDER), "3+")]
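Note 1: For "3+" to be a valid value, the column has to be converted to character.
Note 2: Adding a row for "ALL" does not make much sense, because what would you put in that row for AGE5, PROVINCE, and so on?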
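One possible reading of the request, offered here as an assumption rather than as what the asker necessarily meant, is that "ALL" should be a total within each combination of the other variables, so every group gets its own "ALL" row. A rough sketch of that, using only the AGE5 20 and 25 rows of the question's sample and grouping by AGE5 alone (ORDER_GROUP, by_group, all_rows and result are hypothetical names):

```r
library(data.table)

# Rows of the question's sample with AGE5 of 20 or 25 (other columns omitted).
dt <- data.table(
  AGE5      = c(20, 20, 25, 25, 25, 25),
  ORDER     = c(1, 2, 1, 2, 3, 4),
  FREQUENCY = c(40, 20, 50, 30, 10, 1)
)

# Collapse ORDER to "1", "2" or "3+" (character, so that "3+" is a valid value).
dt[, ORDER_GROUP := ifelse(ORDER >= 3, "3+", as.character(ORDER))]

# Frequencies summed within each AGE5 x ORDER_GROUP cell.
by_group <- dt[, .(FREQUENCY = sum(FREQUENCY)), by = .(AGE5, ORDER_GROUP)]

# One "ALL" row per AGE5 value, holding the total over every ORDER.
all_rows <- dt[, .(ORDER_GROUP = "ALL", FREQUENCY = sum(FREQUENCY)), by = AGE5]

# Stack both tables, then sort by AGE5 and category.
result <- rbind(by_group, all_rows)
setorder(result, AGE5, ORDER_GROUP)
print(result)
```

On the full data, the `by` list would presumably include YEAR, PROVINCE, ZONA91OK and the two nationality columns as well, in both the per-category and the "ALL" step.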