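I am trying to build a somewhat complex conditional expression from two tables ("CS8_2007_2009_M&F.csv" and "CourbeDeCroissance_M&F.csv").
The first table contains about 60,000 individuals (nocs8), each with a weight ("weight") and a gestational age ("agegestationnel").
The second table gives, for each gestational age ("GA"), the corresponding 3rd, 5th and 10th weight percentiles ("3%", "5%" and "10%" respectively).
I want to code each individual (nocs8) according to their gestational age ("GA") and the corresponding weight percentiles ("3%", "5%" and "10%").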
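A minimal base-R sketch of that lookup, assuming the growth-curve table has been read with read.csv() (which renames the "3%", "5%" and "10%" headers to X3., X5. and X10.); the toy values and the table name `curves` are hypothetical. match() returns, for each individual, the row of the curve table for their gestational age, so the thresholds can be fetched for all rows at once instead of row by row:

```r
# Hypothetical toy data; column names follow the question (nocs8, agegestationnel, GA)
# and the names read.csv() gives to the "3%", "5%", "10%" headers (X3., X5., X10.).
mydata <- data.frame(nocs8 = c("a", "b", "c"),
                     agegestationnel = c(39, 38, 41),
                     stringsAsFactors = FALSE)
curves <- data.frame(GA = c(38, 39, 40),
                     X3. = c(2400, 2550, 2700),
                     X5. = c(2500, 2650, 2800),
                     X10. = c(2650, 2800, 2950))

# Row of the growth curve matching each individual's gestational age (NA if absent)
idx <- match(mydata$agegestationnel, curves$GA)

# Copy the GA-specific thresholds next to each individual
mydata$p10 <- curves$X10.[idx]
mydata$p5  <- curves$X5.[idx]
mydata$p3  <- curves$X3.[idx]
print(mydata)
```

A gestational age missing from the curve table (41 here) yields NA thresholds, which the classification step then has to handle.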
Here is my script:
mydata=fread("CS8_2007_2009_M&F.csv",
colClasses = c(rep("character", 5),
rep("numeric", 5 ),
"character",
rep("numeric", 7 ),
rep("character", 9), "numeric"))
setkey(mydata,nocs8)
weight=fread("CourbeDeCroissance_M&F.csv")
setkey(weight, GA)
# Normal weight
mydata[,quant:=0]
# Weight < 10th percentile
mydata[, quant:=if(weight[GA==agegestationnel,`10%`]>mydata[[weight]]) 1, by = 1:nrow(mydata)]
# Weight < 5th percentile
mydata[, quant:=if(weight[GA==agegestationnel,`5%`]>mydata[[weight]]) 1, by = 1:nrow(mydata)]
# Weight < 3rd percentile
mydata[, quant:=if(weight[GA==agegestationnel,`3%`]>mydata[[weight]]) 1, by = 1:nrow(mydata)]
I get this error message:
Error in weight["GA" == agegestationnel, "10%"] :
  incorrect number of dimensions
Is this because of my large sample size (nocs8 = 60,000) or because of the large number of conditions I am testing (23 GA × 3 percentiles = 69)? If so, what should I do?
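A likely cause, inferred here rather than stated in the thread: inside `mydata[, ...]`, data.table evaluates the j expression with mydata's columns in scope, so if mydata has a column called `weight`, that numeric column masks the lookup table of the same name, and indexing a plain vector with two subscripts raises exactly this error. If that is the case, neither the 60,000 rows nor the number of conditions is the problem. The sketch below reproduces the masking in base R with with(), which scopes columns the same way; the toy data and the renamed table `curves` are hypothetical.

```r
# Hypothetical toy data. The lookup table is called 'weight', and mydata also
# has a column called 'weight', as in the question.
weight <- data.frame(GA = c(38, 39), X10. = c(2650, 2800))
mydata <- data.frame(agegestationnel = c(38, 39), weight = c(2500, 3000))

# with() puts mydata's columns first in scope (as data.table does for j),
# so 'weight' here is the numeric column, not the table: a vector indexed
# with two subscripts fails.
msg <- tryCatch(
  with(mydata, weight["GA" == agegestationnel, "10%"]),
  error = function(e) conditionMessage(e)
)
print(msg)

# Giving the lookup table a name that no column uses removes the clash;
# match() then picks each individual's GA row in one vectorised step.
curves <- weight
ok <- with(mydata, curves[match(agegestationnel, curves$GA), "X10."])
print(ok)
```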
Answer (score: 0)
I finally found a solution:
mydata <- read.csv("file1.csv", sep=";")
weight <- read.csv("file2.csv", sep=";")
# Left join: column 14 of mydata (presumably the gestational age) against column 1 of weight (GA)
data_merge <- merge(mydata, weight, by.x=14, by.y=1, all.x=TRUE)
# read.csv() renames the "10%", "5%" and "3%" headers to X10., X5. and X3.
data_merge$categ = NA
# Assign from the widest band to the narrowest, so later lines overwrite earlier ones
data_merge[!is.na(data_merge$weight) & !is.na(data_merge$X10.) & (data_merge$weight >= data_merge$X10.), "categ"] = "Normal"
data_merge[!is.na(data_merge$weight) & !is.na(data_merge$X10.) & (data_merge$weight < data_merge$X10.), "categ"] = "low"
data_merge[!is.na(data_merge$weight) & !is.na(data_merge$X5.) & (data_merge$weight < data_merge$X5.), "categ"] = "very low"
data_merge[!is.na(data_merge$weight) & !is.na(data_merge$X3.) & (data_merge$weight < data_merge$X3.), "categ"] = "Extremely low"
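For comparison, a hypothetical single-pass variant (not the answer's code) that assigns the same four categories with nested ifelse() after the same kind of left join; the toy data and the table name `curves` are assumptions. Testing the most extreme band first means the first matching condition wins, so the result no longer depends on the order of overwriting assignments.

```r
# Hypothetical toy data mirroring the answer's column names after read.csv():
# "10%", "5%" and "3%" become X10., X5. and X3.
mydata <- data.frame(nocs8 = c("a", "b", "c", "d", "e"),
                     agegestationnel = c(38, 38, 39, 39, 40),
                     weight = c(3200, 2500, 2300, 2600, NA),
                     stringsAsFactors = FALSE)
curves <- data.frame(GA = c(38, 39, 40),
                     X3. = c(2400, 2550, 2700),
                     X5. = c(2500, 2650, 2800),
                     X10. = c(2650, 2800, 2950))

# Attach each individual's GA-specific percentiles (left join, as in the answer)
out <- merge(mydata, curves, by.x = "agegestationnel", by.y = "GA", all.x = TRUE)

# Most extreme band first; a missing weight or missing percentile yields NA
out$categ <- with(out,
  ifelse(is.na(weight) | is.na(X10.), NA_character_,
  ifelse(weight < X3., "Extremely low",
  ifelse(weight < X5., "very low",
  ifelse(weight < X10., "low", "Normal")))))

print(out[order(out$nocs8), c("nocs8", "agegestationnel", "weight", "categ")])
```

Joining on column names rather than positions (by.x = 14, by.y = 1) also keeps the join correct if the CSV columns are ever reordered.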