R: Extracting data from a data frame for analysis

Time: 2015-01-12 20:01:24

Tags: r dataframe

I am trying to extract data from a data frame for analysis.

heightweight <- function(person, health) {
    ## Read in data
    data <- read.csv("heightweight.csv", header = TRUE,
                     colClasses = "character")
    ## Check that the outcomes are valid
    measure = c("height", "weight")
    if(health %in% measure == FALSE){
        stop("Valid inputs are height and weight")
    }
    ## Truncate the data matrix to only what columns are needed
    data <- data[c(1, 5, 7)]
    ## Rename columns
    names(data)[1] <- "Name"
    names(data)[2] <- "Height"
    names(data)[3] <- "Weight"
    ## Convert numeric columns to numeric
    data[, 2] <- as.numeric(data[, 3])
    data[, 3] <- as.numeric(data[, 4])
    ## Convert NAs to 0 after coercion
    data[is.na(data)] <- 0
    ## Check that the name is valid
    name <- data[, 1]
    name <- unique(name)
    if(person %in% name == FALSE){
        stop("Invalid person")
    }
    ## Return person with lowest height or weight
    list <- data[data$name == person & data[health],]
    outcomes <- list[, health]
    minumum <- which.min(outcomes)
    ## Min Rate
    minimum[rowNum, ]$name
}

The problem I am running into is with this line:

list <- data[data$name == person & data[health],]

That is, when I run heightweight("Bob", "weight"), I get the following message:

Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr,  : 
  length of 'dimnames' [2] not equal to array extent

I have Googled this message and checked several posts here, but I cannot pin down the problem.

2 Answers:

Answer 0: (score: 3)

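Unless I am missing something, if you only need the minimum weight or height for a given name, the last three lines of your code are somewhat redundant.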

Here is a simple way to get the minimum health measure for a specific person:

min(data[data$name==person, "height"])

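The first part selects only the rows of data that correspond to that person; it acts as the row index. The second part, after the comma, selects only the variable (column) you need. Once the required data is selected, you look for the minimum value within that subset.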

An example to illustrate the result:

data<-data.frame(name=as.character(c(rep("carlos",2),rep("marta",3),rep("johny",2),"sara")))
set.seed(1)
data$height <- rnorm(8,68,3)
data$weight <- rnorm(8,160,10)

The corresponding data frame:

   name   height   weight
1 carlos 66.12064 165.7578
2 carlos 68.55093 156.9461
3  marta 65.49311 175.1178
4  marta 72.78584 163.8984
5  marta 68.98852 153.7876
6  johny 65.53859 137.8530
7  johny 69.46229 171.2493
8   sara 70.21497 159.5507

Let's say we want marta's minimum weight:

person <- "marta"
health <- "weight"

The minimum "weight" for "marta" is obtained with:

min(data[data$name==person,health])

which gives the desired result:

[1] 153.7876
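
To tie this back to the function in the question, here is a minimal sketch of how the same indexing could be wrapped up with the two validity checks. It rests on assumptions not in the original post: the function name min_measure is hypothetical, the data frame is passed in as an argument (df) rather than read from heightweight.csv, and the columns are assumed to be named name, height and weight as in the sample data above.

```r
# Sketch only: min_measure and its df argument are hypothetical names.
# Assumes df has columns name, height and weight, as in the sample data above.
min_measure <- function(df, person, health) {
  # Check that the requested measure is valid
  if (!health %in% c("height", "weight")) {
    stop("Valid inputs are height and weight")
  }
  # Check that the person appears in the data
  if (!person %in% df$name) {
    stop("Invalid person")
  }
  # Logical row index selects the person; the column name selects the measure
  values <- df[df$name == person, health]
  min(values, na.rm = TRUE)
}

# Same sample data as in the example above
data <- data.frame(name = c(rep("carlos", 2), rep("marta", 3),
                            rep("johny", 2), "sara"))
set.seed(1)
data$height <- rnorm(8, 68, 3)
data$weight <- rnorm(8, 160, 10)

min_measure(data, "marta", "weight")
```

Note that the column names used for indexing have to match the data frame exactly: R is case-sensitive, so a column renamed to "Name" is not reachable as data$name.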

Answer 1: (score: 0)

Here is a simplified mock-up of your function:

heightweight <- function(person,health) {
  data.set <- data.frame(names=rep(letters[1:5],each=3),height=171:185,weight=seq(95,81,by=-1))
  d1 <- data.set[data.set$names == person,]
  d2 <- d1[d1[,health]==min(d1[,health]),]
  d2[,c('names',health)]    
}

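The first line generates a sample data set. The second line selects all records for the given person. The third line keeps the record(s) where health is at its minimum, and the last line returns the names column alongside that measure.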

heightweight('b','height')
#   names height
# 4     b    174
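
Since the question's code reaches for which.min, a variant along those lines may also be useful. This is only a sketch under assumptions: lowest_record is a hypothetical name, and the data frame is passed in as an argument instead of being built inside the function. Unlike the equality test above, which.min returns the position of the first minimum only, so ties collapse to a single row.

```r
# Sketch only: lowest_record is a hypothetical name; the data mirrors the mock-up above.
lowest_record <- function(df, person, health) {
  # Keep only the rows for the requested person
  d1 <- df[df$names == person, ]
  # which.min() gives the position of the first minimum within that subset
  d1[which.min(d1[[health]]), c("names", health)]
}

data.set <- data.frame(names = rep(letters[1:5], each = 3),
                       height = 171:185,
                       weight = seq(95, 81, by = -1))

lowest_record(data.set, "b", "height")
```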