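How do I fix "invalid factor level"?

Asked: 2019-05-17 08:35:52

Tags: r dataframe factors levels

I can't get the mean function to work. Here is my code:

I have already tried the factor(data$date) function successfully. The console reports 890 entries with 51 levels.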

   data <- read.table("R/DATA.csv", sep = ";", header = TRUE, dec = ",")
   View(data)
   colnames(data)[1] <- "Date"
   eau <- data$"Tension"
   eaucalculee <- ( 0.000616 * eau - 0.1671) * 100
   data["Eau"] <- eaucalculee
   tata <- data.frame("Aucun","Augmentation","Interception")

   tata[1,1] <- mean(data$Eau[data$Date == levels(factor(data$Date))[1] &
                              data$Traitement == "Aucun"])

I expected the mean to be stored in the first row, first column of the tata data frame, but instead I get this error message:

   In `[<-.factor`(`*tmp*`, iseq, value = 8.6692) :
   invalid factor level, NA generated 

Can you help me?

You can find the CSV file here: https://drive.google.com/file/d/1zbA25vajouQ4MiUF72hbeV8qP9wlMqB9/view?usp=sharing

Thank you very much.

2 answers:

Answer 0 (score: 0)

tata is a data.frame of factors, and you are trying to insert a number into it. Try:

tata <- data.frame("Aucun", "Augmentation", "Interception", stringsAsFactors = FALSE)
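
To make the mechanism concrete, here is a minimal sketch that does not use the asker's CSV. It assumes the pre-4.0.0 behaviour of R, where strings became factors by default; `stringsAsFactors = TRUE` is passed explicitly so the warning also reproduces on newer versions. The names `tata_factor` and `tata_chr` are illustrative only.

```r
# Factor column: 8.6692 is not one of the existing levels ("Aucun"),
# so the assignment stores NA and emits the "invalid factor level" warning.
tata_factor <- data.frame("Aucun", "Augmentation", "Interception",
                          stringsAsFactors = TRUE)
tata_factor[1, 1] <- 8.6692
is.na(tata_factor[1, 1])

# Character column: no warning, but the number is coerced to text
# because the column already holds a string.
tata_chr <- data.frame("Aucun", "Augmentation", "Interception",
                       stringsAsFactors = FALSE)
tata_chr[1, 1] <- 8.6692
class(tata_chr[[1]])
```

Note that the second variant silences the warning but leaves the mean stored as a string, so a numeric column is probably what is really wanted.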

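Answer 1 (score: 0)

I'm not sure the line tata <- data.frame("Aucun","Augmentation","Interception") does what you expect. If you inspect its result with View(tata), you will see a data frame with one record and 3 columns whose values are your 3 strings (converted to factors, as @s-brunel mentioned). The column names are inferred from those values (X.Aucun. etc.). I guess you wanted to create a data frame whose column names are the given strings.

Suggested code, with comments: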
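
The difference between unnamed and named arguments can be sketched as follows. `tata_values` and `tata_named` are hypothetical names used only for this illustration, and `NA_real_` is just one possible way to reserve numeric columns.

```r
# Unnamed arguments become the *values* of a one-row data frame;
# the column names are derived from those values.
# stringsAsFactors = TRUE mimics the default of R versions before 4.0.0.
tata_values <- data.frame("Aucun", "Augmentation", "Interception",
                          stringsAsFactors = TRUE)
names(tata_values)
str(tata_values)

# Named arguments give the column names that were presumably intended.
tata_named <- data.frame(Aucun = NA_real_, Augmentation = NA_real_,
                         Interception = NA_real_)
tata_named[1, "Aucun"] <- 8.6692  # numeric column, so no factor warning
```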

data <- read.table("R/DATA.csv", sep = ";", header = TRUE, dec = ",")

# The following is useless since first column is already named Date
# colnames(data)[1] <- "Date"

# No need to create your intermediate variables eau and eaucalculee: you can 
# do it directly with the data frame columns
data$Eau <- ( 0.000616 * data$Tension - 0.1671) * 100

# No need to create your tata data frame before filling its actual content, you
# can do it directly
tata <- data.frame(
  Aucun = mean(data$Eau[
    data$Date == levels(factor(data$Date))[1] & data$Traitement == "Aucun"
    ])
  )
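# your_formula_here is a placeholder: use the same mean() call with the
# matching Traitement value
tata$Augmentation <- your_formula_here
tata$Interception <- your_formula_here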
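
One way to fill all three columns without repeating the formula is sketched below. It runs on a made-up stand-in called `toy` rather than the asker's file, and it assumes the three treatment labels are exactly "Aucun", "Augmentation" and "Interception".

```r
# Toy stand-in for the asker's data; the values are invented for illustration.
toy <- data.frame(
  Date       = c("01/05/2019", "01/05/2019", "01/05/2019", "02/05/2019"),
  Traitement = c("Aucun", "Augmentation", "Interception", "Aucun"),
  Tension    = c(400, 500, 600, 700)
)
toy$Eau <- (0.000616 * toy$Tension - 0.1671) * 100

first_date  <- levels(factor(toy$Date))[1]
traitements <- c("Aucun", "Augmentation", "Interception")

# One mean per treatment, restricted to the first date.
means <- sapply(traitements, function(tr) {
  mean(toy$Eau[toy$Date == first_date & toy$Traitement == tr])
})

# A named list converts to a one-row data frame with one column per name.
tata <- as.data.frame(as.list(means))
```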

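Note 1: the simplest way to refer to a data frame column is with $, and no double quotes are needed. You can also use [[ with double quotes (equivalent), but beware of [, which returns a data frame with a single column: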

class(data$Date)
# [1] "factor"
class(data[["Date"]])
# [1] "factor"
class(data["Date"])
# [1] "data.frame"
class(data[ , "Date"])
# [1] "factor"

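Note 2: trying to reverse-engineer your question, perhaps you want to compute the mean of Eau for every combination of Date and Traitement. In that case, I suggest you use dplyr and tidyr from the awesome tidyverse set of packages: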

# install.packages("tidyverse") # if you don't already have it
library(tidyverse)

# Inside mutate() and summarise(), refer to columns by bare name
data <- data %>% 
  mutate(Eau = (0.000616 * Tension - 0.1671) * 100)

tata_vertical <- data %>% 
  group_by(Date, Traitement) %>% 
  summarise(mean_eau = mean(Eau))
View(tata_vertical)

tata <- tata_vertical %>% spread(Traitement, mean_eau)
View(tata)
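
If installing the tidyverse is not an option, a comparable Date by Traitement table of means can be sketched in base R with tapply. This is an alternative to the answer's approach, not part of it, and `toy` is invented data standing in for the asker's file.

```r
# Toy stand-in for the asker's data; the values are invented for illustration.
toy <- data.frame(
  Date       = c("01/05/2019", "01/05/2019", "01/05/2019",
                 "02/05/2019", "02/05/2019", "02/05/2019"),
  Traitement = c("Aucun", "Aucun", "Interception",
                 "Aucun", "Interception", "Interception"),
  Tension    = c(400, 600, 500, 300, 700, 900)
)
toy$Eau <- (0.000616 * toy$Tension - 0.1671) * 100

# Rows are dates, columns are treatments, cells are the mean of Eau.
means <- tapply(toy$Eau, list(toy$Date, toy$Traitement), mean)
means
```

The result is a matrix rather than a data frame; as.data.frame(means) converts it if a data frame is needed downstream.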

There is plenty of documentation at https://www.tidyverse.org/learn/