Question

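I know this kind of question has been asked before, but I could not apply the suggested solutions to my dataset... I have a very simple function that gathers different pieces of data into a larger data frame (about 7 columns and 150,000 rows). My problem is storing the collected data: I have character strings as well as numbers and times.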

Location       Date Creneau Ordre           Name Qte_conso    Start      End time.interval
  case 2 18/12/2018       6     1  Caligula Time         0 06:28:35 06:28:35        0 secs
  case 2 18/12/2018       6     2  Lolita Forest       500 07:52:34 08:02:02      568 secs
  case 2 18/12/2018       6     3 Break The Wall       501 08:05:43 08:10:04      261 secs
  case 2 18/12/2018       6     4  Lolita Forest         0 08:10:55 08:11:35       40 secs
  case 2 18/12/2018       6     5     I Know you       501 08:12:43 08:24:26      703 secs
  case 2 18/12/2018       6     6  Caligula Time         0 08:24:39 08:24:39        0 secs
  case 2 18/12/2018       6     7          Aroma       421 08:34:37 08:40:56      379 secs
  case 2 18/12/2018       6     8        Polenta         0 08:41:44 08:41:45        1 secs
  case 2 18/12/2018       6     9          Aroma        79 08:41:49 08:45:43      234 secs
  case 2 18/12/2018       6    10        Polenta       500 08:46:54 08:58:23      689 secs
  case 2 18/12/2018       9     1     I Know you       501 09:03:09 09:11:17      488 secs
  case 2 18/12/2018       9     2        Polenta       500 09:12:03 09:25:34      811 secs
  case 2 18/12/2018       9     3        Decided       500 09:28:15 09:47:34     1159 secs
  case 2 18/12/2018       9     4  Lolita Forest       500 09:48:05 09:56:49      524 secs
  case 2 18/12/2018       9     5   Diamond Free       500 09:57:07 10:07:23      616 secs

Each row shows how much an animal ate and when. So I have several rows per animal.

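I have the following code to gather the information I need (there is surely a more efficient way, but at least it works; I checked it by printing the output for a few individuals):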

names <- unique(dataset$Nom)
dates <- unique(dataset$Date)
crnx <- unique(dataset$Creneau)

for (name in names) {
  for (date in dates) {
    for (crn in crnx) {

      res <- subset(dataset, Nom == name & Date == date & Creneau == crn)
      nbPassage <- nrow(res)
      qteMax <- max(res$Qte_conso)
      qteMin <- min(res$Qte_conso)
      qteTot <- sum(res$Qte_conso)
      qteMoy <- mean(res$Qte_conso)

      tempsMin <- min(res$interval)
      tempsMax <- max(res$interval)
      tempsTot <- sum(res$interval)
      tempsMoy <- mean(res$interval)
    }
  }
}

I tried to put all of this into an empty data frame initialized as follows:

df <- data.frame(Nom=character(),
             Date=character(),
             Case=character(),
             Creneau=numeric(),
             Passage=numeric() ,
             Qte_min=numeric(),
             Qte_max=numeric(),
             Qte_tot=numeric(),
             Qte_moy=numeric(),
             Tps_min=character(),
             Tps_max=character(),
             Tps_tot=character(),
             Tps_moy=character(),
             stringsAsFactors=FALSE)

for (name in names){
  for (date in dates) {
    for (crn in crnx) {

    res <- subset(dataset, Nom==name & Date==date & Creneau==crn)
    [...]

    }
  }
df$Nom <- df$Nom + name
   df$Date <- df$Date + date
   df$Creneau <- df$Creneau + crn
   df$Passage <- df$Passage + nbPassage

   df$Qte_min <- df$Qte_min + qteMin
   df$Qte_max <- df$Qte_max + qteMax
   df$Qte_tot <- df$Qte_tot + qteTot
   df$Qte_moy <- df$Qte_moy + qteMoy

   df$Tps_min <- df$Tps_min + tempsMin
   df$Tps_max <- df$Tps_max + tempsMax
   df$Tps_tot <- df$Tps_tot + tempsTot
   df$Tps_moy <- df$Tps_moy + tempsMoy
}

With this, I end up with: Error in df$Nom + name : non-numeric argument to binary operator

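I also tried using vectors (I know I did it wrong and that it is not good practice, but I really did not know how to proceed), knowing how many rows I should get, but I got {{1 }} for the vector holding all the numbers, and exactly the same error message for the one holding strings.

I also tried integer(0), but I only got the first element.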

The end goal of all this is to be able to export the new data frame to a CSV file.

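Thanks in advance to those who take the time to read or even answer this question. If you need any further information, I will be happy to provide it.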

Answer 1

There are many dplyr options for this, but in base R you can use by, for example:

by(dataset[, c("Qte_conso", "interval")], dataset[c("name", "date", "crn")], function(x) with(x, data.frame(qteMax=max(Qte_conso), qteMin=min( ....

A working example with made-up data:

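# Made-up data: two grouping columns (g1, g2) and two numeric columns (b, c)
df <- data.frame(g1=sample(1:3, 100, replace=TRUE), g2=sample(1:2, 100, replace=TRUE), b=rnorm(100), c=rnorm(100))
# For each (g1, g2) combination, compute the group size and the min/max of b and c
foo <- by(df[, c("b", "c")], df[c("g1", "g2")], function(x) 
  c(len = nrow(x), minb=min(x$b), maxb=max(x$b), minc=min(x$c), maxc=max(x$c)))
# Stack the per-group vectors into a matrix, one row per group
do.call(rbind, foo)
# Add the grouping values as columns (assumes every g1/g2 combination occurs)
cbind( expand.grid(attr(foo, "dimnames")), do.call(rbind, foo))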
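
Applied to the question's data, the same split-and-summarise idea might look like the sketch below. It assumes the columns are named Nom, Date, Creneau, Qte_conso and interval as in the question's code, uses made-up values, and treats interval as plain numbers of seconds; the output file name is hypothetical. Each group is summarised as a one-row data frame, so character and numeric columns keep their own types, and the rows are bound together with rbind instead of being added with +, which only works on numbers.

```r
# Made-up data using the question's column names (values are illustrative only)
dataset <- data.frame(
  Nom       = c("Aroma", "Aroma", "Polenta", "Polenta", "Polenta"),
  Date      = "18/12/2018",
  Creneau   = c(6, 6, 6, 6, 9),
  Qte_conso = c(421, 79, 0, 500, 500),
  interval  = c(379, 234, 1, 689, 811),  # assumed to be seconds, as plain numbers
  stringsAsFactors = FALSE
)

# One piece per (Nom, Date, Creneau) combination that actually occurs
pieces <- split(dataset, dataset[c("Nom", "Date", "Creneau")], drop = TRUE)

# Summarise each piece as a one-row data frame with mixed column types
rows <- lapply(pieces, function(res) {
  data.frame(
    Nom     = res$Nom[1],
    Date    = res$Date[1],
    Creneau = res$Creneau[1],
    Passage = nrow(res),
    Qte_min = min(res$Qte_conso),
    Qte_max = max(res$Qte_conso),
    Qte_tot = sum(res$Qte_conso),
    Qte_moy = mean(res$Qte_conso),
    Tps_min = min(res$interval),
    Tps_max = max(res$interval),
    Tps_tot = sum(res$interval),
    Tps_moy = mean(res$interval),
    stringsAsFactors = FALSE
  )
})

# Bind the one-row data frames into the final table
summary_df <- do.call(rbind, rows)
rownames(summary_df) <- NULL

# Export to CSV (hypothetical file name, written to a temporary directory)
out_file <- file.path(tempdir(), "summary.csv")
write.csv(summary_df, out_file, row.names = FALSE)
```

Because only the combinations present in the data are visited, this also avoids the empty subsets that the triple loop over unique values can produce.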

Answer 2

What you want to do looks like a simple group-by operation. You can do this with the data.table package (or with dplyr).

Suppose I have a data.frame of animals, animal_names, each eating some quantity of food in each time slot (I omit the time variables for brevity):

animal_names <- c(rep("Pierre", 2), rep("Jean", 4))
quantity     <- runif(n = 6, min = 1, max = 10)
df           <- data.frame(names = animal_names, quantity = quantity)

This gives:

 > df
   names quantity
1 Pierre 7.620816
2 Pierre 2.754536
3   Jean 2.591135
4   Jean 4.013869
5   Jean 3.865716
6   Jean 7.888450

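Instead of looping over unique(names) and computing aggregate measures such as max(quantity), mean(quantity), and so on, you can use a group-by operation. With the data.table package, you can do the following: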

library(data.table)
dt <- data.table(df) # convert the data.frame object into a data.table
summary_df <- dt[, .(
  min_qty = min(quantity),
  max_qty = max(quantity),
  mean_qty = mean(quantity),
  sum_qty = sum(quantity)
), by = names]

This gives:

> summary_df
    names  min_qty  max_qty mean_qty  sum_qty
1: Pierre 2.754536 7.620816 5.187676 10.37535
2:   Jean 2.591135 7.888450 4.589792 18.35917

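Then export this data.table with the function write.csv. If you want to work with this table but do not know the data.table syntax, you can always convert the object back to a data.frame with the command summary_df <- data.frame(summary_df).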
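
For the question's three grouping variables, a similar data.table call could look like the sketch below. It assumes the column names from the question's code (Nom, Date, Creneau, Qte_conso, interval), uses made-up values with interval as plain numbers of seconds, and writes to a hypothetical file name. The by argument takes several columns at once, .N counts the rows in each group, and fwrite is data.table's own CSV writer (write.csv works as well).

```r
library(data.table)

# Made-up data using the question's column names (values are illustrative only)
dataset <- data.table(
  Nom       = c("Aroma", "Aroma", "Polenta", "Polenta", "Polenta"),
  Date      = "18/12/2018",
  Creneau   = c(6, 6, 6, 6, 9),
  Qte_conso = c(421, 79, 0, 500, 500),
  interval  = c(379, 234, 1, 689, 811)  # assumed to be seconds, as plain numbers
)

# One summary row per (Nom, Date, Creneau) combination present in the data
summary_dt <- dataset[, .(
  Passage = .N,
  Qte_min = min(Qte_conso),
  Qte_max = max(Qte_conso),
  Qte_tot = sum(Qte_conso),
  Qte_moy = mean(Qte_conso),
  Tps_min = min(interval),
  Tps_max = max(interval),
  Tps_tot = sum(interval),
  Tps_moy = mean(interval)
), by = .(Nom, Date, Creneau)]

# Export to CSV (hypothetical file name, written to a temporary directory)
out_file <- file.path(tempdir(), "summary_dt.csv")
fwrite(summary_dt, out_file)
```

The grouping columns stay character or numeric as they were, and the summaries are numeric, so no manual type handling is needed before exporting.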

How do I store results of several different types (numeric and character)?
