Question

I have a set of data frames imported from Excel files that were exported by some proprietary software. The data look like this:

> head(Ball)
                 Col1   Col2                Col3   Col4                Col5   Col6                Col7   Col8                Col9  Col10               Col11  Col12
1 2014-07-25 00:00:00   <NA> 2014-07-25 00:00:00   <NA> 2014-07-25 00:00:00   <NA> 2014-07-23 00:00:00   <NA> 2014-07-23 00:00:00   <NA> 2014-07-23 00:00:00   <NA>
2 1899-12-31 07:49:00   <NA> 1899-12-31 06:49:00   <NA> 1899-12-31 06:48:00   <NA> 1899-12-31 08:27:00   <NA> 1899-12-31 08:26:00   <NA> 1899-12-31 07:20:00   <NA>
3                   X      Y                   X      Y                   X      Y                   X      Y                   X      Y                   X      Y
4                   0      0                   0      0                   0      0                   0      0                   0      0                   0      0
5        0.0502222222 2.1945        0.0502222222 1.9437                0.05  1.254        0.0501123596 1.6302        0.0501086957      0                0.05      0
6        0.1004444444 5.7684        0.1004444444 4.7652                 0.1 4.2636        0.1002247191 4.2636        0.1002173913 0.3135                 0.1 2.1318
                Col13  Col14
1 2014-07-23 00:00:00   <NA>
2 1899-12-31 07:19:00   <NA>
3                   X      Y
4                   0      0
5        0.0501123596 1.7556
6        0.1002247191  4.389

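The data contain a variable number of usable rows, so many columns are padded with NA down to the last row of the data frame. I am trying to combine all of these data frames (Ball and about 10 others) into a single tidy format that holds, for each pair of columns, the data from row 4 through the last non-NA row. The final result should look like this: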

> head(df)
  id name routine trial     volume   flow
1  1 Ball    tech post1 0.00000000 0.0000
2  1 Ball    tech post1 0.05022222 2.1945
3  1 Ball    tech post1 0.10044444 5.7684
4  1 Ball    tech post1 0.15066667 6.8343
5  1 Ball    tech post1 0.20088889 7.2732
6  1 Ball    tech post1 0.25111111 7.5867

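Here id is a random identifier linked to name, name is the name of the imported data frame, routine is assigned a value based on the date in the first row, trial is likewise assigned a value based on the hour in the second row, volume holds the values from row 4 onward under each X, and flow holds the values from row 4 onward under each Y.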

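This is the function I came up with, where x is the original data frame (in this case "Ball") and y is the new data frame that the rows should be added to.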

tidier <- function(x, y) {
    for(col in ncol(x) / 2) {
         end.current <- length(x[,col][!is.na(x[,col])])
         length.current <- end.current - 3
         id = rep(1, length.current)
         name = rep("Ball", length.current)
         routine <- rep("tech", length.current)
         trial <- rep("pre2", length.current)
         volume <- as.numeric(Ball[4:end.current, col])
         flow <- as.numeric(Ball[4:end.current, col + 1])
         temp.df <- data.frame(id, name, routine, trial, volume, flow)
         df <- rbind(y, temp.df)
         col <- col + 2
         return(df)
    }
}

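I have not yet set up the conditional values for id, name, routine, and trial based on the values in the original data frame. Running the function just returns the original df data frame with no rows added. I get no errors and cannot figure out how to make this work. I hope this is clear; I am very new to writing functions and would really appreciate any help getting this to work.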

Answer 1

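I think you are overcomplicating the task:

Use read.table with the skip option to skip the first lines
Extract the X columns into a single vector: volume
Extract the Y columns into a single vector: flow
Use recycling to create the other columns

Here is my code: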

## dat holds the raw text here; for a real file, replace text=dat with file=file_name
d_f <- read.table(text=dat,header=TRUE,skip=3)[,-1]
## extract only the X columns and flatten them into a single vector
volume <- unlist(as.list(d_f[grep('X',colnames(d_f))]))
## extract only the Y columns and flatten them into a single vector
flow <- unlist(as.list(d_f[grep('Y',colnames(d_f))]))
## create the data frame, recycling the constant values for the other columns
data.frame(id=1,name='Ball',routine='tech',
       trial='pos1',volume=volume,flow=flow)

   id name routine trial     volume   flow
X1    1 Ball    tech  pos1 0.00000000 0.0000
X2    1 Ball    tech  pos1 0.05022222 2.1945
X3    1 Ball    tech  pos1 0.10044444 5.7684
X.11  1 Ball    tech  pos1 0.00000000 0.0000
X.12  1 Ball    tech  pos1 0.05022222 1.9437
X.13  1 Ball    tech  pos1 0.10044444 4.7652
X.21  1 Ball    tech  pos1 0.00000000 0.0000
X.22  1 Ball    tech  pos1 0.05000000 1.2540
X.23  1 Ball    tech  pos1 0.10000000 4.2636
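
For readers who already have the data frame in memory rather than as text, the same idea can be sketched as a loop over column pairs. It also shows why the function in the question adds nothing useful: `for(col in ncol(x) / 2)` iterates over a single number rather than over the pairs, `return()` inside the loop exits on the first pass, the body reads from `Ball` instead of `x`, and the result has to be assigned because R does not modify `y` in place. The helper name `tidy_pairs`, its arguments, and the `pair` column are hypothetical, and the toy `Ball` below is a cut-down copy of the values shown in the question with one pair shortened to imitate the NA padding. This is a minimal sketch under those assumptions, not the method from the answer above.

```r
# Hypothetical helper (not part of the original post): stack every X/Y column
# pair of one raw data frame into long format.
# Assumes X values sit in the odd columns with the matching Y to their right,
# and that numeric data start at row `first_row` (row 4 in the question).
tidy_pairs <- function(x, name, id = 1, first_row = 4) {
  x_cols <- seq(1, ncol(x) - 1, by = 2)
  rows   <- first_row:nrow(x)
  pieces <- lapply(seq_along(x_cols), function(i) {
    volume <- as.numeric(as.character(x[[x_cols[i]]][rows]))
    flow   <- as.numeric(as.character(x[[x_cols[i] + 1]][rows]))
    keep   <- !is.na(volume) & !is.na(flow)   # drop the NA padding at the bottom
    if (!any(keep)) return(NULL)              # skip a pair with no usable rows
    # id, name and pair are recycled to the length of volume/flow
    data.frame(id = id, name = name, pair = i,
               volume = volume[keep], flow = flow[keep])
  })
  do.call(rbind, pieces)
}

# Toy input: first two column pairs from the question; the last row of the
# second pair is set to NA for illustration only.
Ball <- data.frame(
  Col1 = c("2014-07-25 00:00:00", "1899-12-31 07:49:00", "X",
           "0", "0.0502222222", "0.1004444444"),
  Col2 = c(NA, NA, "Y", "0", "2.1945", "5.7684"),
  Col3 = c("2014-07-25 00:00:00", "1899-12-31 06:49:00", "X",
           "0", "0.0502222222", NA),
  Col4 = c(NA, NA, "Y", "0", "1.9437", NA),
  stringsAsFactors = FALSE
)

tidy <- tidy_pairs(Ball, name = "Ball")
print(tidy)
```

The `routine` and `trial` columns are left out because the question does not say how dates and hours map to their values; they could be derived from rows 1 and 2 of each pair inside the same loop. To combine several raw data frames, one option is to call the helper once per data frame and bind the results with `rbind`.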
