Restructuring an R data frame

Time: 2018-09-04 18:42:33

Tags: for-loop dataframe if-statement indexing

I have several decades of twice-daily data, structured as follows:

str(Raw.Data)
'data.frame':   709400 obs. of  7 variables:
 $ V1: int  254 1 2 3 9 4 4 4 4 4 ...
 $ V2: Factor w/ 448 levels "0","100","1000",..: 1 40 11 448 286 4 24 23 20 17 ...
 $ V3: Factor w/ 18039 levels "","-1","-10",..: 99 15749 6714 18039 13326 4244 4221 12375 14708 16000 ...
 $ V4: Factor w/ 3509 levels "","-1","-10",..: 3503 3034 3496 1 2176 3496 1219 2878 33 149 ...
 $ V5: Factor w/ 1295 levels "","-1","-10",..: 1092 1273 1019 1 992 1295 1254 40 187 192 ...
 $ V6: int  NA 353 99999 NA 230 99999 163 202 238 262 ...
 $ V7: int  NA 99999 0 NA 40 99999 50 40 70 60 ...

In a spreadsheet-like layout, the data for the first day look like this:

254 0   1   JUN 1957    NA  NA
1   94823   72520   40.50N  80.22W  353 99999
2   2000    2000    99999   13  99999   0
3   PIT ms          NA  NA
9   9780    353 234 105 230 40
4   10000   157 99999   99999   99999   99999
4   8500    1566    143 64  163 50
4   7000    3168    34  -133    202 40
4   5000    5815    -127    -266    238 70
4   4000    7483    -231    -270    262 60
4   3000    9517    -414    99999   258 150
4   2500    10726   -530    99999   260 170
4   2000    12128   -638    99999   271 230
254 12  1   JUN 1957    NA  NA
1   94823   72520   40.50N  80.22W  353 99999
2   1000    1500    1690    15  7   0
3   PIT ms          NA  NA
9   9770    353 168 113 135 40
4   10000   153 99999   99999   99999   99999
4   8500    1537    119 89  216 80
4   7000    3133    16  4   221 70
4   5000    5779    -132    -182    249 90
4   4000    7444    -240    -314    262 90
4   3000    9469    -414    99999   272 120
4   2500    10682   -511    99999   289 130
4   2000    12097   -608    99999   291 150
4   1500    13868   -630    99999   291 160
4   1000    16400   -611    99999   298 110

I would like to reorganize the data so that the first day is reduced to:

0   1   JUN 1957    9780    353 234 105 230 40
12  1   JUN 1957    9770    353 168 113 135 40

To do this, I need columns 2:5 of each row that begins with "254" and columns 2:7 of each row that begins with "9".
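A vectorized sketch of this selection, assuming every 254 header row is followed by exactly one 9 row (true for the two soundings shown above). The small Raw.Data below is a hypothetical stand-in built from a few sample rows, using character columns instead of factors, and the output column names (Hour, Day, Month, Year, Surface.V2 ... Surface.V7) are hypothetical labels rather than anything from the original data.

```r
# Hypothetical stand-in for Raw.Data: a few rows of the first day only.
Raw.Data <- data.frame(
  V1 = c(254L, 1L, 9L, 4L, 254L, 1L, 9L),
  V2 = c("0", "94823", "9780", "10000", "12", "94823", "9770"),
  V3 = c("1", "72520", "353", "157", "1", "72520", "353"),
  V4 = c("JUN", "40.50N", "234", "99999", "JUN", "40.50N", "168"),
  V5 = c("1957", "80.22W", "105", "99999", "1957", "80.22W", "113"),
  V6 = c(NA, 353L, 230L, 99999L, NA, 353L, 135L),
  V7 = c(NA, 99999L, 40L, 99999L, NA, 99999L, 40L),
  stringsAsFactors = FALSE
)

# Logical row selectors replace the row-by-row loop.
is.header  <- Raw.Data$V1 == 254
is.surface <- Raw.Data$V1 == 9

# Assumption: one "9" row per "254" header, so the subsets pair up in order.
stopifnot(sum(is.header) == sum(is.surface))

# Columns 2:5 of the header rows side by side with columns 2:7 of the 9 rows.
Surface.Obs <- cbind(Raw.Data[is.header, 2:5], Raw.Data[is.surface, 2:7])

# Hypothetical column labels; Hour assumes 0/12 are observation hours.
names(Surface.Obs) <- c("Hour", "Day", "Month", "Year", paste0("Surface.V", 2:7))
rownames(Surface.Obs) <- NULL
Surface.Obs
```

If the 254/9 pairing can break (a sounding with no 9 row), the two subsets will no longer line up, which is why the sketch checks the counts before binding.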

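I wrote the following code, but it doesn't even get past the first if statement in the first iteration of the for loop. Is this perhaps a data-type or indexing problem?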

leng <- dim(Raw.Data)[1]
Processed.Data <- as.data.frame(matrix(0,ncol = 10, nrow = 42000))
i <- 1:leng
count <- 1
for (i in 1:leng){
  if(Raw.Data[i,1]==254){
    Processed.Data[count,1:4]<-Raw.Data[i,2:5]
  } else if(Raw.Data[i,1]==9){
    Processed.Data[count,5:10]<-Raw.Data[i,2:7]
  }
  count <- count +1
}
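A corrected loop, sketched under the same assumption that each 254 header is followed by one 9 row. Two changes matter: the counter advances only when a new 254 header appears (not on every input row), and each cell is copied by its label via as.character(), so factor columns are not copied as integer codes. The stand-in data and the helper row.as.chr() are hypothetical, for illustration only.

```r
# Hypothetical stand-in for Raw.Data (character columns instead of factors).
Raw.Data <- data.frame(
  V1 = c(254L, 1L, 9L, 4L, 254L, 1L, 9L),
  V2 = c("0", "94823", "9780", "10000", "12", "94823", "9770"),
  V3 = c("1", "72520", "353", "157", "1", "72520", "353"),
  V4 = c("JUN", "40.50N", "234", "99999", "JUN", "40.50N", "168"),
  V5 = c("1957", "80.22W", "105", "99999", "1957", "80.22W", "113"),
  V6 = c(NA, 353L, 230L, 99999L, NA, 353L, 135L),
  V7 = c(NA, 99999L, 40L, 99999L, NA, 99999L, 40L),
  stringsAsFactors = FALSE
)

# Pre-allocate exactly one output row per sounding (one per 254 header).
n.obs <- sum(Raw.Data$V1 == 254)
Processed.Data <- as.data.frame(matrix(NA_character_, nrow = n.obs, ncol = 10),
                                stringsAsFactors = FALSE)

# Hypothetical helper: one data-frame row -> character vector of labels
# (for a factor column, as.character() returns the level label).
row.as.chr <- function(df.row) vapply(df.row, as.character, character(1))

count <- 0
for (i in seq_len(nrow(Raw.Data))) {
  code <- Raw.Data$V1[i]            # a single value, so if() gets one TRUE/FALSE
  if (code == 254) {
    count <- count + 1              # advance only when a new sounding starts
    Processed.Data[count, 1:4] <- row.as.chr(Raw.Data[i, 2:5])
  } else if (code == 9 && count > 0) {
    Processed.Data[count, 5:10] <- row.as.chr(Raw.Data[i, 2:7])
  }
}
Processed.Data
```

On roughly 700,000 rows a row-by-row loop like this will be slow; the logical-subsetting approach avoids the loop entirely.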

When I run the code, I get the following warning messages:

1: In if (Raw.Data[i, 1] == 254) { :
  the condition has length > 1 and only the first element will be used
2: In `[<-.data.frame`(`*tmp*`, count, 1:4, value = list(V2 = c(1L,  :
  replacement element 1 has 709400 rows to replace 1 rows
3: In `[<-.data.frame`(`*tmp*`, count, 1:4, value = list(V2 = c(1L,  :
  replacement element 2 has 709400 rows to replace 1 rows
4: In `[<-.data.frame`(`*tmp*`, count, 1:4, value = list(V2 = c(1L,  :
  replacement element 3 has 709400 rows to replace 1 rows
5: In `[<-.data.frame`(`*tmp*`, count, 1:4, value = list(V2 = c(1L,  :
  replacement element 4 has 709400 rows to replace 1 rows
6: In `[<-.factor`(`*tmp*`, iseq, value = 99L) :
  invalid factor level, NA generated
7: In `[<-.factor`(`*tmp*`, iseq, value = 3503L) :
  invalid factor level, NA generated
8: In `[<-.factor`(`*tmp*`, iseq, value = 1092L) :
  invalid factor level, NA generated
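The "invalid factor level, NA generated" warnings are consistent with V2 to V5 having been read as factors (the str() output lists them as "Factor w/ ... levels"). A minimal sketch of converting factor columns to character, using a hypothetical two-column stand-in; re-reading the file with read.table(..., stringsAsFactors = FALSE) is an alternative (FALSE has been the default since R 4.0.0). Separately, the "condition has length > 1" warning means if() received a vector instead of a single TRUE/FALSE; R 4.2.0 and later treat that as an error.

```r
# Hypothetical stand-in with factor columns, like the str() output above.
Raw.Data <- data.frame(
  V1 = c(254L, 9L),
  V2 = factor(c("0", "9780")),
  V3 = factor(c("1", "353"))
)

# Find the factor columns and turn them into character vectors, so values
# are copied and compared by label rather than by internal integer code.
is.fac <- vapply(Raw.Data, is.factor, logical(1))
Raw.Data[is.fac] <- lapply(Raw.Data[is.fac], as.character)
str(Raw.Data)

# Pitfall: as.numeric() on a factor gives the level codes, not the labels.
f <- factor(c("100", "9780"))
codes  <- as.numeric(f)                # level codes: 1 2
values <- as.numeric(as.character(f))  # labels as numbers: 100 9780
```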

Help with even one of my many problems would be greatly appreciated!

P.S. If you have any ideas on how to insert blank rows for missing dates, that might save me a separate question later.
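One common way to insert blank rows for missing soundings is to build the complete grid of dates and observation hours, then left-join the data onto it so the missing combinations come back as NA. A sketch under stated assumptions: the processed table below is hypothetical and its values are made up (the 2 June, hour 0 sounding is deliberately missing), the column names are hypothetical, and the hours are assumed to be exactly 0 and 12.

```r
# Made-up processed table with one missing sounding (2 June 1957, hour 0).
Surface.Obs <- data.frame(
  Hour  = c(0L, 12L, 12L),
  Day   = c(1L, 1L, 2L),
  Month = c("JUN", "JUN", "JUN"),
  Year  = c(1957L, 1957L, 1957L),
  Surface.V2 = c(9780L, 9770L, 9760L)
)

# Build a Date from day/month/year; matching against month.abb avoids
# locale-dependent month-name parsing.
month.num <- match(Surface.Obs$Month, toupper(month.abb))
Surface.Obs$Date <- as.Date(sprintf("%d-%02d-%02d",
                                    Surface.Obs$Year, month.num, Surface.Obs$Day))

# Every date in the range crossed with both observation hours.
all.times <- expand.grid(
  Date = seq(min(Surface.Obs$Date), max(Surface.Obs$Date), by = "day"),
  Hour = c(0L, 12L)
)

# Left join: date/hour combinations with no sounding get NA data columns.
Complete <- merge(all.times, Surface.Obs[, c("Date", "Hour", "Surface.V2")],
                  by = c("Date", "Hour"), all.x = TRUE)
Complete <- Complete[order(Complete$Date, Complete$Hour), ]
Complete
```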

Thanks!
Evan

0 answers