How can I split a column containing NAs at the NAs?

Time: 2015-09-22 16:55:01

Tags: r numbers dataframe na

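This is my first question here, so please be gentle :)

I thought this would be simple. I have a data.frame that consists of a single column, "Times". It looks like this: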

-------------------------
> head(Times,10)
   Times
1     NA
2  0.448
3  0.130
4     NA
5     NA
6  0.462
7  0.427
8  0.946
9  0.227
10    NA
>
------------------------

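The idea is that the first NA marks the start of a sequence, so the times that follow belong to the same tag. The sequence ends when the next NA entry is reached.

I now want to create a new data.frame that turns the numbers between the NAs into columns, with one row per sequence.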

  Time1 Time2 Time3 Time4
1 0.448 0.130 0.123 
2 0.462 0.427 0.946 0.227
>
---------------------------------

Can you help?

3 answers:

Answer 0 (score: 5)

Times <- read.table(text = "Times
1     NA
2  0.448
3  0.130
4     NA
5     NA
6  0.462
7  0.427
8  0.946
9  0.227
10    NA", header = TRUE)

#identify values that belong together
Times$ind <- cumsum(is.na(Times$Times)) %/% 2 + 1

Times <- na.omit(Times) #remove NA values

#identify columns
Times$col <- unlist(tapply(Times$ind, factor(Times$ind), seq_along))

#reshape to wide format 
reshape(Times, timevar = "col", idvar = "ind", direction = "wide")
#  ind Times.1 Times.2 Times.3 Times.4
#2   1   0.448   0.130      NA      NA
#6   2   0.462   0.427   0.946   0.227

I used base R for fun. If you need something more efficient, you should use the data.table package.
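To make the grouping step easier to follow, here is a small sketch that prints the intermediate values behind `cumsum(is.na(...)) %/% 2 + 1`. It uses the same ten values as the question; the names `x`, `na_count` and `groups` are only illustrative. Note the assumption baked into the `%/% 2`: every sequence is enclosed by its own opening and closing NA, as in the sample data.

```r
# Same ten values as in the question
x <- c(NA, 0.448, 0.130, NA, NA, 0.462, 0.427, 0.946, 0.227, NA)

na_count <- cumsum(is.na(x))  # running count of NAs seen so far
ind <- na_count %/% 2 + 1     # every second NA moves on to the next group

# Show the three vectors side by side, one column per input position
print(rbind(x, na_count, ind))

# Values of each group once the NA delimiters are dropped
keep <- !is.na(x)
groups <- split(x[keep], ind[keep])
print(groups)
```

If consecutive sequences were separated by a single NA instead of a closing/opening pair, the integer division would merge neighbouring sequences, so the index would have to be built differently.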

Answer 1 (score: 1)

Here is a solution using tidyr and dplyr (the code block for this answer is missing from this copy).
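As a rough idea of what a dplyr/tidyr approach could look like, here is a minimal sketch. It is not the answerer's code: the grouping index is borrowed from the base R answer, the helper columns `ind` and `col` are illustrative names, and `pivot_wider()` assumes tidyr 1.0.0 or later (older versions would use `spread()` instead).

```r
library(dplyr)
library(tidyr)

# Same ten values as in the question
Times <- data.frame(
  Times = c(NA, 0.448, 0.130, NA, NA, 0.462, 0.427, 0.946, 0.227, NA)
)

wide <- Times %>%
  # group index: assumes each sequence has its own opening and closing NA
  mutate(ind = cumsum(is.na(Times)) %/% 2 + 1) %>%
  filter(!is.na(Times)) %>%
  group_by(ind) %>%
  # position within the sequence becomes the target column name
  mutate(col = paste0("Time", row_number())) %>%
  ungroup() %>%
  pivot_wider(names_from = col, values_from = Times)

print(wide)
```

Shorter sequences are padded with NA on the right, because `pivot_wider()` fills missing combinations with NA by default.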

Answer 2 (score: 1)

Using data.table v1.9.6 (with the data from @Roland's answer):

require(data.table) # v1.9.6+
setDT(Times)[, `:=`(grp = seq_len(.N), rle = rle), by = .(rle = rleid(is.na(Times)))]
dcast(na.omit(Times, by="Times"), rle ~ grp, value.var="Times")
#    rle     1     2     3     4
# 1:   2 0.448 0.130    NA    NA
# 2:   4 0.462 0.427 0.946 0.227

The resulting columns can then be renamed to get the column names shown in the question.
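For readers without data.table, the run-id idea can also be sketched in base R, including the renaming step. This is only an illustration under my own naming (`run_id`, `runs`, `wide`); the `cumsum(diff(...))` line imitates what `rleid()` does for this input rather than calling it. Unlike the `%/% 2` trick, it does not depend on the NAs coming in pairs.

```r
# Same ten values as in the question
x <- c(NA, 0.448, 0.130, NA, NA, 0.462, 0.427, 0.946, 0.227, NA)

# Run id: goes up by one each time is.na(x) flips between TRUE and FALSE
run_id <- cumsum(c(TRUE, diff(is.na(x)) != 0))

# Keep only the runs made of non-NA values
keep <- !is.na(x)
runs <- split(x[keep], run_id[keep])

# Pad every run with NA up to the longest run, then stack the runs as rows
width <- max(lengths(runs))
wide <- as.data.frame(do.call(rbind, lapply(runs, `length<-`, width)))

# Rename the columns and renumber the rows to match the layout in the question
names(wide) <- paste0("Time", seq_len(width))
rownames(wide) <- NULL
print(wide)
```

The padding via `length<-` is what keeps the shorter first sequence in the same table as the longer second one.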