Question

I'm working in R. I have a data frame, df, that looks like this:

> str(exp)
'data.frame':   691200 obs. of  19 variables:
 $ groupname: Factor w/ 8 levels "rowA","rowB",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ location : Factor w/ 96 levels "c1","c10","c11",..: 1 2 3 4 12 23 34 45 56 67 ...
 $ starttime: num  0 0 0 0 0 0 0 0 0 0 ...
 $ inadist  : num  0 0.2 0 0.2 0.6 0 0 0 0 0 ...
 $ smldist  : num  0 2.1 0 1.8 1.2 0 0 0 0 3.3 ...
 $ lardist  : num  0 0 0 0 0 0 0 0 0 1.3 ...
 $ fPhase   : Factor w/ 2 levels "Light","Dark": 2 2 2 2 2 2 2 2 2 2 ...
 $ fCycle   : Factor w/ 6 levels "predark","Cycle 1",..: 1 1 1 1 1 1 1 1 1 1 ...

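I want to add another column, timepoint, that gives starttime relative to the beginning of the fCycle it falls in. So starttime=1801 would be timepoint=1 for fCycle='Cycle 1'.

What's the best way to create df$timepoint?

ETA toy dataset: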

Answer 1

You can combine rle with sequence. Here is some sample code. Is the output what you want?

require(plyr)

mydf = data.frame(
  starttime = 1:20,
  fCycle    = c(rep(1:3, each = 4), rep(4:5, each = 3), rep(6, 2))
)

# sort data in increasing order of cycle and starttime
mydf = arrange(mydf, fCycle, starttime)

mydf = transform(mydf, timepoint = sequence(rle(fCycle)$lengths))
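To see why this works, it may help to look at the intermediate values on a smaller vector. The vector `cycles` below is made up for illustration and stands in for an already sorted fCycle column; it is not taken from the question.

```r
# Hypothetical run-structured vector, standing in for a sorted fCycle column
cycles <- c(1, 1, 1, 2, 2, 3)

# rle() collapses consecutive equal values into runs
runs <- rle(cycles)
runs$lengths   # length of each run: 3 2 1
runs$values    # value of each run:  1 2 3

# sequence() concatenates 1:n for each run length n
timepoint <- sequence(runs$lengths)
timepoint      # 1 2 3 1 2 1
```

Note that this numbers rows by position within each run, so it only matches the intended timepoint if the data are sorted and each row is one time step.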

Note: since the same starttime may occur more than once within an fCycle, here is an alternative approach using rank and ddply.

# treat same starttimes in an fCycle identically (ties get the lowest rank)
ddply(mydf, .(fCycle), transform, timepoint = rank(starttime, ties.method = 'min'))

# treat same starttimes in an fCycle using the average of their ranks
ddply(mydf, .(fCycle), transform, timepoint = rank(starttime, ties.method = 'average'))
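As a rough illustration of how the two tie-handling options differ, here is a small sketch on made-up values (the vectors `starttime` and `fCycle` below are hypothetical, not the question's data). The last line uses base R's ave() as a plyr-free stand-in for the ddply call; that substitution is an assumption for illustration, not part of the original answer.

```r
# Hypothetical start times in two cycles, each containing one tie
starttime <- c(5, 5, 7, 9, 12, 12, 15)
fCycle    <- c(1, 1, 1, 1,  2,  2,  2)

# Within the first cycle only: compare the two tie-breaking rules
first <- starttime[fCycle == 1]
rank_min <- rank(first, ties.method = "min")      # 1 1 3 4
rank_avg <- rank(first, ties.method = "average")  # 1.5 1.5 3.0 4.0

# Per-cycle ranks without plyr, using ave() to apply rank() within each group
timepoint <- ave(starttime, fCycle,
                 FUN = function(x) rank(x, ties.method = "min"))
timepoint  # 1 1 3 4 1 1 3
```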

Answer 2

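This is an outline of a solution, since it's not entirely clear to me what you're asking. It looks like you want something derived from run-length encoding (RLE), which you can get started on with the rle() function.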

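1. The rle() output gives the length of each run (call this lengths).
2. The offset at which each run starts can be computed (via cumsum(c(1,lengths))).
3. These offsets can be rep'd (repeated) the appropriate number of times (i.e. once for each item in the run).
4. For each position (1:n), just subtract the position where its run began.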

Edit: There is no need to use rep in step 3. It can be done by looking up the lengths.
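The outline above could be sketched roughly as follows. The vector `x` is a made-up stand-in for a sorted fCycle column, the variable names are hypothetical, and the `+ 1` in the last step assumes a 1-based timepoint is wanted. This version keeps the rep from step 3 for clarity.

```r
# Hypothetical run-structured vector, standing in for a sorted fCycle column
x <- c("a", "a", "a", "b", "b", "c")
n <- length(x)

# Step 1: length of each run
run_lengths <- rle(x)$lengths                                 # 3 2 1

# Step 2: position at which each run starts
# (cumsum(c(1, run_lengths)) has one extra trailing element, dropped here)
starts <- cumsum(c(1, run_lengths))[seq_along(run_lengths)]   # 1 4 6

# Step 3: repeat each start once per item in its run
start_of_run <- rep(starts, times = run_lengths)              # 1 1 1 4 4 6

# Step 4: subtract the run start from each position;
# adding 1 makes the result 1-based (an assumption about the desired numbering)
timepoint <- seq_len(n) - start_of_run + 1                    # 1 2 3 1 2 1
```

On this input the result is the same as sequence(run_lengths).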

How to renumber a series into parallel sets
