Question

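I have absolute frequency data from four electronics distributors, representing the number of power supplies they offer at a given power rating (in watts). I want to convert this data into raw data so that I can create a box plot for the four distributors and run other analyses. I have already tried the R function melt() from the reshape2 library, but it treats the absolute frequencies as measured values.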

My data (absolute frequencies) looks like this:

power_in_watt digikey farnell mouser rs
1                   0       0      0  2
2                   0       0      0  1
4                   1       0      1  3
5                   2       0      0  3
6                   2       1      2  3
...

The raw data I want:

distributor power_in_watt
rs                      1
rs                      1
rs                      2
digikey                 4
mouser                  4
rs                      4
rs                      4
rs                      4
digikey                 5
digikey                 5
rs                      5
rs                      5
rs                      5
digikey                 6
digikey                 6
farnell                 6
mouser                  6
mouser                  6
rs                      6
rs                      6
rs                      6
rs                      6
...

Is there a way to do this conversion automatically (preferably in R)?

Answer 1

You can try a one-liner in base R:

stack(lapply(df[-1], rep, x=df[,1]))

#   values     ind
#1       4 digikey
#2       5 digikey
#3       5 digikey
#4       6 digikey
#5       6 digikey
#6       6 farnell
#7       4  mouser
#8       6  mouser
#9       6  mouser
#10      1      rs
#11      1      rs
#12      2      rs
#13      4      rs
#14      4      rs
#15      4      rs
#16      5      rs
#17      5      rs
#18      5      rs
#19      6      rs
#20      6      rs
#21      6      rs

Data:

df = structure(list(power_in_watt = c(1L, 2L, 4L, 5L, 6L), digikey = c(0L, 0L, 1L, 2L, 2L), farnell = c(0L, 0L, 0L, 0L, 1L), mouser = c(0L, 0L, 1L, 0L, 2L), rs = c(2L, 1L, 3L, 3L, 3L)), .Names = c("power_in_watt", "digikey", "farnell", "mouser", "rs"), class = "data.frame", row.names = c(NA, -5L))
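To spell out what the one-liner does and to get from its output to the layout and box plot asked for in the question, here is a minimal sketch. The variable name `raw`, the column renaming, and the sorting step are illustrative additions, not part of the original answer.

```r
# Sample data from the question (first five power ratings only).
df <- data.frame(
  power_in_watt = c(1L, 2L, 4L, 5L, 6L),
  digikey = c(0L, 0L, 1L, 2L, 2L),
  farnell = c(0L, 0L, 0L, 0L, 1L),
  mouser  = c(0L, 0L, 1L, 0L, 2L),
  rs      = c(2L, 1L, 3L, 3L, 3L)
)

# For each distributor column, repeat every wattage as often as its count,
# then stack the resulting vectors into one long data frame.
raw <- stack(lapply(df[-1], rep, x = df$power_in_watt))

# Rename and reorder the columns to match the layout asked for (assumed names).
names(raw) <- c("power_in_watt", "distributor")
raw <- raw[order(raw$power_in_watt, raw$distributor),
           c("distributor", "power_in_watt")]
rownames(raw) <- NULL

# One box per distributor.
boxplot(power_in_watt ~ distributor, data = raw,
        xlab = "distributor", ylab = "power in watt")
```

Because `x` is passed by name, each count column ends up as the `times` argument of `rep()`, so rows with a count of 0 simply disappear.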

Answer 2

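I think what you want is to replicate rows according to the given frequencies.

First, we put the data into long format with tidyr, the successor library to reshape2.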

> library(tidyr)
> df.long <- df %>% gather(distributor, count, -power_in_watt)
> df.long
   power_in_watt distributor count
1              1     digikey     0
2              2     digikey     0
3              4     digikey     1
4              5     digikey     2
5              6     digikey     2
6              1     farnell     0
7              2     farnell     0
8              4     farnell     0
9              5     farnell     0
10             6     farnell     1
11             1      mouser     0

Then we can use base R to expand the rows based on the count:

> df.long[rep(1:nrow(df.long), df.long$count), ]
     power_in_watt distributor count
3                4     digikey     1
4                5     digikey     2
4.1              5     digikey     2
5                6     digikey     2
5.1              6     digikey     2
10               6     farnell     1
13               4      mouser     1
15               6      mouser     2
15.1             6      mouser     2
16               1          rs     2
16.1             1          rs     2
17               2          rs     1

Edit: a more careful reading led me to improve my answer.
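As an alternative sketch of the same two steps (reshape to long format, then expand rows by count), newer tidyr versions offer `pivot_longer()` and `uncount()`. This swaps in different functions from the ones used in this answer and assumes tidyr 1.0.0 or later is installed; the names `long` and `raw` are illustrative.

```r
library(tidyr)

# Sample data from the question (first five power ratings only).
df <- data.frame(
  power_in_watt = c(1L, 2L, 4L, 5L, 6L),
  digikey = c(0L, 0L, 1L, 2L, 2L),
  farnell = c(0L, 0L, 0L, 0L, 1L),
  mouser  = c(0L, 0L, 1L, 0L, 2L),
  rs      = c(2L, 1L, 3L, 3L, 3L)
)

# Long format: one row per (power_in_watt, distributor) pair with its count.
long <- pivot_longer(df, cols = -power_in_watt,
                     names_to = "distributor", values_to = "count")

# Repeat each row `count` times; uncount() drops the count column by default,
# and rows with a count of 0 vanish.
raw <- uncount(long, count)

# Put the columns in the order shown in the question.
raw <- raw[, c("distributor", "power_in_watt")]
```

The result has one row per power supply, so it can be passed directly to `boxplot(power_in_watt ~ distributor, data = raw)`.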
