Question

I have a data frame in which each row represents a unique ID.

ID <- 1:12
Date1 <- seq(as.Date("2000/1/1"), length.out = 12, by = "months")
Date2 <- seq(as.Date("2001/1/1"), length.out = 12, by = "months")
Date3 <- seq(as.Date("2002/1/1"), length.out = 12, by = "months")
Fcast1 <- rnorm(12)
Fcast2 <- rnorm(12)
Fcast3 <- rnorm(12)
df <- data.frame(ID, Date1, Fcast1, Date2, Fcast2, Date3, Fcast3)

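I want to gather the Date1 to Date3 columns and the Fcast1 to Fcast3 columns into two columns, Date and Fcast, with ID repeated 3 times. Basically, this creates a long view of the data, or row-binds each pair of Date and Fcast columns. The desired output shape: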

ID <- rep(ID, 3)
Date <- c(Date1, Date2, Date3)
Fcast <- c(Fcast1, Fcast2, Fcast3)
df <- data.frame(ID, Date, Fcast)

Answer 1

You can do the following:

import numpy as np
from multiprocessing import shared_memory, Pool
import os


def test_function(args): 
    Input, shm_name, size = args
    existing_shm = shared_memory.SharedMemory(name=shm_name)
    d = np.ndarray(size, dtype=np.int32, buffer=existing_shm.buf)
    #print(Input, d[Input-1:Input+2])
    d[Input]=-20
    #print(Input, d[Input-1:Input+2])
    existing_shm.close()
    print(Input, 'parent process:', os.getppid())
    print(Input, 'process id:', os.getpid())


if __name__=='__main__':
    
    shm = shared_memory.SharedMemory(create=True, size=10000000*4)
    b = np.ndarray((10000000,), dtype=np.int32, buffer=shm.buf)
    b[:] = np.random.randint(100, size=10000000, dtype=np.int32)

    # One [index, shared-memory name, array shape] entry per task, for indices 1..13
    inputs = [[i, shm.name, b.shape] for i in range(1, 14)]

    with Pool(os.cpu_count()) as p:
        p.map(test_function, inputs)
 
    print(b[:20])
    
    # Clean up in the parent process once all workers have finished
    shm.close()
    shm.unlink()  # Free and release the shared memory block at the very end

Answer 2

Using tidyr::pivot_longer:

tidyr::pivot_longer(df, 
                    cols = -ID, 
                    names_to = '.value', 
                    names_pattern = '(.*)\\d+')

# A tibble: 36 x 3
#      ID Date        Fcast
#   <int> <date>      <dbl>
# 1     1 2000-01-01  0.452
# 2     1 2001-01-01  0.242
# 3     1 2002-01-01 -0.540
# 4     2 2000-02-01  1.54 
# 5     2 2001-02-01  0.178
# 6     2 2002-02-01  0.883
# 7     3 2000-03-01 -0.987
# 8     3 2001-03-01  1.40 
# 9     3 2002-03-01  0.675
#10     4 2000-04-01 -0.632
# … with 26 more rows
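
As a quick sanity check, here is a minimal sketch that rebuilds the example data, applies the same pivot_longer call, and reorders the result into the stacked layout asked for in the question. It assumes tidyr is installed; set.seed(1) and the names long and expected are introduced here for illustration only and are not part of the original post.

```r
library(tidyr)

set.seed(1)  # assumption: fixed seed so the random forecasts are reproducible
ID <- 1:12
Date1 <- seq(as.Date("2000/1/1"), length.out = 12, by = "months")
Date2 <- seq(as.Date("2001/1/1"), length.out = 12, by = "months")
Date3 <- seq(as.Date("2002/1/1"), length.out = 12, by = "months")
Fcast1 <- rnorm(12)
Fcast2 <- rnorm(12)
Fcast3 <- rnorm(12)
df <- data.frame(ID, Date1, Fcast1, Date2, Fcast2, Date3, Fcast3)

# Same call as in the answer: the text before the trailing digits
# ("Date", "Fcast") becomes the output column name via ".value".
long <- pivot_longer(df,
                     cols = -ID,
                     names_to = ".value",
                     names_pattern = "(.*)\\d+")

# pivot_longer keeps the three rows of each ID together. In this example
# all 36 dates are distinct, so sorting by Date yields the stacked layout
# (all Date1 rows, then Date2, then Date3).
long <- long[order(long$Date), ]

# The layout described in the question, built by hand for comparison.
expected <- data.frame(ID = rep(ID, 3),
                       Date = c(Date1, Date2, Date3),
                       Fcast = c(Fcast1, Fcast2, Fcast3))

stopifnot(
  nrow(long) == 36,
  all(long$ID == expected$ID),
  all(long$Date == expected$Date),
  all(long$Fcast == expected$Fcast)
)
```

Note that '(.*)' is greedy, so with two-digit suffixes (for example Date10) the pattern above would capture "Date1" rather than "Date"; in that case a pattern such as '(\\D+)\\d+' is probably the safer choice.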

Gather multiple key and value columns

2 answers: