Question

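Suppose you want to build a pd.DataFrame and get different numbers each time you increase the number of replicates in it. (Scroll down for a reproducible example in R.)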

I want to get the same output in Python, but I don't know how to get there!

Consider this simple pd.DataFrame:

df = pd.DataFrame({ 
        'a':[np.random.normal(0.27,0.01,5),np.random.normal(1,0.01,5)]})

df      
                                                          a
        0  [0.268297564096, 0.252974100195, 0.27613413347...
        1  [0.996267313891, 1.00497494738, 1.022271644, 1...

I don't know why the data looks like this. When I pass only a single np.random.normal call, I get this:

        a
0  0.092309
1  0.085985
2  0.083635
3  0.081582
4  0.104096

Sorry, I can't explain this behavior. I am new to pandas, so maybe you can explain it.
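A likely explanation, sketched here under the assumption of a recent pandas and NumPy: a list containing two arrays is treated as two values, so the column gets two rows, each holding a whole array, while a single 1-D array gives one row per number. The names `rng`, `nested`, and `flat` are illustrative only, and the seed is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# A list of two arrays under one key: two rows, each cell holds a 5-element array.
nested = pd.DataFrame({'a': [rng.normal(0.27, 0.01, 5), rng.normal(1, 0.01, 5)]})

# A single 1-D array under the key: one row per number.
flat = pd.DataFrame({'a': rng.normal(0.27, 0.01, 5)})

print(nested.shape, nested['a'].dtype)
print(flat.shape, flat['a'].dtype)
```

Joining the two arrays into one 1-D array first (for example with np.concatenate) should give one number per row instead.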

OK, back to the original problem.

To generate a second set of numbers, I thought I should use np.repeat:

df = pd.DataFrame({'a': np.repeat([np.random.normal(0.10, 0.01, 5), np.random.normal(0.10, 0.01, 5)], 2)})


df
Out[59]: 
           a
0   0.090305
1   0.090305
2   0.109092
3   0.109092
4   0.101706
5   0.101706
6   0.087357
7   0.087357
8   0.099094
9   0.099094
10  0.101595
11  0.101595
12  0.100343
13  0.100343
14  0.085380
15  0.085380
16  0.102118
17  0.102118
18  0.107328
19  0.107328

But np.repeat just produces the same numbers twice, which is not the output I want.
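The difference can be sketched as follows: np.repeat copies values that already exist, whereas calling the sampler once per replicate draws new ones. This is a minimal illustration using NumPy's Generator API; the variable names and the seed are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility

draws = rng.normal(0.10, 0.01, 5)

# np.repeat copies each existing element; no new random numbers are drawn.
repeated = np.repeat(draws, 2)

# Calling the sampler once per replicate draws new numbers each time.
fresh = np.concatenate([rng.normal(0.10, 0.01, 5) for _ in range(2)])

print(len(np.unique(repeated)))  # number of distinct values among 10 entries
print(len(np.unique(fresh)))
```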

Here is how to do it in R:

df <- data.frame(y = do.call(c,replicate(n = 2,
                                    expr = c(rnorm(5,0.10,0.01),rnorm(5,1,0.01)),
                                    simplify = FALSE)),gr = rep(seq(1,2),each=10))



         y     gr
1  0.11300203  1
2  0.11840556  1
3  0.09420799  1
4  0.10480623  1
5  0.08561427  1
6  1.00076001  1
7  1.00035891  1
8  1.00936751  1
9  1.00050563  1
10 1.00564799  1
11 0.09415217  2
12 0.10794155  2
13 0.11534605  2
14 0.08806740  2
15 0.12394189  2
16 0.99330066  2
17 0.98254134  2
18 0.99828079  2
19 1.00786526  2
20 0.97864180  2

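Basically, in R you can do this very directly. But I guess in Python you have to write a function for it.

In R you can generate normally distributed numbers with rnorm, and with numpy you can do the same with np.random.normal. But I could not find any built-in equivalent of the rest, especially do.call.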

Answer 1

Not sure if this is what you want, but you can use a for loop to generate a second set of random numbers, as follows.

# np.append flattens both lists of arrays into one 20-element array
df = pd.DataFrame({'a': np.append([np.random.normal(0.10, 0.01, 5) for _ in range(2)],
                                  [np.random.normal(1, 0.01, 5) for _ in range(2)])})

Then

df

           a
0   0.105469
1   0.091046
2   0.091626
3   0.104579
4   0.110971
5   0.076754
6   0.104674
7   0.096062
8   0.103571
9   0.089955
10  0.978489
11  0.997081
12  1.009864
13  1.000333
14  0.998483
15  1.010685
16  1.004473
17  1.001833
18  1.007723
19  0.999845
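The loop idea can also be wrapped in a small helper that imitates R's replicate, with np.concatenate playing the role of do.call(c, ...). This is only a sketch: `replicate` and `one_block` are hypothetical names introduced here, not NumPy or pandas functions, and the seed is arbitrary. Unlike the output above, it keeps the R ordering (five values near 0.10, then five near 1, per replicate) and adds the gr column.

```python
import numpy as np
import pandas as pd

def replicate(n, expr):
    """Call expr() n times and return the results as a list (hypothetical helper)."""
    return [expr() for _ in range(n)]

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility

def one_block():
    # One replicate: 5 draws around 0.10 followed by 5 draws around 1
    return np.concatenate([rng.normal(0.10, 0.01, 5), rng.normal(1, 0.01, 5)])

blocks = replicate(2, one_block)

df = pd.DataFrame({
    'y': np.concatenate(blocks),                          # like do.call(c, ...)
    'gr': np.repeat(np.arange(1, len(blocks) + 1), 10),   # like rep(seq(1, 2), each = 10)
})
print(df)
```

Because `one_block` is called again for every replicate, each group gets its own draws rather than copies of the first one.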

Answer 2

Actually, in R you don't need do.call():

set.seed(95)
df <- data.frame(y = c(rnorm(10,0.10,0.01), rnorm(10,1,0.01)),
                 gr = c(rep(1,10), rep(2,10)))
df
#             y gr
# 1  0.08970880  1
# 2  0.08384474  1
# 3  0.09972121  1
# 4  0.09678872  1
# 5  0.11880371  1
# 6  0.10696807  1
# 7  0.09135123  1
# 8  0.08925115  1
# 9  0.10994412  1
# 10 0.09769954  1
# 11 1.01486420  2
# 12 1.01533145  2
# 13 1.01454184  2
# 14 0.99125878  2
# 15 0.98222886  2
# 16 1.00128867  2
# 17 0.97588819  2
# 18 0.98216944  2
# 19 0.99982671  2
# 20 0.99090591  2

With Python pandas/numpy, consider using np.concatenate to join the arrays:

import pandas as pd
import numpy as np

np.random.seed(89)
df = pd.DataFrame({'y': np.concatenate([np.random.normal(0.1,0.01,10), 
                                        np.random.normal(1,0.01,10)]),
                   'gr': [1]*10 + [2]*10})
print(df)    
#     gr         y
# 0    1  0.083063
# 1    1  0.099979
# 2    1  0.095741
# 3    1  0.097444
# 4    1  0.096942
# 5    1  0.100405
# 6    1  0.099316
# 7    1  0.087978
# 8    1  0.098175
# 9    1  0.091204
# 10   2  0.997568
# 11   2  1.006740
# 12   2  1.003449
# 13   2  0.993747
# 14   2  0.997935
# 15   2  0.991284
# 16   2  0.991299
# 17   2  1.003981
# 18   2  0.993347
# 19   2  1.001337
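As a possible variation, the concatenation step can be avoided by passing an array of means, since the `loc` argument of the normal sampler accepts array-like input and is broadcast. The sketch below assumes NumPy's newer Generator API (np.random.default_rng), so it will not reproduce the numbers printed above even with the same seed value; the name `means` is illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(89)  # seed value is arbitrary

means = np.repeat([0.1, 1.0], 10)  # one mean per row: ten 0.1s, then ten 1.0s
df = pd.DataFrame({
    'y': rng.normal(loc=means, scale=0.01),  # loc is broadcast, so 20 values are drawn
    'gr': np.repeat([1, 2], 10),             # group label for each row
})
print(df.groupby('gr')['y'].mean())
```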

R's replicate and do.call functions in Python
