Question

I have a pandas DataFrame with the following shape:

>> dataset.shape
(1942,28)

I want to create a new DataFrame, df_new, that takes the column names of dataset and repeats them as values in df_new.

Here is an example:

>> dataset.columns
['a', 'b', 'c', 'd']

I want my df_new to look like this:

    column_name 
 0       a
 1       b
 2       c
 3       d
 4       a
 5       b
 6       c
 7       d
 8       a
 9       b
 10      c
 11      d
 .      .
 .      . 
(and so on, up to the length of the array)

Currently, when I run the code below, I don't get the result I want:

>> df_new = pd.DataFrame({0:np.arange(0,28).repeat(dataset_ts.shape[1])})
      0
0     0
1     0
2     0
.     .
.     . 
.     .
27    0
28    1
29    1
30    1
.     .
.     .
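
The output above follows from how the call works: `np.arange(0, 28).repeat(n)` repeats each integer n times in a row (0 0 0 … 1 1 1 …), and it never refers to the column names at all. The desired output instead repeats the whole sequence of names. A minimal sketch of the difference between `np.repeat` and `np.tile`, using the four example names as a stand-in for the real columns:

```python
import numpy as np

# Stand-in for dataset.columns from the example above.
cols = np.array(['a', 'b', 'c', 'd'])

# np.repeat repeats each element consecutively: a a a b b b c c c d d d
repeated = np.repeat(cols, 3)

# np.tile repeats the whole sequence: a b c d a b c d a b c d
tiled = np.tile(cols, 3)

print(repeated)
print(tiled)
```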

Answer 1

Use numpy.tile:

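import numpy as np
import pandas as pd

cols = dataset.columns            # column names to repeat
length = dataset_ts.shape[0]      # number of times to repeat the full set of names

# np.tile repeats the whole sequence of names `length` times
df_new = pd.DataFrame({'new': np.tile(cols, length)})
print(df_new)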
   new
0    a
1    b
2    c
3    d
4    a
5    b
6    c
7    d
8    a
9    b
10   c
...
...
...
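
Note that `np.tile(cols, length)` produces `len(cols) * length` rows; for the 28-column, 1942-row frame in the question that is 54,376 rows. If df_new should instead have exactly `length` rows (as in the other answer), one option is `np.resize`, which cycles through its input and truncates to the requested size. A sketch, assuming a small hypothetical `dataset` in place of the asker's data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: 10 rows, 4 columns.
dataset = pd.DataFrame(np.zeros((10, 4)), columns=['a', 'b', 'c', 'd'])

names = np.asarray(dataset.columns)
length = dataset.shape[0]

# np.tile: the full list of names repeated `length` times -> len(names) * length rows.
tiled = pd.DataFrame({'new': np.tile(names, length)})

# np.resize: cycle through the names and stop after exactly `length` values.
resized = pd.DataFrame({'new': np.resize(names, length)})

print(len(tiled), len(resized))  # 40 10
print(resized['new'].tolist())
```

Which one is right depends on whether "the length of the array" means the number of rows in dataset or that number times the number of columns.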

Answer 2

You can use itertools.cycle + itertools.islice:

import pandas as pd

from itertools import cycle, islice

length = 1942
data = ['a', 'b', 'c', 'd']

result = pd.DataFrame({'new': list(islice(cycle(data), length))})

print(result)

Output

     new
0      a
1      b
2      c
3      d
4      a
...   ..
1937   b
1938   c
1939   d
1940   a
1941   b

[1942 rows x 1 columns]

Alternatively, you can use zip + range + cycle in a list comprehension:

result = pd.DataFrame({'new': [e for _, e in zip(range(length), cycle(data))] })
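
To check that the two itertools variants produce the same sequence, here is a short sketch that also compares them against a vectorized NumPy alternative based on modulo indexing (that third variant is not part of the original answer; row i simply takes `data[i % len(data)]`):

```python
from itertools import cycle, islice

import numpy as np
import pandas as pd

length = 1942
data = ['a', 'b', 'c', 'd']

# Variant 1: take the first `length` items of an endless cycle.
via_islice = list(islice(cycle(data), length))

# Variant 2: zip with range(length) stops the infinite cycle after `length` items.
via_zip = [e for _, e in zip(range(length), cycle(data))]

# Vectorized alternative: row i gets data[i % len(data)].
via_modulo = np.asarray(data)[np.arange(length) % len(data)].tolist()

result = pd.DataFrame({'new': via_islice})
print(via_islice == via_zip == via_modulo)  # True
print(result['new'].iloc[-1])  # b
```

The last value is 'b' because 1941 % 4 == 1, which matches the final row of the output above.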

How to fill repeating values in pandas?