How to shuffle DataFrame blocks (of different sizes) in a Python list?

Posted: 2020-08-25 17:35:40

Tags: python list dataframe shuffle

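Below is some dummy code showing what I want to achieve; my question is at the end. I want to shuffle blocks of a DataFrame (of different sizes) that are stored in a Python list. Thanks.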

Set up a dummy dictionary:

import pandas as pd

dummy = {"ID":[1,2,3,4,5,6,7,8,9,10],
         "Alphabet":["A","B","C","D","E","F","G","H","I","J"],
         "Fruit":["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]}

Convert the dictionary to a DataFrame:

dummy_df = pd.DataFrame(dummy)

Create DataFrame blocks of the desired sizes:

blocksize = [1,2,3,4]
blocks = []
i = 0
for j in range(len(blocksize)):
    a = blocksize[j]
    blocks.append(dummy_df[i:i+a])
    i += a
blocks
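The loop above tracks each block's start offset by hand. A minimal sketch of the same split, assuming the sizes in blocksize add up to the number of rows, derives the boundaries with itertools.accumulate and slices positionally with .iloc. The helper name split_into_blocks and the size check are illustrative additions, not part of the question.

```python
from itertools import accumulate

import pandas as pd


def split_into_blocks(df, sizes):
    """Split df into consecutive row blocks of the given sizes (hypothetical helper)."""
    if sum(sizes) != len(df):
        raise ValueError("block sizes must add up to the number of rows")
    ends = list(accumulate(sizes))   # [1, 3, 6, 10] for sizes [1, 2, 3, 4]
    starts = [0] + ends[:-1]         # [0, 1, 3, 6]
    return [df.iloc[s:e] for s, e in zip(starts, ends)]


dummy_df = pd.DataFrame({"ID": range(1, 11), "Alphabet": list("ABCDEFGHIJ")})
blocks = split_into_blocks(dummy_df, [1, 2, 3, 4])
print([len(b) for b in blocks])  # → [1, 2, 3, 4]
```

Unlike the original loop, this version fails loudly if the sizes do not cover the whole DataFrame instead of silently returning shorter or empty blocks.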

Below is the output of blocks. It is a list of 4 DataFrames with 1 to 4 rows each:

[   ID Alphabet  Fruit
 0   1        A  apple,
    ID Alphabet    Fruit
 1   2        B   banana
 2   3        C  coconut,
    ID Alphabet           Fruit
 3   4        D            date
 4   5        E  elephant apple
 5   6        F          feijoa,
    ID Alphabet       Fruit
 6   7        G       guava
 7   8        H    honeydew
 8   9        I    ita palm
 9  10        J  jack fruit]

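This is where I'm stuck.

I have tried many different things, but I keep getting errors. I want to shuffle the DataFrames in the list and then combine them back into a single DataFrame. Below is an example of the shuffled output. How can I do this?

Example of the desired output: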

   ID Alphabet           Fruit
1   2        B          banana
2   3        C         coconut
0   1        A           apple
6   7        G           guava
7   8        H        honeydew
8   9        I        ita palm
9  10        J      jack fruit
3   4        D            date
4   5        E  elephant apple
5   6        F          feijoa
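The desired output above is the four blocks stacked in the order 1, 0, 3, 2 (positions in the blocks list), with the rows inside each block kept in their original order. A small sketch that reproduces it deterministically with pd.concat; the blocks are rebuilt inline so the snippet runs on its own, and the fixed order list stands in for the random order a shuffle would produce.

```python
import pandas as pd

dummy_df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "Fruit": ["apple", "banana", "coconut", "date", "elephant apple",
              "feijoa", "guava", "honeydew", "ita palm", "jack fruit"],
})

# Rebuild the four blocks of sizes 1, 2, 3, 4 by row position.
blocks = [dummy_df.iloc[0:1], dummy_df.iloc[1:3],
          dummy_df.iloc[3:6], dummy_df.iloc[6:10]]

# The desired output is blocks 1, 0, 3, 2 stacked on top of each other.
order = [1, 0, 3, 2]
result = pd.concat([blocks[k] for k in order])
print(result.index.tolist())  # → [1, 2, 0, 6, 7, 8, 9, 3, 4, 5]
```

A random version only needs order to be a shuffled copy of range(len(blocks)).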

2 answers:

Answer 0 (score: 0)

Once you have the list, you can use random.shuffle to shuffle the blocks in place. After that, concatenate the blocks from the (shuffled) list into a new DataFrame with pd.concat. DataFrame.append, which could also be used in a loop for this, was deprecated in pandas 1.4 and removed in pandas 2.0, so pd.concat is the portable choice.

Try the following code:

import pandas as pd
import random

dummy = {"ID":[1,2,3,4,5,6,7,8,9,10],
         "Alphabet":["A","B","C","D","E","F","G","H","I","J"],
         "Fruit":["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]}

dummy_df = pd.DataFrame(dummy)

blocksize = [1,2,3,4]
blocks = []
i = 0
for j in range(len(blocksize)):
    a = blocksize[j]
    blocks.append(dummy_df[i:i+a])
    i += a

random.shuffle(blocks)  # shuffle the blocks in the list, in place

dfs = pd.concat(blocks)  # stack the shuffled blocks into one DataFrame

print(dfs)

Output (one possible result; the block order changes on every run):

   ID Alphabet           Fruit
3   4        D            date
4   5        E  elephant apple
5   6        F          feijoa
1   2        B          banana
2   3        C         coconut
6   7        G           guava
7   8        H        honeydew
8   9        I        ita palm
9  10        J      jack fruit
0   1        A           apple
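Because random.shuffle produces a different block order on each run, the output above is only one possible result. A sketch, assuming reproducible runs are wanted, passes the shuffle through a seeded random.Random instance (the seed 42 is arbitrary) and checks that every block still appears as one contiguous run of rows in its original internal order.

```python
import random

import pandas as pd

df = pd.DataFrame({"ID": range(1, 11)})
sizes = [1, 2, 3, 4]
starts = [0, 1, 3, 6]
blocks = [df.iloc[s:s + n] for s, n in zip(starts, sizes)]

rng = random.Random(42)   # seeded generator: same shuffle on every run
shuffled = blocks[:]      # copy so the original block order is kept
rng.shuffle(shuffled)     # shuffles the list of blocks in place
result = pd.concat(shuffled)

# Each block's IDs must still appear together and in their original order.
ids = result["ID"].tolist()
pos = 0
for block in shuffled:
    n = len(block)
    assert ids[pos:pos + n] == block["ID"].tolist()
    pos += n
print(ids)
```

Using a private random.Random instance rather than random.seed keeps the seeding local to this shuffle instead of changing the module-wide generator.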

Answer 1 (score: 0)

You can use .sample(frac=1) to shuffle the rows directly inside each block:

blocks.append(df[start:end].sample(frac=1))

Then you can join all the DataFrames at once with pd.concat(list_of_dfs):

df = pd.concat(blocks)

Full example:

import pandas as pd

dummy = {
    "ID": [1,2,3,4,5,6,7,8,9,10],
    "Alphabet": ["A","B","C","D","E","F","G","H","I","J"],
    "Fruit": ["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]
}

df = pd.DataFrame(dummy)

blocksize = [1,2,3,4]
blocks = []

start = 0
for size in blocksize:
    end = start + size
    blocks.append(df[start:end].sample(frac=1))  # shuffle rows inside this block
    start = end

#for item in blocks:
#    print(item)

df = pd.concat(blocks)  # optionally add .reset_index(drop=True)
print(df)
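Note that this approach keeps the blocks in their original order and shuffles the rows inside each block, so a 1-row block never changes. A sketch, assuming a fixed random_state is acceptable for reproducibility (the per-block seed i is arbitrary), confirms that each position in the result still holds the same set of rows as before.

```python
import pandas as pd

df = pd.DataFrame({"ID": range(1, 11)})
sizes = [1, 2, 3, 4]

blocks = []
start = 0
for i, size in enumerate(sizes):
    end = start + size
    # random_state makes each within-block shuffle reproducible
    blocks.append(df.iloc[start:end].sample(frac=1, random_state=i))
    start = end

result = pd.concat(blocks)

# Block order is unchanged: sorting each slice recovers the original blocks.
ids = result["ID"].tolist()
print(ids[:1], sorted(ids[1:3]), sorted(ids[3:6]), sorted(ids[6:]))
# → [1] [2, 3] [4, 5, 6] [7, 8, 9, 10]
```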

Other ways to shuffle: see the question "Shuffle DataFrame rows".

Documentation: pandas.DataFrame.sample
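For comparison, the common idiom for shuffling all rows of a DataFrame (ignoring blocks entirely) is sample(frac=1), usually followed by reset_index(drop=True). A short sketch; the random_state value is arbitrary and only there for repeatability.

```python
import pandas as pd

df = pd.DataFrame({"ID": range(1, 11)})

# frac=1 samples every row exactly once, i.e. a full random permutation;
# reset_index(drop=True) replaces the shuffled labels with 0..n-1.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
print(shuffled["ID"].tolist())
```

This mixes rows across block boundaries, so on its own it does not give the block-preserving output the question asks for.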


Another idea is to collect only the shuffled indexes, either with .sample(frac=1):

blocks += df[start:end].sample(frac=1).index.tolist()

or with random.shuffle():

indexes = df[start:end].index.tolist()
random.shuffle(indexes)
blocks += indexes

and then use these indexes to create the new DataFrame:

df = df.iloc[blocks]

import pandas as pd
import random

dummy = {
    "ID": [1,2,3,4,5,6,7,8,9,10],
    "Alphabet": ["A","B","C","D","E","F","G","H","I","J"],
    "Fruit": ["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]
}

df = pd.DataFrame(dummy)

blocksize = [1,2,3,4]
blocks = []

start = 0
for size in blocksize:
    end = start + size

    #blocks += df[start:end].sample(frac=1).index.tolist()
   
    indexes = df[start:end].index.tolist()
    random.shuffle(indexes)
    blocks += indexes
    
    start = end

#for item in blocks:
#    print(item)

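# iloc selects by position; this works because the default RangeIndex
# makes each index label equal to its row position
df = df.iloc[blocks]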

print(df)
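The index-based version above also shuffles rows within each block. If the goal is the question's desired output instead (whole blocks reordered, rows inside each block kept), the same index idea still works: shuffle the order of per-block position arrays, then flatten them. A sketch using NumPy's default_rng (the seed 7 is arbitrary) and np.split to cut the row positions at the cumulative block sizes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": range(1, 11), "Alphabet": list("ABCDEFGHIJ")})
sizes = [1, 2, 3, 4]

# Split the row positions 0..9 into consecutive chunks of the given sizes.
# np.split takes the cut points [1, 3, 6] (cumulative sizes without the last).
chunks = np.split(np.arange(len(df)), np.cumsum(sizes)[:-1])

rng = np.random.default_rng(7)
order = rng.permutation(len(chunks))   # random order of the blocks
positions = np.concatenate([chunks[k] for k in order])

result = df.iloc[positions]            # positional lookup, labels are kept
print(result["Alphabet"].tolist())
```

Working with positions and .iloc means this also behaves correctly when the DataFrame has a non-default index.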