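Below is some dummy code for what I'm trying to do; my question is at the end. I want to shuffle blocks of a DataFrame (of different sizes) that are stored in a Python list. Thanks.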
Set up a dummy dictionary:
dummy = {"ID":[1,2,3,4,5,6,7,8,9,10],
"Alphabet":["A","B","C","D","E","F","G","H","I","J"],
"Fruit":["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]}
Convert the dictionary to a DataFrame:
import pandas as pd
dummy_df = pd.DataFrame(dummy)
Create DataFrame blocks of the desired sizes:
blocksize = [1,2,3,4]
blocks = []
i = 0
for j in range(len(blocksize)):
    a = blocksize[j]
    blocks.append(dummy_df[i:i+a])
    i += a
blocks
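The loop above tracks the running start position by hand. A minimal sketch of the same split that computes the block boundaries with itertools.accumulate instead, assuming the block sizes add up to len(dummy_df); the names starts and ends are illustrative:

```python
from itertools import accumulate

import pandas as pd

dummy = {
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "Fruit": ["apple", "banana", "coconut", "date", "elephant apple",
              "feijoa", "guava", "honeydew", "ita palm", "jack fruit"],
}
dummy_df = pd.DataFrame(dummy)
blocksize = [1, 2, 3, 4]

# Running totals give each block's end position: [1, 3, 6, 10].
ends = list(accumulate(blocksize))
# Each block starts where the previous one ended (the first starts at 0).
starts = [0] + ends[:-1]

# .iloc slices by position, so this does not depend on the index labels.
blocks = [dummy_df.iloc[s:e] for s, e in zip(starts, ends)]

print([len(b) for b in blocks])  # [1, 2, 3, 4]
```

Concatenating these blocks in their original order gives back dummy_df unchanged.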
Below is the output of `blocks`. It is a list of 4 DataFrame blocks with 1 to 4 rows each:
[   ID Alphabet  Fruit
0   1        A  apple,
   ID Alphabet    Fruit
1   2        B   banana
2   3        C  coconut,
   ID Alphabet           Fruit
3   4        D            date
4   5        E  elephant apple
5   6        F          feijoa,
   ID Alphabet       Fruit
6   7        G       guava
7   8        H    honeydew
8   9        I    ita palm
9  10        J  jack fruit]
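This is where I'm stuck.
I've tried many different approaches, but I keep getting errors. I want to shuffle the DataFrames in the list and then combine them back into a single DataFrame. Below is an example of the shuffled output. How can I do this?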
Example of the desired output:
   ID Alphabet           Fruit
1   2        B          banana
2   3        C         coconut
0   1        A           apple
6   7        G           guava
7   8        H        honeydew
8   9        I        ita palm
9  10        J      jack fruit
3   4        D            date
4   5        E  elephant apple
5   6        F          feijoa
Answer 0 (score: 0)
Once you have the list, you can use random.shuffle to shuffle the blocks in place. Then concatenate the blocks from the (shuffled) list into a new DataFrame with pd.concat. (Appending each block to an empty DataFrame with DataFrame.append only works on old pandas versions: that method was deprecated in pandas 1.4 and removed in pandas 2.0.)
Try the following code:
import pandas as pd
import random
dummy = {"ID":[1,2,3,4,5,6,7,8,9,10],
         "Alphabet":["A","B","C","D","E","F","G","H","I","J"],
         "Fruit":["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]}
dummy_df = pd.DataFrame(dummy)
blocksize = [1,2,3,4]
blocks = []
i = 0
for j in range(len(blocksize)):
    a = blocksize[j]
    blocks.append(dummy_df[i:i+a])  # next block of a rows
    i += a
random.shuffle(blocks)  # shuffle the blocks in the list (in place)
dfs = pd.concat(blocks)  # stack the shuffled blocks into one DataFrame
print(dfs)
Output:
   ID Alphabet           Fruit
3   4        D            date
4   5        E  elephant apple
5   6        F          feijoa
1   2        B          banana
2   3        C         coconut
6   7        G           guava
7   8        H        honeydew
8   9        I        ita palm
9  10        J      jack fruit
0   1        A           apple
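Because random.shuffle draws from the global random state, the block order changes on every run, so the output above is only one possible result. A minimal sketch that makes the shuffle repeatable for a given seed and leaves the input list untouched; shuffle_blocks is a hypothetical helper name and 42 an arbitrary seed:

```python
import random

import pandas as pd

dummy = {
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "Fruit": ["apple", "banana", "coconut", "date", "elephant apple",
              "feijoa", "guava", "honeydew", "ita palm", "jack fruit"],
}
dummy_df = pd.DataFrame(dummy)
blocksize = [1, 2, 3, 4]

# Split into blocks of the requested sizes (positional slicing).
blocks = []
start = 0
for size in blocksize:
    blocks.append(dummy_df.iloc[start:start + size])
    start += size


def shuffle_blocks(blocks, seed=None):
    """Concatenate `blocks` in a random order without modifying the input list.

    A dedicated random.Random(seed) instance makes the order repeatable for a
    given seed and leaves the global `random` state untouched.
    """
    order = list(blocks)  # shallow copy: the caller's list keeps its order
    random.Random(seed).shuffle(order)
    return pd.concat(order)


shuffled = shuffle_blocks(blocks, seed=42)  # 42 is an arbitrary example seed
print(shuffled)
```

Calling shuffle_blocks with seed=None gives a different order each run, like the original answer.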
Answer 1 (score: 0)
You can use .sample(frac=1) to shuffle the rows directly inside each DataFrame block:
blocks.append( df[start:end].sample(frac=1) )
Then you can join all the DataFrames at once with pd.concat, which accepts a list of DataFrames (the older blocks[0].append(blocks[1:]) form stopped working when DataFrame.append was removed in pandas 2.0):
df = pd.concat(blocks)
import pandas as pd
dummy = {
    "ID": [1,2,3,4,5,6,7,8,9,10],
    "Alphabet": ["A","B","C","D","E","F","G","H","I","J"],
    "Fruit": ["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]
}
df = pd.DataFrame(dummy)
blocksize = [1,2,3,4]
blocks = []
start = 0
for size in blocksize:
    end = start + size
    blocks.append(df[start:end].sample(frac=1))  # shuffle the rows within this block
    start = end
#for item in blocks:
#    print(item)
df = pd.concat(blocks)  # add ignore_index=True to renumber the rows 0..9
print(df)
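As written, .sample(frac=1) shuffles the rows inside each block, while pd.concat keeps the blocks themselves in their original order. If you want the result shown in the question (whole blocks in a random order, rows inside each block unchanged), one option is to apply the same .sample(frac=1) idea to the block numbers instead. A minimal sketch; order and result are illustrative names and random_state=0 is an arbitrary seed:

```python
import pandas as pd

dummy = {
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "Fruit": ["apple", "banana", "coconut", "date", "elephant apple",
              "feijoa", "guava", "honeydew", "ita palm", "jack fruit"],
}
df = pd.DataFrame(dummy)
blocksize = [1, 2, 3, 4]

blocks = []
start = 0
for size in blocksize:
    end = start + size
    blocks.append(df.iloc[start:end])  # no .sample() here: rows keep their order
    start = end

# Shuffle the block numbers 0..len(blocks)-1 rather than the rows.
# random_state=0 is an arbitrary seed that makes the order repeatable.
order = pd.Series(range(len(blocks))).sample(frac=1, random_state=0)

# Concatenate the blocks in the sampled order.
result = pd.concat([blocks[k] for k in order])
print(result)
```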
For other ways to shuffle, see the question "Shuffle DataFrame rows".
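Another idea is to shuffle only the index labels, either with .sample(frac=1)
blocks += df[start:end].sample(frac=1).index.tolist()
or with random.shuffle()
indexes = df[start:end].index.tolist()
random.shuffle(indexes)
blocks += indexes
and then build the new DataFrame from those indexes (.iloc works here because, with the default RangeIndex, the labels equal the row positions):
df = df.iloc[blocks]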
import pandas as pd
import random
dummy = {
    "ID": [1,2,3,4,5,6,7,8,9,10],
    "Alphabet": ["A","B","C","D","E","F","G","H","I","J"],
    "Fruit": ["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]
}
df = pd.DataFrame(dummy)
blocksize = [1,2,3,4]
blocks = []
start = 0
for size in blocksize:
    end = start + size
    #blocks += df[start:end].sample(frac=1).index.tolist()
    indexes = df[start:end].index.tolist()
    random.shuffle(indexes)
    blocks += indexes
    start = end
#for item in blocks:
#    print(item)
df = df.iloc[blocks]
print(df)
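The index-based variant above also shuffles labels within each block. If the goal is instead to shuffle whole blocks (as in the question's desired output), a minimal sketch of the same index idea keeps one inner list of labels per block, shuffles the outer list, and flattens it before selecting rows with .loc, which looks rows up by label (the kind of value .index.tolist() returns). label_blocks and labels are illustrative names and the seed 0 is arbitrary:

```python
import random

import pandas as pd

dummy = {
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "Fruit": ["apple", "banana", "coconut", "date", "elephant apple",
              "feijoa", "guava", "honeydew", "ita palm", "jack fruit"],
}
df = pd.DataFrame(dummy)
blocksize = [1, 2, 3, 4]

# Keep each block's index labels in its own inner list.
label_blocks = []
start = 0
for size in blocksize:
    end = start + size
    label_blocks.append(df.index[start:end].tolist())
    start = end

random.seed(0)                # arbitrary seed, only so the demo is repeatable
random.shuffle(label_blocks)  # reorders whole blocks; labels inside a block stay put

# Flatten the shuffled blocks into one label list and select rows by label.
labels = [label for block in label_blocks for label in block]
result = df.loc[labels]
print(result)
```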