Suppose we set up the data frames as follows:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randint(0, 2, (10, 2)), columns=['Cow', 'Sheep'])
df2 = pd.DataFrame(np.random.randint(0, 2, (10, 5)), columns=['Hungry', 'Scared', 'Happy', 'Bored', 'Sad'])
df3 = pd.DataFrame(np.random.randint(0, 2, (10, 2)), columns=['Davids', 'Michaels'])
df1.index.name = df2.index.name = df3.index.name = 'id'
combos_to_test = pd.DataFrame([('Davids', 'Cow', 'Hungry'),
                               ('Michaels', 'Cow', 'Hungry'),
                               ('Davids', 'Cow', 'Scared'),
                               ('Michaels', 'Cow', 'Scared'),
                               ('Michaels', 'Sheep', 'Scared'),
                               ('Davids', 'Sheep', 'Happy'),
                               ('Michaels', 'Sheep', 'Happy')])
Example:
DF1:             DF2:                                    DF3:
id  Cow  Sheep   id  Hungry  Scared  Happy  Bored  Sad   id  Davids  Michaels
0   0    1       0   0       1       1      0      1     0   1       0
1   0    0       1   1       0       0      1      1     1   0       1
2   0    0       2   1       0       0      1      1     2   0       0
3   1    0       3   0       0       1      0      1     3   0       1
4   1    0       4   0       0       1      1      0     4   0       1
5   1    1       5   0       0       1      1      0     5   1       0
6   1    1       6   1       0       1      1      0     6   1       0
7   1    0       7   1       1       1      1      0     7   1       1
8   1    1       8   1       1       1      1      0     8   1       0
9   1    0       9   0       1       1      0      0     9   1       0
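I need a fourth data frame that has each of the combos_to_test as a column and that finds, for each combination, the rows where all three of its columns are 1 (that is, the product of the three columns).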
The way I planned to do this is to change the columns to:
df1.columns = Cow, Cow, Cow, Cow, Sheep, Sheep, Sheep
df2.columns = Hungry, Hungry, Scared, Scared, Scared, Happy, Happy
df3.columns = Davids, Michaels, Davids, Michaels, Michaels, Davids, Michaels
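Then rename all the columns to col1, col2, col3, ..., col7 (one per combination).
Then multiply the data frames together element-wise (this vectorises the work, but needs a lot of memory).
My real dataset is obviously much larger, and I would like to use numpy / pandas.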
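The plan described above can be sketched roughly as follows. This is only an illustration under some assumptions: the seed and the use of `np.random.default_rng` are there for reproducibility, and the column labels `name`, `animal` and `mood` on `combos_to_test` are hypothetical names added for readability. Instead of renaming columns to col1, col2, ..., it selects the repeated columns directly and multiplies the underlying arrays.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
ids = pd.RangeIndex(10, name='id')
df1 = pd.DataFrame(rng.integers(0, 2, (10, 2)), columns=['Cow', 'Sheep'], index=ids)
df2 = pd.DataFrame(rng.integers(0, 2, (10, 5)),
                   columns=['Hungry', 'Scared', 'Happy', 'Bored', 'Sad'], index=ids)
df3 = pd.DataFrame(rng.integers(0, 2, (10, 2)), columns=['Davids', 'Michaels'], index=ids)

combos_to_test = pd.DataFrame([('Davids', 'Cow', 'Hungry'),
                               ('Michaels', 'Cow', 'Hungry'),
                               ('Davids', 'Cow', 'Scared'),
                               ('Michaels', 'Cow', 'Scared'),
                               ('Michaels', 'Sheep', 'Scared'),
                               ('Davids', 'Sheep', 'Happy'),
                               ('Michaels', 'Sheep', 'Happy')],
                              columns=['name', 'animal', 'mood'])  # hypothetical labels

# Selecting with a list that repeats labels repeats the columns, one per combo.
names = df3[combos_to_test['name'].tolist()].to_numpy()
animals = df1[combos_to_test['animal'].tolist()].to_numpy()
moods = df2[combos_to_test['mood'].tolist()].to_numpy()

# Element-wise product of three (n_rows, n_combos) arrays.
result = pd.DataFrame(names * animals * moods, index=ids,
                      columns=pd.MultiIndex.from_frame(combos_to_test))
```

Note that the three intermediate arrays each have one column per combination, which is where the memory cost mentioned above comes from.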
The output df should look like this:
('Davids', 'Cow', 'Hungry') | ('Michaels', 'Cow', 'Hungry') | ('Davids', 'Cow', 'Scared') | ('Michaels', 'Cow', 'Scared') | ...
1) 0 1 0 0
2) 0 0 0 0
3) 0 1 0 0
4) 0 0 1 0
5) 0 0 0 0
6) 0 0 0 0
7) 0 0 0 0
8) 0 0 1 1
9) 1 0 0 0
10) 1 0 0 0
Answer 0 (score: 3)
I would use pd.concat:
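# Put all flag columns side by side, aligned on id.
df = pd.concat([df1, df2, df3], axis=1)
# For each combination, pick its three columns and multiply them row-wise;
# the tuple keys become a three-level column MultiIndex.
pd.concat({
    ctt: df.reindex(columns=list(ctt)).prod(axis=1)
    for ctt in map(tuple, combos_to_test.values)
}, axis=1)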
    Davids                  Michaels
    Cow             Sheep   Cow             Sheep
    Hungry  Scared  Happy   Hungry  Scared  Happy   Scared
id
0   0       0       0       0       0       0       0
1   1       1       0       1       1       0       1
2   0       0       0       0       0       0       0
3   0       0       0       0       0       1       0
4   1       1       0       0       0       0       0
5   0       0       0       1       1       1       1
6   0       0       0       0       0       0       0
7   0       0       0       0       0       0       0
8   0       0       0       0       0       0       0
9   0       0       0       0       0       0       0
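Because the dictionary keys are tuples, the resulting columns form a MultiIndex, so a single combination can be looked up by its full tuple and a whole group by its first level. Here is a small sketch of that, using tiny hand-written values (hypothetical, not the data from the question) and plain `df[list(c)]` selection in place of `reindex`. The swap is an assumption of this sketch: it raises a `KeyError` on a misspelled column name, whereas `reindex` would create an all-NaN column that `prod` then skips.

```python
import pandas as pd

# Tiny hand-written flags (hypothetical values, not the question's data).
df = pd.DataFrame({'Cow': [1, 0, 1],
                   'Hungry': [1, 1, 0],
                   'Davids': [1, 1, 1],
                   'Michaels': [0, 1, 1]})
combos = [('Davids', 'Cow', 'Hungry'), ('Michaels', 'Cow', 'Hungry')]

# Same pattern as the answer: one product Series per combination.
result = pd.concat({c: df[list(c)].prod(axis=1) for c in combos}, axis=1)

# A single combination is addressable by its full tuple.
davids_cow_hungry = result[('Davids', 'Cow', 'Hungry')]

# Every combination whose first element is 'Michaels'.
michaels_only = result['Michaels']
```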
Answer 1 (score: 0)
The simplest way to duplicate a column is:
df1['Cow_copy'] = df1['Cow']
If you need to duplicate many columns, you can build a list of column names and loop over it, applying the code above to each column.
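A minimal sketch of that loop, assuming a small stand-in `df1` with made-up values and the same `_copy` suffix as in the line above:

```python
import pandas as pd

df1 = pd.DataFrame({'Cow': [0, 1, 1], 'Sheep': [1, 0, 1]})  # hypothetical values

cols_to_copy = ['Cow', 'Sheep']
for col in cols_to_copy:
    # Add a duplicate of each listed column under a new name.
    df1[col + '_copy'] = df1[col]
```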