Working around the pandas concat error "Reindexing only valid with uniquely valued Index objects"

Time: 2017-07-25 05:15:22

Tags: python pandas dataframe concatenation

I have 3 dataframes containing information about the same groups, and I am trying to concat them with group as the index so I get one row per group member. Because df1 contains repeated group values (a non-unique index), I can't simply set_index and concat them. Is there a way to work around this?

Sample input dfs:

df1:
group     A       B
 cat      1       0 
 cat      2       7
 cat      5       5
 dog      0.4     1
 dog      2       4
 dog      8       7 
 seal     7       5
 seal     1       8
 seal     7       9

df2:
group     C       D
 cat      1       3
 seal     0       5    
 dog      3       4

df3:
group     E       F
 cat      1       5
 dog      0       3 
 seal     5       9

Wanted output:

group     A       B       C        D       E      F
 cat      1       0       1        3       1      5
 cat      2       7       1        3       1      5
 cat      5       5       1        3       1      5
 dog      0.4     1       3        4       0      3
 dog      2       4       3        4       0      3
 dog      8       7       3        4       0      3 
 seal     7       5       0        5       5      9
 seal     1       8       0        5       5      9
 seal     7       9       0        5       5      9
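What I tried was essentially a straight concat after set_index, which fails because df1's group index is not unique. Below is a minimal sketch of that failing call (abbreviated inline data, not my exact code):

    import pandas as pd

    # abbreviated versions of the sample frames above
    df1 = pd.DataFrame({'group': ['cat', 'cat', 'dog'], 'A': [1, 2, 0.4], 'B': [0, 7, 1]})
    df2 = pd.DataFrame({'group': ['cat', 'seal', 'dog'], 'C': [1, 0, 3], 'D': [3, 5, 4]})
    df3 = pd.DataFrame({'group': ['cat', 'dog', 'seal'], 'E': [1, 0, 5], 'F': [5, 3, 9]})

    # df1's group index has duplicates, so aligning along axis=1 raises
    # "Reindexing only valid with uniquely valued Index objects"
    pd.concat([df1.set_index('group'), df2.set_index('group'), df3.set_index('group')],
              axis=1)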

Thanks!

1 Answer:

Answer 0 (score: 1)

I think you can first call set_index on df2 and df3 (each has exactly one row per group, so their indexes are unique), concat them along axis=1, and then join the result to df1 on the group column:

# df2 and df3 have one row per group, so their indexes are unique and align cleanly
df = pd.concat([df2.set_index('group'), df3.set_index('group')], axis=1)
# join broadcasts the per-group columns onto every matching row of df1
all_data = df1.join(df, on='group')
print(all_data)
  group    A  B  C  D  E  F
0   cat  1.0  0  1  3  1  5
1   cat  2.0  7  1  3  1  5
2   cat  5.0  5  1  3  1  5
3   dog  0.4  1  3  4  0  3
4   dog  2.0  4  3  4  0  3
5   dog  8.0  7  3  4  0  3
6  seal  7.0  5  0  5  5  9
7  seal  1.0  8  0  5  5  9
8  seal  7.0  9  0  5  5  9

It is also possible to use the index_col parameter in read_csv instead of set_index:

df1 = pd.read_csv(file)
df2 = pd.read_csv(file, index_col='group')
df3 = pd.read_csv(file, index_col='group')

df = pd.concat([df2, df3], axis=1)
all_data = df1.join(df, on='group')
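This produces the same all_data as above: join aligns df's group index against df1's group column and repeats each group's row of C–F values for every matching row in df1.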