Select separate pandas DataFrames based on combinations of values in multiple columns

Date: 2018-06-13 19:09:35

Tags: python pandas


I have a DataFrame:

    Time            locIP       remIp       locPort  remPort  numReads  numWrites
0   20180529235221  127.0.0.1   127.0.0.1   22       565      36736     36751
1   20180529235221  127.0.0.1   127.0.0.1   22       566      36736     74690
2   20180529235221  127.0.0.1   127.0.0.1   12       567      36736     36749
3   20180529235221  10.8.21.41  10.8.21.34  22       565      36744     36738
4   20180529235221  10.8.21.41  10.8.21.34  22       566      36744     36738
5   20180529235225  127.0.0.1   127.0.0.1   22       565      36788     36751
6   20180529235225  127.0.0.1   127.0.0.1   22       566      36788     74700
7   20180529235225  127.0.0.1   127.0.0.1   12       567      36788     36800

I want to draw a time-series plot of numReads for each combination of (locIP, remIp, locPort, remPort).
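If the end goal is just one numReads series per combination, one option that skips building separate DataFrames is to pivot the four key columns into column labels, so that each column of the result is one combination over time. This is a minimal sketch assuming the sample data above, integer port columns, and Time stored as a YYYYMMDDHHMMSS integer; the names keys and wide are illustrative.

```python
import pandas as pd

# Sample data from the question (port columns assumed to be integers)
df = pd.DataFrame({
    "Time": [20180529235221] * 5 + [20180529235225] * 3,
    "locIP": ["127.0.0.1"] * 3 + ["10.8.21.41"] * 2 + ["127.0.0.1"] * 3,
    "remIp": ["127.0.0.1"] * 3 + ["10.8.21.34"] * 2 + ["127.0.0.1"] * 3,
    "locPort": [22, 22, 12, 22, 22, 22, 22, 12],
    "remPort": [565, 566, 567, 565, 566, 565, 566, 567],
    "numReads": [36736, 36736, 36736, 36744, 36744, 36788, 36788, 36788],
    "numWrites": [36751, 74690, 36749, 36738, 36738, 36751, 74700, 36800],
})

# Parse the YYYYMMDDHHMMSS integers into real timestamps for the x-axis
df["Time"] = pd.to_datetime(df["Time"].astype(str), format="%Y%m%d%H%M%S")

keys = ["locIP", "remIp", "locPort", "remPort"]
# One row per timestamp, one column per (locIP, remIp, locPort, remPort) combination.
# aggfunc="mean" only matters if a combination has several readings at the same Time.
wide = df.pivot_table(index="Time", columns=keys, values="numReads", aggfunc="mean")
print(wide)
```

With matplotlib installed, wide.plot() should then draw one line per combination; a combination with no reading at a given timestamp is NaN in that row.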

To do that, I want to split it into separate smaller DataFrames, such as:

    Time            locIP       remIp       locPort  remPort  numReads  numWrites
0   20180529235221  127.0.0.1   127.0.0.1   22       565      36736     36751
5   20180529235225  127.0.0.1   127.0.0.1   22       565      36788     36751

and another:

    Time            locIP       remIp       locPort  remPort  numReads  numWrites
1   20180529235221  127.0.0.1   127.0.0.1   22       566      36736     74690
6   20180529235225  127.0.0.1   127.0.0.1   22       566      36788     74700

I tried combining conditions on multiple columns:

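# assuming locPort/remPort hold integers, compare against 22 and 565 rather than the strings '22' and '565'
df1 = df[(df["locIP"] == '127.0.0.1') & (df["remIp"] == '127.0.0.1') & (df['locPort'] == 22) & (df['remPort'] == 565)]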
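The chain of & conditions can be generated from a mapping of column name to value instead of being typed out by hand. A minimal sketch, assuming integer port columns; select is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

# Sample data from the question (port columns assumed to be integers)
df = pd.DataFrame({
    "Time": [20180529235221] * 5 + [20180529235225] * 3,
    "locIP": ["127.0.0.1"] * 3 + ["10.8.21.41"] * 2 + ["127.0.0.1"] * 3,
    "remIp": ["127.0.0.1"] * 3 + ["10.8.21.34"] * 2 + ["127.0.0.1"] * 3,
    "locPort": [22, 22, 12, 22, 22, 22, 22, 12],
    "remPort": [565, 566, 567, 565, 566, 565, 566, 567],
    "numReads": [36736, 36736, 36736, 36744, 36744, 36788, 36788, 36788],
    "numWrites": [36751, 74690, 36749, 36738, 36738, 36751, 74700, 36800],
})


def select(frame, **conditions):
    """Hypothetical helper: keep rows where every named column equals its value."""
    mask = pd.Series(True, index=frame.index)
    for column, value in conditions.items():
        # AND in one equality test per column, like the hand-written & chain
        mask &= frame[column] == value
    return frame[mask]


df1 = select(df, locIP="127.0.0.1", remIp="127.0.0.1", locPort=22, remPort=565)
print(df1)  # rows 0 and 5
```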

But that way I would have to write out every combination in the condition by hand. I'm looking for a better way.
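For "one sub-DataFrame per combination of column values", the usual pandas tool is groupby on the key columns: it only visits combinations that actually occur, so no empty frames come out. A minimal sketch on the sample data, assuming integer port columns; frames and keys are illustrative names.

```python
import pandas as pd

# Sample data from the question (port columns assumed to be integers)
df = pd.DataFrame({
    "Time": [20180529235221] * 5 + [20180529235225] * 3,
    "locIP": ["127.0.0.1"] * 3 + ["10.8.21.41"] * 2 + ["127.0.0.1"] * 3,
    "remIp": ["127.0.0.1"] * 3 + ["10.8.21.34"] * 2 + ["127.0.0.1"] * 3,
    "locPort": [22, 22, 12, 22, 22, 22, 22, 12],
    "remPort": [565, 566, 567, 565, 566, 565, 566, 567],
    "numReads": [36736, 36736, 36736, 36744, 36744, 36788, 36788, 36788],
    "numWrites": [36751, 74690, 36749, 36738, 36738, 36751, 74700, 36800],
})

keys = ["locIP", "remIp", "locPort", "remPort"]
# Grouping by a list of columns yields (tuple_of_key_values, sub_DataFrame) pairs
frames = {combo: group for combo, group in df.groupby(keys)}

for combo, group in frames.items():
    print(combo)
    print(group[["Time", "numReads"]])
```

Each group keeps the original row labels, so frames[("127.0.0.1", "127.0.0.1", 22, 566)] holds rows 1 and 6, matching the second example above. With matplotlib available, calling group.plot(x="Time", y="numReads", title=str(combo)) inside the loop would give one chart per combination.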

1 answer:

Answer 0 (score: 0)

This might work for you.

import itertools

# Dictionary mapping each column name to its unique values
d = {}
# Column names of the DataFrame
head = list(df)
for x in head:
    d[x] = list(set(df[x]))
# Every possible combination of the four key columns
c = list(itertools.product(d['locIP'], d['locPort'], d['remIp'], d['remPort']))
# Collect the non-empty sub-DataFrames here
NonEmpdf = []
for x in c:
    # IP addresses are strings, so quote them inside the query; ports are numbers
    selectTxt = 'locIP == {} & locPort == {} & remIp == {} & remPort == {}'.format(
        "'" + x[0] + "'", x[1], "'" + x[2] + "'", x[3])
    print(selectTxt)
    dfSel = df.query(selectTxt)
    if dfSel.empty:
        print('Empty')
    else:
        NonEmpdf.append(dfSel)
# NonEmpdf now holds every non-empty DataFrame, ready to iterate over and plot.
NonEmpdf
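Note that itertools.product crosses every unique value of each column, so on the sample data the loop above runs 24 queries (2 locIP x 2 locPort x 2 remIp x 3 remPort), of which only 5 return rows. A variant of the same loop that iterates only over combinations present in the data, via drop_duplicates, avoids the empty checks and the query-string formatting. A sketch under the same assumptions as the answer (integer ports); keys, combos and selected are illustrative names.

```python
import pandas as pd

# Sample data from the question (port columns assumed to be integers)
df = pd.DataFrame({
    "Time": [20180529235221] * 5 + [20180529235225] * 3,
    "locIP": ["127.0.0.1"] * 3 + ["10.8.21.41"] * 2 + ["127.0.0.1"] * 3,
    "remIp": ["127.0.0.1"] * 3 + ["10.8.21.34"] * 2 + ["127.0.0.1"] * 3,
    "locPort": [22, 22, 12, 22, 22, 22, 22, 12],
    "remPort": [565, 566, 567, 565, 566, 565, 566, 567],
    "numReads": [36736, 36736, 36736, 36744, 36744, 36788, 36788, 36788],
    "numWrites": [36751, 74690, 36749, 36738, 36738, 36751, 74700, 36800],
})

keys = ["locIP", "remIp", "locPort", "remPort"]
# Only the key combinations that actually appear, in first-seen order
combos = df[keys].drop_duplicates()

selected = []
for values in combos.itertuples(index=False, name=None):
    # Build the row mask column by column instead of formatting a query string
    mask = pd.Series(True, index=df.index)
    for column, value in zip(keys, values):
        mask &= df[column] == value
    selected.append(df[mask])

print(len(selected))  # 5
```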

Also, .any() might be useful to you.