我正在使用q
转换我的csv文件:log.csv(文件已链接)。
它的格式是:
datapath,port,rxpkts,rxbytes,rxerror,txpkts,txbytes,txerror
4,1,178,25159,0,40,3148,0
4,2,3,230,0,213,27897,0
4,3,3,230,0,212,27807,0
4,4,4,320,0,211,27717,0
4,5,3,230,0,212,27807,0
4,6,3,230,0,212,27807,0
4,7,4,320,0,211,27717,0
4,8,4,320,0,211,27717,0
4,9,4,320,0,211,27717,0
4,a,4,320,0,211,27717,0
4,b,3,230,0,212,27807,0
4,fffffffe,7,578,0,209,27549,0
3,1,197,26863,0,21,1638,0
3,2,3,230,0,215,28271,0
3,3,5,390,0,215,28271,0
3,4,2,140,0,216,28361,0
3,5,4,320,0,214,28181,0
3,6,3,230,0,215,28271,0
3,fffffffe,7,578,0,212,28013,0
5,1,208,27401,0,6,488,0
5,fffffffe,7,578,0,208,27401,0
2,1,180,24228,0,18,1368,0
2,2,2,140,0,195,25366,0
2,3,2,140,0,195,25366,0
2,4,3,230,0,194,25276,0
2,5,3,230,0,194,25276,0
2,6,2,140,0,195,25366,0
2,fffffffe,7,578,0,191,25018,0
1,1,38,5096,0,182,23602,0
1,2,42,5419,0,179,23369,0
1,3,61,7152,0,159,21546,0
1,4,28,4611,0,192,24087,0
1,5,46,6022,0,174,22676,0
1,fffffffe,7,578,0,214,28210,0
我想将其转换为以下格式:
端口数量可能会有所不同。
当前代码:
python q -H -d "," "select rxpkts, txpkts from ./log.csv where datapath = i and port = j" > i_j.csv;
所以我创建了i*j
个文件,然后手动组合它们。是否可以通过修改上述 sql查询或使用python 组合文件或使用 pandas 按照评论中的建议一次性完成此操作?
import subprocess
def printit():
for i in range(1,6):
for j in range(1,6):
query = "select rxpkts, txpkts from ./log.csv where datapath = "+str(i)+" and port = "+str(j)
fileName = str(i)+"_"+str(j)+".csv"
with open(fileName, "w+") as f:
p = subprocess.Popen(["python", "q", "-H", "-d", ",", query], stdout=f)
printit()
答案 0 :(得分:1)
您可以将set_index
与stack
一起使用。
import pandas as pd
# your data
# ======================================
print(df)
datapath port rxpkts ... txpkts txbytes txerror
0 4 1 178 ... 40 3148 0
1 4 2 3 ... 213 27897 0
2 4 3 3 ... 212 27807 0
3 4 4 4 ... 211 27717 0
4 4 5 3 ... 212 27807 0
5 4 6 3 ... 212 27807 0
6 4 7 4 ... 211 27717 0
7 4 8 4 ... 211 27717 0
8 4 9 4 ... 211 27717 0
9 4 a 4 ... 211 27717 0
.. ... ... ... ... ... ... ...
24 2 4 3 ... 194 25276 0
25 2 5 3 ... 194 25276 0
26 2 6 2 ... 195 25366 0
27 2 fffffffe 7 ... 191 25018 0
28 1 1 38 ... 182 23602 0
29 1 2 42 ... 179 23369 0
30 1 3 61 ... 159 21546 0
31 1 4 28 ... 192 24087 0
32 1 5 46 ... 174 22676 0
33 1 fffffffe 7 ... 214 28210 0
[34 rows x 8 columns]
# reshaping
# ======================================
series_res = df[df.columns[:4]].set_index(['datapath', 'port']).stack()
series_res.name = 'value'
datapath port
4 1 rxpkts 178
rxbytes 25159
2 rxpkts 3
rxbytes 230
3 rxpkts 3
rxbytes 230
4 rxpkts 4
rxbytes 320
5 rxpkts 3
rxbytes 230
...
1 2 rxpkts 42
rxbytes 5419
3 rxpkts 61
rxbytes 7152
4 rxpkts 28
rxbytes 4611
5 rxpkts 46
rxbytes 6022
fffffffe rxpkts 7
rxbytes 578
Name: value, dtype: int64
df_res = pd.DataFrame(series_res)
df_res.T
datapath 4 ... 1
port 1 2 3 ... 4 5 fffffffe
rxpkts rxbytes rxpkts rxbytes rxpkts rxbytes ... rxpkts rxbytes rxpkts rxbytes rxpkts rxbytes
value 178 25159 3 230 3 230 ... 28 4611 46 6022 7 578
[1 rows x 68 columns]