Split a .csv into a new .csv file per value

Date: 2017-08-16 20:55:00

Tags: python-3.x pandas csv

I use the following code to split a .csv file according to column 8 of the main .csv:

import csv
import pandas as pd    

def spliteCsv(input,output):
    print(input)
    data=set()
    with open (input) as csvfile:
        file = csv.reader (csvfile,delimiter=',')
        next (file,None)
        for row in file:
            if row[7] =='':
                data.add (-1)
            else:
                data.add (int(row[7]))

    data = list(data)
    ofile = pd.read_csv (input, sep=',')
    data.append(max(data)+1)
    for d in data:
        csv_temp = ofile[ofile['col8'].fillna (max(data)).astype(int) == d]
        csv_temp.to_csv ('%s_%s.csv'%(output,d),sep=',')
    return 

This is what I need:

col1  col2  col3  col4  col5  col6  col7  col8  col9 
1     a     k8                            5 
2     j     l9                            5
3     k     o0                            5
4     l     m7                            5

This is what the code outputs:

col0  col1  col2  col3  col4  col5  col6  col7  col8  col9 
0     1     a     k8                            5 
1     2     j     l9                            5
2     3     k     o0                            5
3     4     l     m7                            5

As you can see, it inserts an extra column as the first column, containing value(col1) - 1.

EDIT:

source.csv:

frame.number    frame.time_epoch        ip.src          ip.dst      tcp.srcport     tcp.dstport     tcp.seq     tcp.stream      frame.len       tcp.flags       _ws.col.Info
    1           1501756607          192.168.1.10    37.48.64.201        47159           7095        1               1           215             0x00000018      47159 → 7095 [PSH, ACK] Seq=1 Ack=1 Win=2235 Len=149 TSval=19928932 TSecr=2777283254
    2           1501756607          37.48.64.201    192.168.1.10        7095            47159       1               2           66              0x00000010      7095 → 47159 [ACK] Seq=1 Ack=150 Win=91 Len=0 TSval=2777285491 TSecr=19928932
    3           1501756607          37.48.64.201    192.168.1.10        7095            47159       1               1           215             0x00000018      7095 → 47159 [PSH, ACK] Seq=1 Ack=150 Win=91 Len=149 TSval=2777285491 TSecr=19928932
    4           1501756607          192.168.1.10    37.48.64.201        47159           7095        150             2           215             0x00000018      47159 → 7095 [PSH, ACK] Seq=150 Ack=150 Win=2235 Len=149 TSval=19928977 TSecr=2777285491
    5           1501756607          192.168.1.10    37.48.64.201        47159           7095        299             2           343             0x00000018      47159 → 7095 [PSH, ACK] Seq=299 Ack=150 Win=2235 Len=277 TSval=19928979 TSecr=2777285491
    6           1501756607          37.48.64.201    192.168.1.10        7095            47159       150                         66              0x00000010      7095 → 47159 [ACK] Seq=150 Ack=576 Win=91 Len=0 TSval=2777285537 TSecr=19928977

Output files:

File 1:

frame.number    frame.time_epoch        ip.src          ip.dst      tcp.srcport     tcp.dstport     tcp.seq     tcp.stream      frame.len       tcp.flags       _ws.col.Info
    1           1501756607          192.168.1.10    37.48.64.201        47159           7095        1               1           215             0x00000018      47159 → 7095 [PSH, ACK] Seq=1 Ack=1 Win=2235 Len=149 TSval=19928932 TSecr=2777283254
    3           1501756607          37.48.64.201    192.168.1.10        7095            47159       1               1           215             0x00000018      7095 → 47159 [PSH, ACK] Seq=1 Ack=150 Win=91 Len=149 TSval=2777285491 TSecr=19928932

File 2:

frame.number    frame.time_epoch        ip.src          ip.dst      tcp.srcport     tcp.dstport     tcp.seq     tcp.stream      frame.len       tcp.flags       _ws.col.Info
    2           1501756607          37.48.64.201    192.168.1.10        7095            47159       1               2           66              0x00000010      7095 → 47159 [ACK] Seq=1 Ack=150 Win=91 Len=0 TSval=2777285491 TSecr=19928932
    4           1501756607          192.168.1.10    37.48.64.201        47159           7095        150             2           215             0x00000018      47159 → 7095 [PSH, ACK] Seq=150 Ack=150 Win=2235 Len=149 TSval=19928977 TSecr=2777285491
    5           1501756607          192.168.1.10    37.48.64.201        47159           7095        299             2           343             0x00000018      47159 → 7095 [PSH, ACK] Seq=299 Ack=150 Win=2235 Len=277 TSval=19928979 TSecr=2777285491

File 3:

frame.number    frame.time_epoch        ip.src          ip.dst      tcp.srcport     tcp.dstport     tcp.seq     tcp.stream      frame.len       tcp.flags       _ws.col.Info
    6           1501756607          37.48.64.201    192.168.1.10        7095            47159       150             3           66              0x00000010      7095 → 47159 [ACK] Seq=150 Ack=576 Win=91 Len=0 TSval=2777285537 TSecr=19928977

1 answer:

Answer 0 (score: 2)

Use the index=False parameter:

csv_temp.to_csv ('%s_%s.csv'%(output,d),sep=',', index=False)
# NOTE:                                          ^^^^^^^^^^^
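Why this works: the extra first column is the DataFrame's row index, which `DataFrame.to_csv` writes by default (`index=True`). A minimal sketch with a small in-memory frame (the column names and values are illustrative, not read from the question's file):

```python
import io

import pandas as pd

# Illustrative frame; the values are made up for the demonstration.
df = pd.DataFrame({"col1": [1, 2, 3, 4], "col8": [5, 5, 5, 5]})

# Default index=True: the row labels 0..3 are written as an extra, unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)

# index=False: only the DataFrame's own columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)

print(with_index.getvalue().splitlines()[:2])     # [',col1,col8', '0,1,5']
print(without_index.getvalue().splitlines()[:2])  # ['col1,col8', '1,5']
```

The default index counts rows from 0 while col1 counts from 1 in this example, which is why the unwanted column looked like value(col1) - 1.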

UPDATE:

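import pandas as pd

df = pd.read_csv('/path/to/source/file.csv')

# empty or non-numeric tcp.stream values become NaN, then -1; the int cast
# keeps file names like "tcp.stream.1.csv" instead of "tcp.stream.1.0.csv"
df['tcp.stream'] = pd.to_numeric(df['tcp.stream'], errors='coerce').fillna(-1).astype(int)

# please set desired path and file name in the next line
output_path_template = 'd:/temp/tcp.stream.{}.csv'

# write one file per tcp.stream value, without the index column
for stream, group in df.groupby('tcp.stream'):
    group.to_csv(output_path_template.format(stream), index=False)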
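The update maps empty `tcp.stream` values to -1, so row 6 would end up in a "-1" file, whereas the question's File 3 puts it in a group numbered one past the largest stream. A minimal sketch of that variant follows; the function name `split_by_stream` and its default template are hypothetical, and it assumes the source file is really comma-separated (as the question's `delimiter=','` suggests) even though the sample above is shown space-aligned.

```python
import pandas as pd


def split_by_stream(df, output_path_template="tcp.stream.{}.csv"):
    """Write one CSV per tcp.stream value and return the paths written.

    Hypothetical helper: empty or non-numeric tcp.stream values are assigned
    max(tcp.stream) + 1, mirroring File 3 in the question, instead of -1.
    """
    stream = pd.to_numeric(df["tcp.stream"], errors="coerce")
    # Fill gaps with one past the largest stream number and store it back,
    # so the output files contain the new number (as File 3 does).
    filled = stream.fillna(stream.max() + 1).astype(int)
    df = df.assign(**{"tcp.stream": filled})
    paths = []
    for key, group in df.groupby("tcp.stream"):
        path = output_path_template.format(key)
        group.to_csv(path, index=False)  # no extra index column
        paths.append(path)
    return paths
```

For example: `split_by_stream(pd.read_csv('/path/to/source/file.csv'), 'd:/temp/tcp.stream.{}.csv')`. If every `tcp.stream` value is empty, `stream.max()` is NaN and the int cast fails, so a real script may want a fallback for that case.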