Parsing multiple csv files with timestamps in a folder

Time: 2019-10-04 13:36:09

Tags: python csv time-series

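I am trying to parse time-series csv files in a folder. Each file has a timestamp in its first column (each file is also named after its timestamp), and the delimiter is a comma. Across files, the only things that change are the "Server Totals" and "Client Totals" fields (I have included the contents of the csv files below).

How can I merge all of these files into one csv? Note: I am restricted to a specific set of libraries: only time, os, pandas, csv and glob (I have tried using all of them).

I tried this: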

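import glob
import os

import pandas as pd

path = r'C:\Users\xxx\Documents\files'  # use your path

# build the search pattern with os.path.join instead of hand-written separators
all_files = glob.glob(os.path.join(path, "*.csv"))

li = []

for filename in all_files:
    # header=0: the first line of each file holds the column names
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

print(frame)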

Examples of the individual csv files:

TimeStamp,Message/Event,Server Totals,Client Totals
1564981556,INVITE Requests,39967,37787
1564981556,100 Trying          ,39896,37758
1564981556,180 Ringing         ,1113,1113
1564981556,181 Forwarded       ,0,0
1564981556,182 Queued          ,1,1
1564981556,183 Progress        ,251,251
1564981556,1xx Provisional     ,0,0
1564981556,200 OK              ,913,913
1564981556,202 Accepted        ,0,0
1564981556,2xx Success         ,0,0
1564981556,30x Moved           ,0,0
1564981556,400 Bad Request     ,2,2
1564981556,401 Unauthorized    ,252,252
1564981556,403 Forbidden       ,320,324
1564981556,404 Not Found       ,487,487
1564981556,405 Not Allowed     ,0,0
1564981556,406 Not Acceptable  ,0,0
1564981556,407 Proxy Auth Req  ,998,998
1564981556,408 Request Timeout ,5220,5217
1564981556,415 Bad Media Type  ,0,0
1564981556,423 Too Brief       ,0,0
1564981556,480 Unavailable     ,49,49
1564981556,481 Does Not Exist  ,0,0
1564981556,482 Loop Detected   ,0,0
1564981556,483 Too Many Hops   ,6738,6738
1564981556,484 Address Incompl ,1039,1039
1564981556,485 Ambiguous       ,0,0
1564981556,486 Busy Here       ,159,174
1564981556,487 Terminated      ,2530,2530
1564981556,488 Not Acceptable  ,8199,8199
1564981556,489 Bad Event       ,0,0
1564981556,491 Req Pending     ,0,0

---


TimeStamp,Message/Event,Server Totals,Client Totals
1564982756,INVITE Requests,39967,37787
1564982756,Retransmissions,5,0
1564982756,100 Trying          ,39896,37758
1564982756,180 Ringing         ,1113,1113
1564982756,181 Forwarded       ,0,0
1564982756,182 Queued          ,1,1
1564982756,183 Progress        ,251,251
1564982756,1xx Provisional     ,0,0
1564982756,200 OK              ,913,913
1564982756,202 Accepted        ,0,0
1564982756,2xx Success         ,0,0
1564982756,30x Moved           ,0,0
1564982756,305 Use Proxy       ,0,0
1564982756,380 Alternative     ,0,0
1564982756,3xx Redirect        ,0,0
1564982756,400 Bad Request     ,2,2
1564982756,401 Unauthorized    ,252,252
1564982756,403 Forbidden       ,320,324
1564982756,404 Not Found       ,487,487
1564982756,405 Not Allowed     ,0,0
1564982756,406 Not Acceptable  ,0,0
1564982756,407 Proxy Auth Req  ,998,998
1564982756,408 Request Timeout ,5220,5217
1564982756,415 Bad Media Type  ,0,0
1564982756,420 Bad Extension   ,0,0
1564982756,421 Extension Reqd  ,0,0
1564982756,422 Too Short       ,0,0
1564982756,423 Too Brief       ,0,0
1564982756,480 Unavailable     ,49,49
1564982756,481 Does Not Exist  ,0,0
1564982756,482 Loop Detected   ,0,0
1564982756,483 Too Many Hops   ,6738,6738
1564982756,484 Address Incompl ,1039,1039
1564982756,485 Ambiguous       ,0,0
1564982756,486 Busy Here       ,159,174



1 answer:

Answer 0: (score: 0)

I am not on Windows, so I tried the snippet below on Linux with Python 3 (adjusting basepath, of course), and it worked as expected:

import pandas as pd
from pathlib import Path

basepath = Path('C:/Users/xxx/Documents/files')
li = []

for csvfile in basepath.glob('*.csv'):
    df = pd.read_csv(csvfile, index_col=None, header=0)
    li.append(df)

df = pd.concat(li, axis=0, ignore_index=True)

print(df)
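
The snippet above only prints the combined frame, while the question asks for a single csv. A minimal sketch of writing the result to disk, staying within the libraries the question allows (glob, os, pandas), might look like this. The function name `merge_csv_folder` and its parameters are hypothetical, chosen for illustration only.

```python
import glob
import os

import pandas as pd


def merge_csv_folder(folder, out_path):
    """Concatenate every *.csv in `folder` and write the result to `out_path`.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    # Sort the paths so the row order does not depend on directory listing order.
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    # Skip the output file itself in case it lives in the same folder.
    paths = [p for p in paths if os.path.abspath(p) != os.path.abspath(out_path)]
    if not paths:
        raise FileNotFoundError(f"no csv files found in {folder!r}")
    frames = [pd.read_csv(p, header=0) for p in paths]
    merged = pd.concat(frames, axis=0, ignore_index=True)
    # index=False keeps the pandas row index out of the written file.
    merged.to_csv(out_path, index=False)
    return merged
```

Filtering out `out_path` matters when the merged file is written into the folder being scanned; otherwise a second run would read the previous output back in and duplicate every row.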

Here is a proof of concept:

Python 3.7.4 (default, Aug 12 2019, 14:45:07) 
[GCC 9.1.1 20190605 (Red Hat 9.1.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> from pathlib import Path
>>> basepath = Path('csvdir')
>>> li = []
>>> for csvfile in basepath.glob('*.csv'):
...     df = pd.read_csv(csvfile, index_col=None, header=0)
...     li.append(df)
... 
>>> df = pd.concat(li, axis=0, ignore_index=True)
>>> print(df)
     TimeStamp         Message/Event  Server Totals  Client Totals
0   1564982756       INVITE Requests          39967          37787
1   1564982756       Retransmissions              5              0
2   1564982756  100 Trying                    39896          37758
3   1564982756  180 Ringing                    1113           1113
4   1564982756  181 Forwarded                     0              0
5   1564982756  182 Queued                        1              1
6   1564982756  183 Progress                    251            251
7   1564982756  1xx Provisional                   0              0
8   1564982756  200 OK                          913            913
9   1564982756  202 Accepted                      0              0
10  1564982756  2xx Success                       0              0
11  1564982756  30x Moved                         0              0
12  1564982756  305 Use Proxy                     0              0
13  1564982756  380 Alternative                   0              0
14  1564982756  3xx Redirect                      0              0
15  1564982756  400 Bad Request                   2              2
16  1564982756  401 Unauthorized                252            252
17  1564982756  403 Forbidden                   320            324
18  1564982756  404 Not Found                   487            487
19  1564982756  405 Not Allowed                   0              0
20  1564982756  406 Not Acceptable                0              0
21  1564982756  407 Proxy Auth Req              998            998
22  1564982756  408 Request Timeout            5220           5217
23  1564982756  415 Bad Media Type                0              0
24  1564982756  420 Bad Extension                 0              0
25  1564982756  421 Extension Reqd                0              0
26  1564982756  422 Too Short                     0              0
27  1564982756  423 Too Brief                     0              0
28  1564982756  480 Unavailable                  49             49
29  1564982756  481 Does Not Exist                0              0
..         ...                   ...            ...            ...
37  1564981556  180 Ringing                    1113           1113
38  1564981556  181 Forwarded                     0              0
39  1564981556  182 Queued                        1              1
40  1564981556  183 Progress                    251            251
41  1564981556  1xx Provisional                   0              0
42  1564981556  200 OK                          913            913
43  1564981556  202 Accepted                      0              0
44  1564981556  2xx Success                       0              0
45  1564981556  30x Moved                         0              0
46  1564981556  400 Bad Request                   2              2
47  1564981556  401 Unauthorized                252            252
48  1564981556  403 Forbidden                   320            324
49  1564981556  404 Not Found                   487            487
50  1564981556  405 Not Allowed                   0              0
51  1564981556  406 Not Acceptable                0              0
52  1564981556  407 Proxy Auth Req              998            998
53  1564981556  408 Request Timeout            5220           5217
54  1564981556  415 Bad Media Type                0              0
55  1564981556  423 Too Brief                     0              0
56  1564981556  480 Unavailable                  49             49
57  1564981556  481 Does Not Exist                0              0
58  1564981556  482 Loop Detected                 0              0
59  1564981556  483 Too Many Hops              6738           6738
60  1564981556  484 Address Incompl            1039           1039
61  1564981556  485 Ambiguous                     0              0
62  1564981556  486 Busy Here                   159            174
63  1564981556  487 Terminated                 2530           2530
64  1564981556  488 Not Acceptable             8199           8199
65  1564981556  489 Bad Event                     0              0
66  1564981556  491 Req Pending                   0              0

[67 rows x 4 columns]
>>> 
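
In the output above, the rows follow whatever order the files were returned in (here 1564982756 comes before 1564981556), and the Message/Event values keep their trailing padding. The following is a minimal sketch of tidying the combined frame. It assumes the TimeStamp column holds Unix epoch seconds, which the values suggest but the question does not state; `tidy` and the `DateTime` column are hypothetical names.

```python
import pandas as pd


def tidy(df):
    """Return a copy sorted by time, with trimmed event names and a datetime column."""
    out = df.copy()
    # Remove the space padding after names such as "100 Trying          ".
    out["Message/Event"] = out["Message/Event"].str.strip()
    # Assumption: TimeStamp is seconds since the Unix epoch (UTC).
    out["DateTime"] = pd.to_datetime(out["TimeStamp"], unit="s")
    # mergesort is stable, so rows keep their original order within one timestamp.
    out = out.sort_values("TimeStamp", kind="mergesort").reset_index(drop=True)
    return out


sample = pd.DataFrame({
    "TimeStamp": [1564982756, 1564981556],
    "Message/Event": ["100 Trying          ", "180 Ringing         "],
    "Server Totals": [39896, 1113],
    "Client Totals": [37758, 1113],
})
print(tidy(sample))
```

Returning a copy leaves the concatenated frame untouched, so the raw values remain available if the epoch-seconds assumption turns out to be wrong.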

If you also want to process files in subfolders, change basepath.glob('*.csv') to basepath.glob('**/*.csv').

Let me know if this works for you too.
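
As a sketch of the recursive variant: with pathlib, the pattern `**/*.csv` matches csv files directly in the base folder as well as in any nested subfolder. The helper name `find_csv_files` and its `recursive` flag are hypothetical, introduced only for illustration.

```python
from pathlib import Path


def find_csv_files(basepath, recursive=False):
    """Return csv paths under `basepath`, sorted for a reproducible order."""
    # "**" matches zero or more directory levels, so top-level files are included too.
    pattern = "**/*.csv" if recursive else "*.csv"
    return sorted(Path(basepath).glob(pattern))
```

The returned paths can be passed straight to pd.read_csv. If pathlib is not an option, glob.glob(os.path.join(path, '**', '*.csv'), recursive=True) from the glob module should give the same set of files.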