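I am trying to parse time-series data from the CSV files in a folder. Each file has a timestamp in its first column (each file name is also a timestamp), and the delimiter is a comma. Across files, the only things that change are the "Server Totals" and "Client Totals" fields (I have included the contents of the CSV files below).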
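How can I merge all of these files into a single CSV? Note: I am restricted to a specific set of libraries, namely time, os, pandas, csv and glob (I have tried using all of them).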
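Because the allowed libraries include csv and glob, the merge itself does not strictly need pandas. The following is a minimal sketch that uses only the standard library. `merge_with_csv_module` is a hypothetical helper name. The sketch assumes every file has the same header row and that the output file is stored outside the input folder. It writes the header once, then copies the data rows of each file in sorted file-name order.

```python
import csv
import glob
import os
import tempfile


def merge_with_csv_module(folder, output_path):
    """Append every *.csv in `folder` to `output_path`, writing the header once.

    Hypothetical helper; assumes every file has the same header row and that
    output_path is NOT inside `folder` (otherwise a rerun would re-read it).
    """
    # glob returns files in arbitrary order; sort so the output is stable
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    header = None
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader)  # first line of each file
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"unexpected header in {path}: {file_header}")
                writer.writerows(reader)  # copy the data rows unchanged
    return paths


# Usage with two tiny excerpts of the sample files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "files")
    os.mkdir(src)
    samples = {
        "1564981556.csv": "1564981556,INVITE Requests,39967,37787\n",
        "1564982756.csv": "1564982756,INVITE Requests,39967,37787\n"
                          "1564982756,Retransmissions,5,0\n",
    }
    for name, body in samples.items():
        with open(os.path.join(src, name), "w", newline="") as f:
            f.write("TimeStamp,Message/Event,Server Totals,Client Totals\n" + body)
    merged = os.path.join(tmp, "merged.csv")
    merge_with_csv_module(src, merged)
    with open(merged, newline="") as f:
        merged_rows = list(csv.reader(f))

print(len(merged_rows))  # 4: the header plus 3 data rows
```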
I tried this:
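import glob
import os

import pandas as pd

path = r'C:\Users\xxx\Documents\files'  # use your path
# os.path.join inserts the separator, avoiding the "\*" escape in a plain string
all_files = glob.glob(os.path.join(path, "*.csv"))

li = []
for filename in all_files:
    # header=0: the first line of each file holds the column names
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
print(frame)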
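The code above builds the combined DataFrame but only prints it. Writing the single CSV takes one more call, `DataFrame.to_csv`. Below is a minimal sketch. `merge_folder_to_csv` is a hypothetical helper name, and the temporary-folder demo only illustrates the flow. It also guards against an empty match, because `pd.concat` raises `ValueError` when given no frames, which is what a wrong path produces.

```python
import glob
import os
import tempfile

import pandas as pd


def merge_folder_to_csv(folder, output_path):
    """Read every *.csv in `folder`, stack them, and write one CSV (hypothetical helper)."""
    all_files = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not all_files:
        # pd.concat([]) would raise "No objects to concatenate"; fail more clearly
        raise FileNotFoundError(f"no CSV files found in {folder}")
    frames = [pd.read_csv(f, header=0) for f in all_files]
    frame = pd.concat(frames, axis=0, ignore_index=True)
    # index=False keeps pandas' 0..n-1 row numbers out of the output file
    frame.to_csv(output_path, index=False)
    return frame


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "files")
    os.mkdir(src)
    header = "TimeStamp,Message/Event,Server Totals,Client Totals\n"
    with open(os.path.join(src, "1564981556.csv"), "w") as f:
        f.write(header + "1564981556,INVITE Requests,39967,37787\n")
    with open(os.path.join(src, "1564982756.csv"), "w") as f:
        f.write(header + "1564982756,INVITE Requests,39967,37787\n"
                         "1564982756,Retransmissions,5,0\n")
    out = os.path.join(tmp, "merged.csv")  # written outside the input folder
    merged = merge_folder_to_csv(src, out)
    round_trip = pd.read_csv(out)

print(round_trip.shape)  # (3, 4)
```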
Example CSV files (two files are shown, separated by ---):
TimeStamp,Message/Event,Server Totals,Client Totals
1564981556,INVITE Requests,39967,37787
1564981556,100 Trying ,39896,37758
1564981556,180 Ringing ,1113,1113
1564981556,181 Forwarded ,0,0
1564981556,182 Queued ,1,1
1564981556,183 Progress ,251,251
1564981556,1xx Provisional ,0,0
1564981556,200 OK ,913,913
1564981556,202 Accepted ,0,0
1564981556,2xx Success ,0,0
1564981556,30x Moved ,0,0
1564981556,400 Bad Request ,2,2
1564981556,401 Unauthorized ,252,252
1564981556,403 Forbidden ,320,324
1564981556,404 Not Found ,487,487
1564981556,405 Not Allowed ,0,0
1564981556,406 Not Acceptable ,0,0
1564981556,407 Proxy Auth Req ,998,998
1564981556,408 Request Timeout ,5220,5217
1564981556,415 Bad Media Type ,0,0
1564981556,423 Too Brief ,0,0
1564981556,480 Unavailable ,49,49
1564981556,481 Does Not Exist ,0,0
1564981556,482 Loop Detected ,0,0
1564981556,483 Too Many Hops ,6738,6738
1564981556,484 Address Incompl ,1039,1039
1564981556,485 Ambiguous ,0,0
1564981556,486 Busy Here ,159,174
1564981556,487 Terminated ,2530,2530
1564981556,488 Not Acceptable ,8199,8199
1564981556,489 Bad Event ,0,0
1564981556,491 Req Pending ,0,0
---
TimeStamp,Message/Event,Server Totals,Client Totals
1564982756,INVITE Requests,39967,37787
1564982756,Retransmissions,5,0
1564982756,100 Trying ,39896,37758
1564982756,180 Ringing ,1113,1113
1564982756,181 Forwarded ,0,0
1564982756,182 Queued ,1,1
1564982756,183 Progress ,251,251
1564982756,1xx Provisional ,0,0
1564982756,200 OK ,913,913
1564982756,202 Accepted ,0,0
1564982756,2xx Success ,0,0
1564982756,30x Moved ,0,0
1564982756,305 Use Proxy ,0,0
1564982756,380 Alternative ,0,0
1564982756,3xx Redirect ,0,0
1564982756,400 Bad Request ,2,2
1564982756,401 Unauthorized ,252,252
1564982756,403 Forbidden ,320,324
1564982756,404 Not Found ,487,487
1564982756,405 Not Allowed ,0,0
1564982756,406 Not Acceptable ,0,0
1564982756,407 Proxy Auth Req ,998,998
1564982756,408 Request Timeout ,5220,5217
1564982756,415 Bad Media Type ,0,0
1564982756,420 Bad Extension ,0,0
1564982756,421 Extension Reqd ,0,0
1564982756,422 Too Short ,0,0
1564982756,423 Too Brief ,0,0
1564982756,480 Unavailable ,49,49
1564982756,481 Does Not Exist ,0,0
1564982756,482 Loop Detected ,0,0
1564982756,483 Too Many Hops ,6738,6738
1564982756,484 Address Incompl ,1039,1039
1564982756,485 Ambiguous ,0,0
1564982756,486 Busy Here ,159,174
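The two files do not contain the same set of rows. The second one has Retransmissions, 305 Use Proxy, 380 Alternative, 3xx Redirect and 420 to 422, which the first one lacks. You may want one row per Message/Event with one column per snapshot instead of a long stacked table. In that case, `DataFrame.pivot` can reshape the concatenated data. This sketch runs on a small excerpt of the data above. Rows missing from a file show up as NaN. The names also carry trailing spaces, such as "180 Ringing ", so they are stripped before pivoting.

```python
import io

import pandas as pd

# A small excerpt of the two files above, already stacked (long format)
long_csv = """TimeStamp,Message/Event,Server Totals,Client Totals
1564981556,INVITE Requests,39967,37787
1564981556,180 Ringing ,1113,1113
1564982756,INVITE Requests,39967,37787
1564982756,Retransmissions,5,0
1564982756,180 Ringing ,1113,1113
"""
frame = pd.read_csv(io.StringIO(long_csv))
# names such as "180 Ringing " carry a trailing space in the files
frame["Message/Event"] = frame["Message/Event"].str.strip()

# one row per message, one column per snapshot; rows absent from a file become NaN
wide = frame.pivot(index="Message/Event", columns="TimeStamp", values="Server Totals")
print(wide)
```

`pivot` raises a `ValueError` if the same (Message/Event, TimeStamp) pair occurs twice, for example when a file was read twice. Passing `values=["Server Totals", "Client Totals"]` keeps both counters as a two-level column index.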
Answer (score: 0)
I'm not on Windows, so I tried the snippet below on Linux with Python 3 (adjusting basepath, of course), and it worked as expected:
import pandas as pd
from pathlib import Path

basepath = Path('C:/Users/xxx/Documents/files')  # forward slashes also work on Windows
li = []
for csvfile in basepath.glob('*.csv'):
    # every file has the same header row, so the frames stack cleanly
    df = pd.read_csv(csvfile, index_col=None, header=0)
    li.append(df)
df = pd.concat(li, axis=0, ignore_index=True)
print(df)
Here is a proof of concept:
Python 3.7.4 (default, Aug 12 2019, 14:45:07)
[GCC 9.1.1 20190605 (Red Hat 9.1.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> from pathlib import Path
>>> basepath = Path('csvdir')
>>> li = []
>>> for csvfile in basepath.glob('*.csv'):
...     df = pd.read_csv(csvfile, index_col=None, header=0)
...     li.append(df)
...
>>> df = pd.concat(li, axis=0, ignore_index=True)
>>> print(df)
TimeStamp Message/Event Server Totals Client Totals
0 1564982756 INVITE Requests 39967 37787
1 1564982756 Retransmissions 5 0
2 1564982756 100 Trying 39896 37758
3 1564982756 180 Ringing 1113 1113
4 1564982756 181 Forwarded 0 0
5 1564982756 182 Queued 1 1
6 1564982756 183 Progress 251 251
7 1564982756 1xx Provisional 0 0
8 1564982756 200 OK 913 913
9 1564982756 202 Accepted 0 0
10 1564982756 2xx Success 0 0
11 1564982756 30x Moved 0 0
12 1564982756 305 Use Proxy 0 0
13 1564982756 380 Alternative 0 0
14 1564982756 3xx Redirect 0 0
15 1564982756 400 Bad Request 2 2
16 1564982756 401 Unauthorized 252 252
17 1564982756 403 Forbidden 320 324
18 1564982756 404 Not Found 487 487
19 1564982756 405 Not Allowed 0 0
20 1564982756 406 Not Acceptable 0 0
21 1564982756 407 Proxy Auth Req 998 998
22 1564982756 408 Request Timeout 5220 5217
23 1564982756 415 Bad Media Type 0 0
24 1564982756 420 Bad Extension 0 0
25 1564982756 421 Extension Reqd 0 0
26 1564982756 422 Too Short 0 0
27 1564982756 423 Too Brief 0 0
28 1564982756 480 Unavailable 49 49
29 1564982756 481 Does Not Exist 0 0
.. ... ... ... ...
37 1564981556 180 Ringing 1113 1113
38 1564981556 181 Forwarded 0 0
39 1564981556 182 Queued 1 1
40 1564981556 183 Progress 251 251
41 1564981556 1xx Provisional 0 0
42 1564981556 200 OK 913 913
43 1564981556 202 Accepted 0 0
44 1564981556 2xx Success 0 0
45 1564981556 30x Moved 0 0
46 1564981556 400 Bad Request 2 2
47 1564981556 401 Unauthorized 252 252
48 1564981556 403 Forbidden 320 324
49 1564981556 404 Not Found 487 487
50 1564981556 405 Not Allowed 0 0
51 1564981556 406 Not Acceptable 0 0
52 1564981556 407 Proxy Auth Req 998 998
53 1564981556 408 Request Timeout 5220 5217
54 1564981556 415 Bad Media Type 0 0
55 1564981556 423 Too Brief 0 0
56 1564981556 480 Unavailable 49 49
57 1564981556 481 Does Not Exist 0 0
58 1564981556 482 Loop Detected 0 0
59 1564981556 483 Too Many Hops 6738 6738
60 1564981556 484 Address Incompl 1039 1039
61 1564981556 485 Ambiguous 0 0
62 1564981556 486 Busy Here 159 174
63 1564981556 487 Terminated 2530 2530
64 1564981556 488 Not Acceptable 8199 8199
65 1564981556 489 Bad Event 0 0
66 1564981556 491 Req Pending 0 0
[67 rows x 4 columns]
>>>
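In the output above, the rows with TimeStamp 1564982756 come before those with 1564981556, because `Path.glob` does not guarantee any particular order. If chronological order matters, one option is to sort the paths, since the file names are timestamps, and then sort by the TimeStamp column. `pd.to_datetime(..., unit="s")` can also turn the epoch seconds into readable times. The sketch below works under those assumptions. `load_sorted` and the `Time` column are hypothetical names.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_sorted(basepath):
    """Concatenate basepath/*.csv in file-name order and add a datetime column.

    Hypothetical helper; assumes the file names sort chronologically
    (true for equal-length Unix timestamps such as 1564981556.csv).
    """
    files = sorted(Path(basepath).glob("*.csv"))
    df = pd.concat((pd.read_csv(p) for p in files), ignore_index=True)
    # TimeStamp is Unix epoch seconds; unit="s" gives naive datetimes in UTC
    df["Time"] = pd.to_datetime(df["TimeStamp"], unit="s")
    # mergesort is stable, so rows keep their in-file order within one timestamp
    return df.sort_values("TimeStamp", kind="mergesort", ignore_index=True)


with tempfile.TemporaryDirectory() as tmp:
    header = "TimeStamp,Message/Event,Server Totals,Client Totals\n"
    (Path(tmp) / "1564982756.csv").write_text(
        header + "1564982756,INVITE Requests,39967,37787\n")
    (Path(tmp) / "1564981556.csv").write_text(
        header + "1564981556,INVITE Requests,39967,37787\n")
    result = load_sorted(tmp)

print(result[["TimeStamp", "Time"]])
```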
If you also want to process files in subfolders, change basepath.glob('*.csv') to basepath.glob('**/*.csv').
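The `**` pattern matches the base folder itself plus every subfolder, so top-level files are still included. `Path.rglob('*.csv')` is equivalent. With the glob module, the same pattern needs `recursive=True`. Here is a small sketch comparing the three on a hypothetical folder layout:

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # hypothetical layout: one file at the top level, one in a subfolder
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("1564981556.csv", os.path.join("sub", "1564982756.csv")):
        Path(tmp, rel).write_text("TimeStamp,Message/Event,Server Totals,Client Totals\n")

    base = Path(tmp)
    top_only = sorted(p.name for p in base.glob("*.csv"))
    via_pathlib = sorted(p.relative_to(base).as_posix() for p in base.glob("**/*.csv"))
    via_rglob = sorted(p.relative_to(base).as_posix() for p in base.rglob("*.csv"))
    # glob module equivalent: "**" only recurses when recursive=True
    via_glob = sorted(
        Path(p).relative_to(base).as_posix()
        for p in glob.glob(os.path.join(tmp, "**", "*.csv"), recursive=True)
    )

print(top_only)     # ['1564981556.csv']
print(via_pathlib)  # ['1564981556.csv', 'sub/1564982756.csv']
```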
Let me know whether this works for you as well.