I have a folder, trip_data, that contains many dated CSV files, like this:
trip_data/
├── df_trip_20140803_1.csv
├── df_trip_20140803_2.csv
├── df_trip_20140803_3.csv
├── df_trip_20140803_4.csv
├── df_trip_20140803_5.csv
├── df_trip_20140803_6.csv
├── df_trip_20140804_1.csv
├── df_trip_20140804_2.csv
├── df_trip_20140804_3.csv
├── df_trip_20140804_4.csv
├── df_trip_20140804_5.csv
├── df_trip_20140804_6.csv
├── df_trip_20140805_1.csv
├── df_trip_20140805_2.csv
├── df_trip_20140805_3.csv
├── df_trip_20140805_4.csv
├── df_trip_20140805_5.csv
├── df_trip_20140805_6.csv
├── df_trip_20140806_1.csv
├── df_trip_20140806_2.csv
├── df_trip_20140806_3.csv
├── df_trip_20140806_4.csv
Now I want to use Python pandas to load all of these files separately by date, which means 4 DataFrames: df_trip_20140803, df_trip_20140804, df_trip_20140805, df_trip_20140806.
My code looks like this:
days = [20140803, 20140804, 20140805, 20140806]
for day in days:
    # Locate the path
    path = './trip_data/df_trip_%d*.csv' % day
    df = pd.read_csv(path, header=None, nrows=10,
                     names=['ID', 'lat', 'lon', 'status', 'timestamp'])
I can't get the correct result. How can I do this?
Answer 0 (score: 3)
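I would collect all of these CSVs into a dictionary of DataFrames with the following structure:
dfs['20140803']
- a DataFrame containing the concatenated data from all df_trip_20140803_*.csv files.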
Solution:
import os
import re
import glob
import pandas as pd
fpattern = r'D:\temp\.data\41444939\df_trip_{}_{}.csv'
files = glob.glob(fpattern.format('*','*'))
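# Extract the unique 8-digit dates from the file names
dates = sorted(set(re.split(r'_(\d{8})_(\d+)\.(\w+)', f)[1] for f in files))
dfs = {}
for d in dates:
    # Concatenate every CSV for this date into one DataFrame
    dfs[d] = pd.concat((pd.read_csv(f) for f in glob.glob(fpattern.format(d, '*'))), ignore_index=True)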
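The loop in the question fails because pd.read_csv does not expand wildcards such as * in a local path, so the pattern has to be expanded with glob first. Below is a minimal sketch of the same dictionary approach adapted to the question's layout. It assumes the files have no header row and use the column names from the question; the helper name load_trips_by_day and the constant COLUMNS are made up for illustration.

```python
import glob
import os
import re

import pandas as pd

# Column names taken from the question; the files are assumed to have no header row.
COLUMNS = ['ID', 'lat', 'lon', 'status', 'timestamp']


def load_trips_by_day(folder):
    """Hypothetical helper: map each date string to one concatenated DataFrame."""
    by_day = {}
    # Expand the wildcard ourselves, then group the paths by their 8-digit date.
    for path in sorted(glob.glob(os.path.join(folder, 'df_trip_*_*.csv'))):
        match = re.search(r'df_trip_(\d{8})_\d+\.csv$', os.path.basename(path))
        if match:
            by_day.setdefault(match.group(1), []).append(path)
    # Read and concatenate the files of each date.
    return {
        day: pd.concat(
            (pd.read_csv(p, header=None, names=COLUMNS) for p in paths),
            ignore_index=True,
        )
        for day, paths in by_day.items()
    }
```

Calling load_trips_by_day('./trip_data') would then return a dictionary keyed by '20140803', '20140804', and so on. A dictionary is used here instead of four separately named variables so that the number of days does not have to be known in advance.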
Test:
In [95]: dfs.keys()
Out[95]: dict_keys(['20140804', '20140805', '20140803', '20140806'])
In [96]: dfs['20140803']
Out[96]:
a b c
0 0 0 7
1 3 7 1
2 9 7 3
3 7 4 7
4 5 2 4
5 0 0 4
6 7 2 2
7 8 4 1
8 0 8 3
9 3 9 0
10 7 3 9
11 1 9 8
12 6 7 2
13 3 8 1
14 3 4 5
15 0 9 2
16 5 8 7
17 8 5 4
18 2 0 2
19 9 6 6
20 6 6 6
21 2 6 9
22 1 0 8
23 3 1 1
24 7 4 2
25 7 4 2
26 8 3 7
27 7 3 2
28 1 7 7
29 3 6 5
Setup:
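import numpy as np

fn = r'D:\temp\.data\41444939\a.txt'
base_dir = r'D:\temp\.data\41444939'
# a.txt is expected to list the CSV file names, one per line
with open(fn) as fh:
    files = fh.read().splitlines()
for f in files:
    # Write a small random 5x3 frame to each listed file
    pd.DataFrame(np.random.randint(0, 10, (5, 3)), columns=list('abc')) \
        .to_csv(os.path.join(base_dir, f), index=False)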
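The contents of a.txt are not shown, so the setup cannot be reproduced as written. The following is a rough sketch that generates the file names directly and writes them to a temporary directory instead. The helper name make_sample_files and its parameters are assumptions for illustration, not part of the original answer.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def make_sample_files(base_dir, days, parts_per_day):
    """Hypothetical helper: write small random CSVs named df_trip_<day>_<n>.csv."""
    written = []
    for day in days:
        for part in range(1, parts_per_day + 1):
            path = os.path.join(base_dir, 'df_trip_{}_{}.csv'.format(day, part))
            # Same shape as in the setup above: 5 rows, columns a, b, c
            frame = pd.DataFrame(np.random.randint(0, 10, (5, 3)), columns=list('abc'))
            frame.to_csv(path, index=False)
            written.append(path)
    return written


# Assumed sample layout: two days with six parts each, in a throwaway directory.
base_dir = tempfile.mkdtemp()
files = make_sample_files(base_dir, ['20140803', '20140804'], 6)
```

With six files of five rows per day, concatenating one day's files gives 30 rows, which matches the index range 0 to 29 in the test output.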