The sample datasets are structured as follows.
AA.CSV has the following columns, shown with a sample row:
Time AA AB BB Site Type
0 1:00 5 4 5 Home Heat
BB.CSV has a similar format:
Time AA AB BB Site Type
0 1:00 6 2 4 Office Heat
XXYY.CSV, however, has a very different format:
Time XX XY YY Site Type
0 1:00 1.332 12.1123 4.212 Ship Elevation
I need to combine these three CSV files into a single master CSV file with the following format:
Time AA AB BB XX XY YY Site Type
0 1:00 5 4 4 Home Heat
0 1:00 6 2 2 Office Heat
0 1:00 1.332 12.1123 4.212 Ship Elevation
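I have tried messing around with pandas, with mixed results. The code below merges the data, but it changes the column order for Time, Site, and Unit. Ideally, I would like those to stay where they are, with Time as the first column and Site and Unit as the last two columns.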
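import pandas as pd

li = []  # collects one DataFrame per file
for filename in filepaths:  # filepaths: list of CSV paths, defined elsewhere
    df = pd.read_csv(filename, index_col=None, header=0, parse_dates=True, infer_datetime_format=True)
    li.append(df)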
Answer 0 (score: 2)
Use pd.concat and select the columns in the order you want:
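import pandas as pd

def read_csv(fn):
    # skipinitialspace=True drops spaces that follow each delimiter
    return pd.read_csv(fn, skipinitialspace=True)

files = ['Home_HeatSensor_AA.CSV', 'BB.CSV', 'XXYY.CSV']
cols = ['Time', 'AA', 'AB', 'BB', 'XX', 'XY', 'YY', 'Site', 'Type']
# Stack the frames, then index with cols to fix the column order
pd.concat(map(read_csv, files), sort=False)[cols].to_csv('MASTER.CSV', index=False)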
Then verify the result:
cat MASTER.CSV
Time,AA,AB,BB,XX,XY,YY,Site,Type
1:00,5.0,4.0,5.0,,,,Home,Heat
1:00,6.0,2.0,4.0,,,,Office,Heat
1:00,,,,1.3319999999999999,12.1123,4.212,Ship,Elevation
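To try the approach without touching the file system, here is a minimal sketch that replaces the three files with in-memory buffers. The buffer contents are an assumption: they are built from the sample rows in the question and assume comma-separated files. The names aa, bb, xxyy, frames, and master are illustrative only.

```python
import io
import pandas as pd

# In-memory stand-ins for the three CSV files (assumed comma-separated,
# contents taken from the sample rows in the question)
aa = io.StringIO("Time,AA,AB,BB,Site,Type\n1:00,5,4,5,Home,Heat\n")
bb = io.StringIO("Time,AA,AB,BB,Site,Type\n1:00,6,2,4,Office,Heat\n")
xxyy = io.StringIO("Time,XX,XY,YY,Site,Type\n1:00,1.332,12.1123,4.212,Ship,Elevation\n")

cols = ['Time', 'AA', 'AB', 'BB', 'XX', 'XY', 'YY', 'Site', 'Type']

# Read each buffer, stack the rows, then select columns in the desired order
frames = [pd.read_csv(f, skipinitialspace=True) for f in (aa, bb, xxyy)]
master = pd.concat(frames, sort=False, ignore_index=True)[cols]

print(master.to_csv(index=False))
```

Columns that a file does not have come out as NaN for that file's rows, which is why the AA, AB, and BB values are written as floats and the missing cells are empty in the CSV.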
If you don't know the column names in advance:
import pandas as pd

def read_csv(fn):
    return pd.read_csv(fn, skipinitialspace=True)

files = ['Home_HeatSensor_AA.CSV', 'BB.CSV', 'XXYY.CSV']
pd.concat(map(read_csv, files), sort=False).to_csv('MASTER.CSV', index=False)
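One thing to watch for when the column list is dropped: with sort=False, pd.concat keeps columns in order of first appearance, so Site and Type end up before any columns that only appear in later files. If Time must stay first and Site and Type last, a small reordering step can handle it. The sketch below is one possible way to do that; order_columns is a hypothetical helper, not part of pandas, and the two small frames are made-up stand-ins for the real files.

```python
import pandas as pd

def order_columns(df, first=('Time',), last=('Site', 'Type')):
    """Hypothetical helper: move `first` columns to the front and `last` to the end.

    All other columns keep the relative order they already have in df.
    """
    middle = [c for c in df.columns if c not in first and c not in last]
    return df[list(first) + middle + list(last)]

# Made-up stand-ins for frames read from files with different measurement columns
a = pd.DataFrame({'Time': ['1:00'], 'AA': [5], 'Site': ['Home'], 'Type': ['Heat']})
b = pd.DataFrame({'Time': ['1:00'], 'XX': [1.332], 'Site': ['Ship'], 'Type': ['Elevation']})

combined = pd.concat([a, b], sort=False, ignore_index=True)
ordered = order_columns(combined)

print(list(combined.columns))
print(list(ordered.columns))
```

The measurement columns never need to be named here; only the columns that must be pinned to the front or the back are listed.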