Merging CSV files with different columns into a single file

Time: 2019-07-09 14:52:41

Tags: python pandas csv dataframe

The sample data set consists of the following files:

  • Home_HeatSensor_AA.CSV
  • Office_HeatSensor_BB.CSV
  • Ship_ElevationSensor_XXYY.CSV

AA.CSV has the following columns, shown here with a sample row:

   Time  AA  AB  BB  Site  Type
0  1:00   5   4   5  Home  Heat

BB.CSV has a similar format:

   Time  AA  AB  BB    Site  Type
0  1:00   6   2   4  Office  Heat

XXYY.CSV, however, has a very different format:

   Time     XX       XY     YY  Site       Type
0  1:00  1.332  12.1123  4.212  Ship  Elevation

I need to combine these three CSV files into one master CSV file with the following format:

   Time AA AB BB     XX       XY     YY    Site       Type
0  1:00  5  4  5                           Home       Heat
0  1:00  6  2  4                         Office       Heat
0  1:00           1.332  12.1123  4.212    Ship  Elevation

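I tried concatenating with pandas, with mixed results. The code below merges the data but changes the column order of Time, Site and Type. Ideally, I would like those to stay put: Time as the first column, and Site and Type as the last two.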

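import pandas as pd

li = []  # one DataFrame per input file
for filename in filepaths:  # filepaths: list of CSV paths, defined elsewhere
    df = pd.read_csv(filename, index_col=None, header=0, parse_dates=True, infer_datetime_format=True)
    li.append(df)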

1 Answer:

Answer 0 (score: 2)

pd.concat

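import pandas as pd

def read_csv(fn):
    # skipinitialspace drops blanks that follow each delimiter
    return pd.read_csv(fn, skipinitialspace=True)

files = ['Home_HeatSensor_AA.CSV', 'Office_HeatSensor_BB.CSV', 'Ship_ElevationSensor_XXYY.CSV']
# Desired output order: Time first, Site and Type last
cols = ['Time', 'AA', 'AB', 'BB', 'XX', 'XY', 'YY', 'Site', 'Type']

# sort=False keeps columns in order of first appearance; [cols] then reorders them
pd.concat(map(read_csv, files), sort=False)[cols].to_csv('MASTER.CSV', index=False)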

Then verify the result:

cat MASTER.CSV

Time,AA,AB,BB,XX,XY,YY,Site,Type
1:00,5.0,4.0,5.0,,,,Home,Heat
1:00,6.0,2.0,4.0,,,,Office,Heat
1:00,,,,1.3319999999999999,12.1123,4.212,Ship,Elevation
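Note that the AA, AB and BB values are written as 5.0 rather than 5: the empty cells are stored as NaN, which forces those columns to float. If integers matter, one possible approach is the sketch below. It assumes pandas 1.0 or later (for `convert_dtypes`), and the two in-memory CSV strings are hypothetical stand-ins for the files in the question.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for two of the CSV files in the question.
heat_csv = "Time,AA,AB,BB,Site,Type\n1:00,5,4,5,Home,Heat\n"
ship_csv = "Time,XX,XY,YY,Site,Type\n1:00,1.332,12.1123,4.212,Ship,Elevation\n"

frames = [pd.read_csv(io.StringIO(text)) for text in (heat_csv, ship_csv)]
merged = pd.concat(frames, sort=False, ignore_index=True)

# Missing cells are NaN, so AA/AB/BB are float64 at this point (5 becomes 5.0).
# convert_dtypes() moves whole-number float columns to the nullable Int64 dtype.
converted = merged.convert_dtypes()

csv_text = converted.to_csv(index=False)
print(csv_text)
```

Columns that hold real fractions, such as XX, stay floating point, so only the whole-number columns change.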

If you don't know the column names ahead of time:

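import pandas as pd

def read_csv(fn):
    return pd.read_csv(fn, skipinitialspace=True)

files = ['Home_HeatSensor_AA.CSV', 'Office_HeatSensor_BB.CSV', 'Ship_ElevationSensor_XXYY.CSV']

# Without [cols], columns are written in order of first appearance across the files
pd.concat(map(read_csv, files), sort=False).to_csv('MASTER.CSV', index=False)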
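With `sort=False` and no explicit column list, Site and Type end up wherever they first appeared (here, before XX, XY and YY), which is not the order the question asks for. A minimal sketch of one way to pin Time first and Site and Type last without naming the sensor columns follows. The helper `order_columns` and the in-memory CSV strings are hypothetical, introduced only for illustration.

```python
import io
import pandas as pd

def order_columns(df, first=("Time",), last=("Site", "Type")):
    """Return df with `first` columns leading and `last` columns trailing."""
    # Everything not pinned keeps its existing relative order in the middle.
    middle = [c for c in df.columns if c not in first and c not in last]
    return df[list(first) + middle + list(last)]

# Hypothetical in-memory stand-ins for two of the CSV files in the question.
heat_csv = "Time,AA,AB,BB,Site,Type\n1:00,5,4,5,Home,Heat\n"
ship_csv = "Time,XX,XY,YY,Site,Type\n1:00,1.332,12.1123,4.212,Ship,Elevation\n"

frames = [pd.read_csv(io.StringIO(text)) for text in (heat_csv, ship_csv)]
merged = pd.concat(frames, sort=False)

ordered = order_columns(merged)
print(list(ordered.columns))
```

This assumes every file has the Time, Site and Type columns; if one of them is missing from all inputs, the final indexing step raises a KeyError.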