Question

I have two data files, a.csv and b.csv, which are available on pastebin: http://pastebin.com/nzjXESYn
http://pastebin.com/PDV5Ah64

The first file, a.csv, has 4 columns and some comments:

# coating file for detector A/R
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
14.2    531.0   0.0618  0.9382
14.2    532.0   0.07905 0.92095
14.2    533.0   0.09989 0.90011
14.2    534.0   0.12324 0.87676
14.2    535.0   0.14674 0.85326
14.2    536.0   0.16745 0.83255
14.2    537.0   0.1837  0.8163
#
# 171 lines, 5 comments, 166 data

The second file, b.csv, has two columns and a different number of rows:

# Version 2.0 - nm, norm@500 to 1, burrows+2006c91.21_T1350_g4.7_f100_solar
# Wavelength(nm)  Flambda(ergs/cm^s/s/nm)
300.0 1.53345164121e-32
300.1 1.53345164121e-32
300.2 1.53345164121e-32 

# total lines = 20003, comment lines = 2, data lines = 20001

Now I want to merge these two files on the common second column (the wavelength should be the same in both files).

The required output looks like this:

# coating file for detector A/R
# column 1 is the angle of incidence (degrees)
# column 2 is the wavelength (microns)
# column 3 is the transmission probability
# column 4 is the reflection probability
# Version 2.0 - nm, norm@500 to 1, burrows+2006c91.21_T1350_g4.7_f100_solar
# Wavelength(nm)  Flambda(ergs/cm^s/s/nm)
14.2    531.0   0.0618  0.9382  1.14325276212
14.2    532.0   0.07905 0.92095 1.14557732058

Note: the comments are merged as well. In b.csv, the wavelength 531.0 is at line number 2313.

How can we do this in Python?

My initial attempt was:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author    : Bhishan Poudel
# Date      : Jun 17, 2016


# Imports
from __future__ import print_function
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


# read in dataframes
#======================================================================
# read in a file
#
infile = 'a.csv'
colnames = ['angle', 'wave','trans','refl']
print('{} {} {} {}'.format('\nreading file : ', infile, '','' ))
df1 = pd.read_csv(infile,sep=r'\s+', header = None,skiprows = 0,
         comment='#',names=colnames,usecols=(0,1,2,3))

print('{} {} {} {}'.format('df.head \n', df1.head(),'',''))
#------------------------------------------------------------------


#======================================================================
# read in a file
#
infile = 'b.csv'
colnames = ['wave', 'flux']
print('{} {} {} {}'.format('\nreading file : ', infile, '','' ))
df2 = pd.read_csv(infile,sep=r'\s+', header = None,skiprows = 0,
         comment='#',names=colnames,usecols=(0,1))
print('{} {} {} {}'.format('df.head \n', df2.head(),'','\n'))
#----------------------------------------------------------------------


result = df1.append(df2, ignore_index=True)
print(result.head())
print("\n")

Here are some useful links:
How to merge data frame with same column names
http://pandas.pydata.org/pandas-docs/stable/merging.html

Answer 1

If you want to merge the two datasets, you should use the .merge() method rather than .append().

result = pd.merge(df1,df2,on='wave')

The former joins two DataFrames on a key (similar to a SQL join), whereas the latter stacks two DataFrames on top of each other.
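To make the difference concrete, here is a minimal sketch rather than a definitive solution. It assumes small inline stand-ins for a.csv and b.csv (values copied from the samples in the question), and the helper names `read_table` and `comment_lines` are hypothetical. `pd.concat` stands in for `.append()`, which was removed from `DataFrame` in pandas 2.0. The last part shows one possible way to carry both comment headers into the output, as the question asks.

```python
import io

import pandas as pd

# Assumed stand-ins for a.csv and b.csv, trimmed to a few rows.
A_TEXT = """\
# coating file for detector A/R
# column 2 is the wavelength (microns)
14.2    531.0   0.0618  0.9382
14.2    532.0   0.07905 0.92095
"""
B_TEXT = """\
# Wavelength(nm)  Flambda(ergs/cm^s/s/nm)
300.0 1.53345164121e-32
531.0 1.14325276212
532.0 1.14557732058
"""


def read_table(text, names):
    """Parse whitespace-separated data, skipping '#' comment lines."""
    return pd.read_csv(io.StringIO(text), sep=r"\s+", comment="#",
                       header=None, names=names)


def comment_lines(text):
    """Return the '#' comment lines of a file's text, in order."""
    return [line for line in text.splitlines() if line.startswith("#")]


df1 = read_table(A_TEXT, ["angle", "wave", "trans", "refl"])
df2 = read_table(B_TEXT, ["wave", "flux"])

# Inner join on the shared column: only wavelengths present in both survive.
merged = pd.merge(df1, df2, on="wave", how="inner")

# Stacking instead keeps every row of both frames and fills gaps with NaN.
stacked = pd.concat([df1, df2], ignore_index=True)

# Write both comment headers, then the merged rows, into one text buffer.
out = io.StringIO()
for line in comment_lines(A_TEXT) + comment_lines(B_TEXT):
    out.write(line + "\n")
merged.to_csv(out, sep="\t", header=False, index=False)

print(merged)
print(stacked.shape)
print(out.getvalue())
```

Because the join key is a float, rows match only when the parsed values are exactly equal. That holds here because both files write the wavelength as `531.0`, but files with differently rounded wavelengths would probably need rounding first or a tolerance-based join such as `pd.merge_asof`.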
