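I want to append two ASCII files into a single file (for example, F1_Jan_01.txt in directory d01 and F1_Jan_01.txt in directory d02). In fact, I have two directories, each containing files for several names (F1, F2, F3), months, and days (1 to 7), and I want to append the files that have the same name in the two directories. So I wrote the following code in Python.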
import pandas as pd
maindir1="/home/d01/"
maindir2="/home/d02/"
outdir="/home/final/"
pol=[ "F1","F2","F3" ]
month=["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
for iis,ipol in enumerate(pol):
    for jjs,imonth in enumerate(month):
        for kk in range(1,7,1):
            df1 = pd.read_csv(maindir1+str(ipol)+"_"+str(imonth)+"_0"+str(kk)+".txt", sep="\t")
            df2 = pd.read_csv(maindir2+str(ipol)+"_"+str(imonth)+"_0"+str(kk)+".txt", sep="\t")
            df = pd.concat([ df1, df2 ], ignore_index=True)
            df.to_csv(outdir+str(ipol)+"_"+str(imonth)+"_0"+str(kk)+".txt",sep="\t",index=False)
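The problem is that in the final output, when the second file is appended, its first line is not written. For example, the first file (in d01) has 100000 lines and the second file (in d02) has 50000. In the final output, the first 100000 lines are written correctly, and then 49000 lines of the second file are appended, with its first line missing.
Do I need to define anything else in my code?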
Answer 0 (score: 3)
Here is equivalent code that does not use Pandas. (Dry-coded, YMMV.)
import os

maindir1 = "/home/d01/"
maindir2 = "/home/d02/"
outdir = "/home/final/"
pols = ["F1", "F2", "F3"]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
for ipol in pols:
    for imonth in months:
        # range(1, 7) yields days 1-6, as in the question; use range(1, 8) to include day 7.
        for kk in range(1, 7):
            filename = "{ipol}_{imonth}_0{kk}.txt".format(ipol=ipol, imonth=imonth, kk=kk)
            out_name = os.path.join(outdir, filename)
            in_names = [os.path.join(maindir1, filename), os.path.join(maindir2, filename)]
            # Copy both inputs verbatim, one after the other, so no line is parsed as a header.
            with open(out_name, "w") as out_file:
                for in_name in in_names:
                    with open(in_name, "r") as in_file:
                        out_file.write(in_file.read())
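
The loop above reads each input fully into memory and assumes every file ends with a newline. If the first file does not, its last line would be fused with the first line of the second file. A minimal sketch of a streaming variant that guards against that is below; the helper name `append_files` and the `chunk_size` parameter are my own and not part of the question, so treat it as an illustration and not a drop-in.

```python
import os
import shutil


def append_files(in_names, out_name, chunk_size=1024 * 1024):
    """Concatenate the files in in_names into out_name, copying in chunks.

    Hypothetical helper: inserts a newline between two files only when the
    earlier one does not already end with one.
    """
    with open(out_name, "wb") as out_file:
        ends_with_newline = True  # nothing written yet, so no separator needed
        for in_name in in_names:
            if os.path.getsize(in_name) == 0:
                continue  # an empty input contributes nothing
            if not ends_with_newline:
                # Keep the previous file's last line separate from this file's first line.
                out_file.write(b"\n")
            with open(in_name, "rb") as in_file:
                shutil.copyfileobj(in_file, out_file, chunk_size)
                # Inspect the last byte of the input to decide on the next separator.
                in_file.seek(-1, os.SEEK_END)
                ends_with_newline = in_file.read(1) == b"\n"
```

Inside the loop, the `with open(out_name, "w")` block could then be replaced by `append_files(in_names, out_name)`. If you would rather keep pandas, the missing line is most likely the first row of each file being consumed as column names, since `read_csv` infers a header by default; passing `header=None` to `read_csv` and `header=False` to `to_csv` is the usual way to avoid that, assuming the files really have no header row.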