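I have a folder containing text files for various sites, five per site (one per variable).
The file names are formatted like this:
Rockspring_18_SW.417712.WRFc36.ET.2000-2050.txt
Rockspring_18_SW.417712.WRFc36.RAIN.2000-2050.txt
WICA.399347.WRFc36.ET.2000-2050.txt
WICA.399347.WRFc36.RAIN.2000-2050.txt
So basically the file name format is (site name).(site number).(WRFc36).(some variable).2000-2050.txt
Every text file has the same layout, with no header row: year, month, day, value (about 18,500 rows per file).
I want Python to find files with matching names (same site name and site number), take columns 1 through 3 from one of those files, and write them to a new txt file. I also want to copy the fourth column of each of the site's variables (RAIN, ET, etc.) and paste those columns into the new file in a specific order.
I know how to use the csv module (defining a new dialect for the space delimiter) to grab data from ALL the files and print it to a new text file, but I don't know how to automatically create a new file for each site name/number and make sure my variables end up in the correct order.
The output I want is one text file per site (not 5), formatted as year, month, day, variable1, variable2, variable3, variable4, variable5, with about 18,500 rows.
I'm sure I'm overlooking something very simple... this seems pretty rudimentary... but any help would be greatly appreciated!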
======
I've updated my code to reflect the comments below:
http://codepad.org/3mQEM75e
from collections import defaultdict
import glob
import csv

#Create dictionary of lists-- [A] = [Afilename1, Afilename2, Afilename3...]
#                             [B] = [Bfilename1, Bfilename2, Bfilename3...]
def get_site_files():
    sites = defaultdict(list)
    #to start, I have a bunch of files in this format ---
    #"site name(unique)"."site num(unique)"."WRFc36"."Variable(5 for each site name)"."2000-2050"
    for fname in glob.glob("*.txt"):
        #split name at every instance of "."
        parts = fname.split(".")
        #check to make sure i only use the proper files-- having 6 parts to name and having WRFc36 as 3rd part
        if len(parts)==6 and parts[2]=='WRFc36':
            #Make sure site name is the full unique identifier, the first and second "parts"
            sites[parts[0]+"."+parts[1]].append(fname)
    return sites

#hardcode the variables for method 2, below
Var=["TAVE","RAIN","SMOIS_INST","ET","SFROFF"]

def main():
    for site_name, files in get_site_files().iteritems():
        print "Working on *****"+site_name+"*****"
        ####Method 1- I'd like to not hardcode in my variables (as in method 2), so I can use this script in other applications.
        for filename in files:
            reader = csv.reader(open(filename, "rb"))
            WriteFile = csv.writer(open("XX_"+site_name+"_combined.txt","wb"))
            for row in reader:
                row = reader.next()
        ####Method 2 works (mostly), but skips a LOT of random lines of first file, and doesn't utilize the functionality built into my dictionary of lists...
##        reader0 = csv.reader(open(site_name+".WRFc36."+Var[0]+".2000-2050.txt", "rb")) #I'd like to copy ALL columns from the first file
##        reader1 = csv.reader(open(site_name+".WRFc36."+Var[1]+".2000-2050.txt", "rb")) # and just the fourth column from all the rest of the files
##        reader2 = csv.reader(open(site_name+".WRFc36."+Var[2]+".2000-2050.txt", "rb")) # (the columns 1-3 are the same for all files)
##        reader3 = csv.reader(open(site_name+".WRFc36."+Var[3]+".2000-2050.txt", "rb"))
##        reader4 = csv.reader(open(site_name+".WRFc36."+Var[4]+".2000-2050.txt", "rb"))
##        WriteFile = csv.writer(open("XX_"+site_name+"_COMBINED.txt", "wb")) #creates new command to write a text file
##
##        for row in reader0:
##            row = reader0.next()
##            row1 = reader1.next()
##            row2 = reader2.next()
##            row3 = reader3.next()
##            row4 = reader4.next()
##            WriteFile.writerow(row + row1 + row2 + row3 + row4)
##        print "***finished with site***"

if __name__=="__main__":
    main()
Answer 0 (score: 2)
Here is a simpler way to iterate over your files, grouped by site.
from collections import defaultdict
import glob

def get_site_files():
    sites = defaultdict(list)
    for fname in glob.glob('*.txt'):
        parts = fname.split('.')
        if len(parts)==6 and parts[2]=='WRFc36':
            sites[parts[0]].append(fname)
    return sites

def main():
    for site,files in get_site_files().iteritems():
        # you need to better explain what you are trying to do here!
        print site, files

if __name__=="__main__":
    main()
I still don't understand your cutting and pasting of columns. You need to explain more clearly what you are trying to accomplish.
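One reading of the question is: for each site, keep year, month, and day from one file and append the fourth column of every variable file in a chosen order. A minimal Python 3 sketch of that reading follows. It assumes the columns are whitespace-separated and that all files for a site list the same dates in the same order. `combine_site`, `var_order`, and the `folder` parameter are names introduced here for illustration; the `XX_..._combined.txt` output name is borrowed from the question's code. Because `zip` pulls exactly one line from each file per pass, no rows are skipped, whereas calling `reader.next()` inside `for row in reader` consumes two rows per iteration.

```python
import glob
import os
from collections import defaultdict
from contextlib import ExitStack


def get_site_files(folder="."):
    """Group matching file paths by 'site name.site number'."""
    sites = defaultdict(list)
    for path in glob.glob(os.path.join(folder, "*.txt")):
        parts = os.path.basename(path).split(".")
        if len(parts) == 6 and parts[2] == "WRFc36":
            sites[parts[0] + "." + parts[1]].append(path)
    return sites


def combine_site(files, out_path, var_order=None):
    """Write year, month, day plus one value column per variable."""
    # Map the variable name (4th part of the file name) to its path.
    by_var = {os.path.basename(p).split(".")[3]: p for p in files}
    # Assumption: alphabetical order unless an explicit order is given.
    # A name in var_order with no matching file raises KeyError.
    order = list(var_order) if var_order else sorted(by_var)
    with ExitStack() as stack:
        inputs = [stack.enter_context(open(by_var[v])) for v in order]
        out = stack.enter_context(open(out_path, "w"))
        # zip reads one line from every file per pass and stops at the
        # shortest file.
        for lines in zip(*inputs):
            rows = [line.split() for line in lines]
            if any(len(row) < 4 for row in rows):
                continue  # skip blank or malformed lines
            merged = rows[0][:3] + [row[3] for row in rows]
            out.write(" ".join(merged) + "\n")


def main(folder=".", var_order=None):
    for site, files in get_site_files(folder).items():
        out_path = os.path.join(folder, "XX_" + site + "_combined.txt")
        combine_site(files, out_path, var_order)
```

Calling `main("path/to/folder")` would write one combined file per site. Passing `var_order=["TAVE", "RAIN", "SMOIS_INST", "ET", "SFROFF"]` fixes the column order without hardcoding file names, and the output files are ignored on a rerun because their names do not split into six parts.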
Answer 1 (score: 1)
As far as getting the file names goes, I would use the following:
import os
# Get a list of all file names that end in .txt
# (splitlines() avoids the empty trailing entry that split('\n') leaves behind)
# On *nix
file_names = os.popen('ls *.txt').read().splitlines()
# On Windows
file_names = os.popen('dir /b *.txt').read().splitlines()
Then, to get the elements that are separated by periods, use:
# For some file_name in file_names
file_name.split('.')
You can then do the comparisons and extract the columns you need (using open(file_name, 'r') or your CSV parser).
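The steps above (list the files, split each name on periods, then pull out a column) might be sketched in Python 3 as follows. The helper names `list_txt_files`, `parse_name`, and `read_column` are hypothetical, and the sketch swaps the shell call for `glob.glob`, which works the same way on *nix and Windows. It assumes whitespace-separated columns.

```python
import glob
import os


def list_txt_files(folder="."):
    """Return the .txt paths in folder without shelling out to ls or dir."""
    return sorted(glob.glob(os.path.join(folder, "*.txt")))


def parse_name(file_name):
    """Split 'site.number.WRFc36.VAR.2000-2050.txt' into named fields.

    Returns None when the name does not follow that pattern.
    """
    parts = os.path.basename(file_name).split(".")
    if len(parts) != 6 or parts[2] != "WRFc36":
        return None
    return {"site": parts[0], "number": parts[1], "variable": parts[3]}


def read_column(file_name, index=3):
    """Return one column (0-based index) as a list of strings."""
    with open(file_name, "r") as handle:
        # Blank lines are skipped; fields are split on any whitespace.
        return [line.split()[index] for line in handle if line.strip()]
```

Two files belong to the same site when `parse_name` returns the same `site` and `number` for both, and `read_column(path)` with the default index returns the fourth column.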
Michael G.