Folks, I have 200 separate csv files named from SH(1) to SH(200). I want to merge them into a single csv file. How can I do that?
Answer 0 (score: 73)
As ghostdog74 said, but this time with headers:
fout=open("out.csv","a")
# first file:
for line in open("sh1.csv"):
    fout.write(line)
# now the rest:
for num in range(2,201):
    f = open("sh"+str(num)+".csv")
    f.next() # skip the header
    for line in f:
        fout.write(line)
    f.close() # not really needed
fout.close()
Answer 1 (score: 42)
Why can't you just do sed 1d sh*.csv > merged.csv?
Sometimes you don't even have to use Python!
(Note that sed 1d drops the first line of every input file, so merged.csv will have no header row at all.)
Answer 2 (score: 33)
Use the accepted StackOverflow answer to create a list of the csv files that you want to append, then run this code:
import pandas as pd
combined_csv = pd.concat( [ pd.read_csv(f) for f in filenames ] )
And if you want to export it to a single csv file, use this:
combined_csv.to_csv( "combined_csv.csv", index=False )
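Putting the two pieces above together, here is a minimal self-contained sketch (assuming pandas is installed; the two sample files it creates stand in for the real SH(1)…SH(200) inputs):

```python
import glob
import pandas as pd

# Create two tiny sample files standing in for SH1.csv ... SH200.csv
with open("SH1.csv", "w") as f:
    f.write("a,b\n1,2\n")
with open("SH2.csv", "w") as f:
    f.write("a,b\n3,4\n")

# Build the list of csv files to append, then stack them vertically
filenames = sorted(glob.glob("SH*.csv"))
combined_csv = pd.concat([pd.read_csv(f) for f in filenames])

# Export to a single csv file, without the pandas row index
combined_csv.to_csv("combined_csv.csv", index=False)
```

Because pd.read_csv parses each header, the column names only appear once in the output, regardless of how many input files there are.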
Answer 3 (score: 16)
fout=open("out.csv","a")
for num in range(1,201):
    for line in open("sh"+str(num)+".csv"):
        fout.write(line)
fout.close()
Answer 4 (score: 11)
I'll throw another code example into the basket:
from glob import glob

with open('singleDataFile.csv', 'a') as singleFile:
    for csvFile in glob('*.csv'):
        for line in open(csvFile, 'r'):
            singleFile.write(line)
Answer 5 (score: 10)
It depends what you mean by "merging": do they have the same columns? Do they have headers? For example, if they all have the same columns and no headers, simple concatenation is sufficient: open the destination file for writing, loop over the sources opening each for reading, use shutil.copyfileobj from the open-for-reading source into the open-for-writing destination, close the source, and keep looping; use the with statement to do the closing on your behalf. If they have the same columns but also have headers, you'll need a readline on each source file except the first, after opening it for reading and before copying it into the destination, to skip the header line.
If the CSV files don't all have the same columns, then you need to define in what sense you are "merging" them (like a SQL JOIN? or "horizontally", if they all have the same number of rows? etc.), in which case it's hard for us to guess what you mean.
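The same-columns-with-headers case described above can be sketched like this (the file names here are placeholders, and the two sample files are created just to make the snippet runnable):

```python
import shutil

# Sample inputs with identical columns and a header row
with open("part1.csv", "w") as f:
    f.write("a,b\n1,2\n")
with open("part2.csv", "w") as f:
    f.write("a,b\n3,4\n")

sources = ["part1.csv", "part2.csv"]

with open("merged.csv", "w") as dest:
    for i, name in enumerate(sources):
        with open(name) as src:
            if i > 0:
                src.readline()  # skip the header of every file after the first
            shutil.copyfileobj(src, dest)  # bulk-copy the remaining lines
```

shutil.copyfileobj copies in chunks rather than line by line, so this stays fast even for large files.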
Answer 6 (score: 3)
A slight change to the code above, as it does not actually work correctly.
It should be as follows...
from glob import glob

with open('main.csv', 'a') as singleFile:
    for csv in glob('*.csv'):
        if csv == 'main.csv':
            pass
        else:
            for line in open(csv, 'r'):
                singleFile.write(line)
Answer 7 (score: 3)
If the merged CSV is going to be used in Python, then just use glob to get the list of files to pass to fileinput.input() via its files argument, then use the csv module to read it all in one go.
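A minimal sketch of that approach (the SH*.csv sample files are created here only so the snippet is runnable; it also assumes no embedded newlines inside quoted fields, since fileinput reads plain lines):

```python
import csv
import fileinput
import glob

# Sample files standing in for the real SH*.csv inputs
with open("SH1.csv", "w") as f:
    f.write("a,b\n1,2\n")
with open("SH2.csv", "w") as f:
    f.write("a,b\n3,4\n")

files = sorted(glob.glob("SH*.csv"))

rows = []
with fileinput.input(files=files) as fi:
    for row in csv.reader(fi):
        # keep the header only from the first file
        if fi.isfirstline() and fi.filename() != files[0]:
            continue
        rows.append(row)
```

fileinput chains all the files into one line stream, and isfirstline()/filename() let you detect and drop the repeated headers.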
Answer 8 (score: 2)
It's easy to combine all the files in a directory and merge them:
import glob
import csv

# Open result file
with open('output.txt','wb') as fout:
    wout = csv.writer(fout,delimiter=',')
    interesting_files = glob.glob("*.csv")
    h = True
    for filename in interesting_files:
        print 'Processing',filename
        # Open and process file
        with open(filename,'rb') as fin:
            if h:
                h = False
            else:
                fin.next() # skip header
            for line in csv.reader(fin,delimiter=','):
                wout.writerow(line)
Answer 9 (score: 2)
If you are on linux/mac you can do this:
from subprocess import call
script="cat *.csv>merge.csv"
call(script,shell=True)
Answer 10 (score: 1)
You can import csv, then loop through all the CSV files reading them into a list, then write the list back out to disk.
import csv

rows = []
for f in (file1, file2, ...):
    reader = csv.reader(open(f, "rb"))
    for row in reader:
        rows.append(row)

writer = csv.writer(open("some.csv", "wb"))
writer.writerows(rows)
The above is not very robust, as it has no error handling nor does it close any of the open files. It should work whether or not the individual files have one or more lines of CSV data in them. Also, I did not run this code, but it should give you an idea of what to do.
Answer 11 (score: 1)
You can simply use the built-in csv library. This solution will work even if some of your CSV files have slightly different column names or headers, unlike the other top-voted answers.
import csv
import glob

filenames = [i for i in glob.glob("SH*.csv")]

header_keys = []
merged_rows = []

for filename in filenames:
    with open(filename) as f:
        reader = csv.DictReader(f)
        merged_rows.extend(list(reader))
        header_keys.extend([key for key in reader.fieldnames if key not in header_keys])

with open("combined.csv", "w") as f:
    w = csv.DictWriter(f, fieldnames=header_keys)
    w.writeheader()
    w.writerows(merged_rows)
The merged file will contain all the possible columns (header_keys) that can be found in the files. Any absent column in a file will be rendered as blank/empty (but the rest of the file's data is preserved).
Note: this still uses the csv library, but with DictReader and DictWriter instead of the basic reader and writer, and it holds every merged row in memory (the merged_rows list).
Answer 12 (score: 0)
I did it by implementing a function that expects an output file and the paths of the input files. The function copies the file content of the first file into the output file, and then does the same for the rest of the input files, but without the header line.
def concat_files_with_header(output_file, *paths):
    for i, path in enumerate(paths):
        with open(path) as input_file:
            if i > 0:
                next(input_file) # Skip header
            output_file.writelines(input_file)
Example use of the function:
if __name__ == "__main__":
    paths = [f"sh{i}.csv" for i in range(1, 201)]
    with open("output.csv", "w") as output_file:
        concat_files_with_header(output_file, *paths)
Answer 13 (score: 0)
import pandas as pd
import os

files = [file for file in os.listdir("e:\\data science\\kaggle assign\\monthly sales\\Pandas-Data-Science-Tasks-master\\SalesAnalysis\\Sales_Data")]
for file in files:
    print(file)

all_data = pd.DataFrame()
for file in files:
    df = pd.read_csv("e:\\data science\\kaggle assign\\monthly sales\\Pandas-Data-Science-Tasks-master\\SalesAnalysis\\Sales_Data\\" + file)
    all_data = pd.concat([all_data, df])

all_data.head()
Answer 14 (score: 0)
Building on the solution made by @Adders and later improved by @varun, I implemented a small improvement too, leaving the whole merged CSV with only the main header:
from glob import glob

filename = 'main.csv'
with open(filename, 'a') as singleFile:
    first_csv = True
    for csv in glob('*.csv'):
        if csv == filename:
            pass
        else:
            header = True
            for line in open(csv, 'r'):
                if first_csv and header:
                    singleFile.write(line)
                    first_csv = False
                    header = False
                elif header:
                    header = False
                else:
                    singleFile.write(line)
Best regards!
Answer 15 (score: 0)
An easy-to-use function:
def csv_merge(destination_path, *source_paths):
    '''
    Merges all csv files on source_paths to destination_path.
    :param destination_path: Path of a single csv file, doesn't need to exist
    :param source_paths: Paths of csv files to be merged, need to exist
    :return: None
    '''
    with open(destination_path, "a") as dest_file:
        # copy the first file verbatim, header included
        with open(source_paths[0]) as src_file:
            for src_line in src_file:
                dest_file.write(src_line)
        # append the remaining files, skipping each header
        for path in source_paths[1:]:
            with open(path) as src_file:
                next(src_file) # skip the header
                for src_line in src_file:
                    dest_file.write(src_line)
Answer 16 (score: 0)
Alternatively, you could just do:
cat sh*.csv > merged.csv
Answer 17 (score: 0)
Here's a straightforward approach if your files are not numbered in order (Python 3.6 on a Windows machine):
import pandas as pd
from glob import glob

interesting_files = glob("C:/temp/*.csv") # grabs all the csv files from the directory you mention here
df_list = []
for filename in sorted(interesting_files):
    df_list.append(pd.read_csv(filename))
full_df = pd.concat(df_list)

# save the final file in the same/a different directory:
full_df.to_csv("C:/temp/merged_pandas.csv", index=False)
Answer 18 (score: 0)
Suppose you have 2 csv files like this:
csv1.csv:
id,name
1,Armin
2,Sven
csv2.csv:
id,place,year
1,Reykjavik,2017
2,Amsterdam,2018
3,Berlin,2019
and you want the result to be a csv3.csv like this:
id,name,place,year
1,Armin,Reykjavik,2017
2,Sven,Amsterdam,2018
3,,Berlin,2019
Then you can do that with an outer join of the two tables on the id column. With the help of a loop, you can get the same result for multiple files (200 csv files).
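A sketch of that join (assuming pandas is available; the original answer's code block was not preserved, so this is one reasonable way to produce exactly the csv3.csv shown above):

```python
import pandas as pd

# Recreate the example inputs from above
with open("csv1.csv", "w") as f:
    f.write("id,name\n1,Armin\n2,Sven\n")
with open("csv2.csv", "w") as f:
    f.write("id,place,year\n1,Reykjavik,2017\n2,Amsterdam,2018\n3,Berlin,2019\n")

df1 = pd.read_csv("csv1.csv")
df2 = pd.read_csv("csv2.csv")

# An outer merge keeps rows that appear in only one of the files (id 3),
# filling the missing columns with blanks
csv3 = df1.merge(df2, on="id", how="outer")
csv3.to_csv("csv3.csv", index=False)
```

For 200 files you would loop, merging each new DataFrame into the running result the same way.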
Answer 19 (score: 0)
Updating wisty's answer above for Python 3:
fout=open("out.csv","a")
# first file:
for line in open("sh1.csv"):
    fout.write(line)
# now the rest:
for num in range(2,201):
    f = open("sh"+str(num)+".csv")
    next(f) # skip the header
    for line in f:
        fout.write(line)
    f.close() # not really needed
fout.close()
Answer 20 (score: 0)
Here is a script that concatenates the csv files SH1.csv through SH200.csv:
import glob
import re

# Looking for filenames like 'SH1.csv' ... 'SH200.csv'
pattern = re.compile(r"^SH([1-9]|[1-9][0-9]|1[0-9][0-9]|200)\.csv$")
file_parts = [name for name in glob.glob('*.csv') if pattern.match(name)]

with open("file_merged.csv","wb") as file_merged:
    for (i, name) in enumerate(file_parts):
        with open(name, "rb") as file_part:
            if i != 0:
                next(file_part) # skip headers if not first file
            file_merged.write(file_part.read())
Answer 21 (score: 0)
I modified what @wisty said to work with Python 3.x; for those who have an encoding problem, I also use the os module to avoid hard coding the file count.
import os

def merge_all():
    dir = 'C:\\python\\data\\'
    os.chdir(dir)
    fout = open("merged_files.csv", "ab")
    # first file:
    for line in open("file_1.csv", 'rb'):
        fout.write(line)
    # now the rest (the listdir count includes the output file just created):
    list = os.listdir(dir)
    number_files = len(list)
    for num in range(2, number_files):
        f = open("file_" + str(num) + ".csv", 'rb')
        f.__next__() # skip the header
        for line in f:
            fout.write(line)
        f.close() # not really needed
    fout.close()