Question

I have n files in a directory that I need to combine into one. They all have the same number of columns; for example, test1.csv contains:

test1,test1,test1  
test1,test1,test1  
test1,test1,test1

Similarly, test2.csv contains:

test2,test2,test2  
test2,test2,test2  
test2,test2,test2

I want final.csv to look like this:

test1,test1,test1  
test1,test1,test1  
test1,test1,test1  
test2,test2,test2  
test2,test2,test2  
test2,test2,test2

But instead I get this:

test file 1,test file 1.1,test file 1.2,test file 2,test file 2.1,test file 2.2  
,,,test file 2,test file 2,test file 2  
,,,test file 2,test file 2,test file 2  
test file 1,test file 1,test file 1,,,  
test file 1,test file 1,test file 1,,,

Can someone help me figure out what is going on here? My code is pasted below:

import csv
import glob
import pandas as pd
import numpy as np 

all_data = pd.DataFrame() #initializes DF which will hold aggregated csv files

for f in glob.glob("*.csv"): #for all csv files in pwd
    df = pd.read_csv(f) #create dataframe for reading current csv
    all_data = all_data.append(df) #appends current csv to final DF

all_data.to_csv("final.csv", index=None)
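A minimal in-memory sketch of what goes wrong, assuming the sample data above. The strings `test1` and `test2` are hypothetical stand-ins for the two files, and `pd.concat` stands in for `DataFrame.append` (which was removed in pandas 2.0 but aligned columns the same way, since it was built on `concat`):

```python
from io import StringIO

import pandas as pd

# Hypothetical in-memory stand-ins for test1.csv and test2.csv (no header row).
test1 = "test1,test1,test1\ntest1,test1,test1\ntest1,test1,test1\n"
test2 = "test2,test2,test2\ntest2,test2,test2\ntest2,test2,test2\n"

# By default read_csv treats the first line as column names; repeated names
# are de-duplicated to "test1", "test1.1", "test1.2", and that line is no longer data.
df1 = pd.read_csv(StringIO(test1))
df2 = pd.read_csv(StringIO(test2))
print(list(df1.columns))  # ['test1', 'test1.1', 'test1.2']

# Stacking frames whose column names differ aligns on those names,
# so each frame gets NaN (empty cells in the CSV) in the other frame's columns.
combined = pd.concat([df1, df2])
print(combined.shape)  # (4, 6)
```

This matches the shape of the broken final.csv: six columns, two data rows per file, and empty cells wherever a row came from the "other" file.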

Answer 1

I think there are several problems:

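I removed import csv and import numpy as np, because they are not used in this demo (but they may be used in lines you left out, in which case keep importing them).
I created a list dfs of all the DataFrames, adding each one with dfs.append(df). Then I used the function concat to join this list into the final DataFrame.
In read_csv I added the parameter header=None, because the main problem is that read_csv reads the first row as the header. Each file's first row therefore becomes its column names (test1, test1.1, test1.2 versus test2, test2.1, test2.2, since pandas de-duplicates repeated names this way) and is lost as data; append then aligns rows by column name, so each file lands in its own set of columns with empty cells everywhere else.
In to_csv I added the parameter header=None to omit the header.
I write the final file into the folder test, because with glob.glob("*.csv") an output file in the current directory would be picked up as an input file the next time the script runs. The folder must already exist; to_csv does not create it.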
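To see the effect of header=None in isolation, here is a small in-memory sketch (the strings are hypothetical stand-ins for the two files). With header=None every line is data and both frames get the same integer column labels 0, 1, 2, so they stack instead of spreading sideways. It writes with header=False and index=False, the documented boolean spellings of the "no header / no index" options:

```python
from io import StringIO

import pandas as pd

# Hypothetical in-memory stand-ins for test1.csv and test2.csv.
test1 = "test1,test1,test1\ntest1,test1,test1\ntest1,test1,test1\n"
test2 = "test2,test2,test2\ntest2,test2,test2\ntest2,test2,test2\n"

# header=None: no line is used as column names; columns are labelled 0, 1, 2.
dfs = [pd.read_csv(StringIO(text), header=None) for text in (test1, test2)]

# Same column labels in both frames, so concat stacks the rows.
all_data = pd.concat(dfs, ignore_index=True)
print(all_data)

# to_csv without a path returns the CSV text; drop the labels and the row index.
print(all_data.to_csv(index=False, header=False), end="")
```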

Solution:

import glob
import pandas as pd

#list of all DataFrames, one per csv file
dfs = []
for f in sorted(glob.glob("*.csv")): #for all csv files in pwd, sorted so test1 comes before test2
    #header=None: the first row is data, not column names
    df = pd.read_csv(f, header=None) #create dataframe for reading current csv
    dfs.append(df) #collect the current DataFrame in the list
all_data = pd.concat(dfs, ignore_index=True) #stack all DataFrames and renumber the rows
print(all_data)
#       0      1      2
#0  test1  test1  test1
#1  test1  test1  test1
#2  test1  test1  test1
#3  test2  test2  test2
#4  test2  test2  test2
#5  test2  test2  test2
all_data.to_csv("test/final.csv", index=None, header=None)
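The same approach as a self-contained, file-based sketch. The function name merge_csvs and the temporary-directory usage are hypothetical; two details are additions beyond the answer: sorting the glob result (glob returns files in arbitrary order) and skipping the output path in case it sits in the input directory.

```python
import glob
import os
import tempfile

import pandas as pd


def merge_csvs(src_dir, out_path):
    """Stack every headerless *.csv in src_dir into out_path, in file-name order."""
    # glob order is arbitrary, so sort to get test1 before test2.
    paths = sorted(glob.glob(os.path.join(src_dir, "*.csv")))
    # Never read the output file back in as an input.
    paths = [p for p in paths if os.path.abspath(p) != os.path.abspath(out_path)]
    dfs = [pd.read_csv(p, header=None) for p in paths]
    all_data = pd.concat(dfs, ignore_index=True)
    all_data.to_csv(out_path, index=False, header=False)
    return all_data


# Usage: create test1.csv and test2.csv in a throwaway directory, then merge them.
with tempfile.TemporaryDirectory() as d:
    for name in ("test1", "test2"):
        with open(os.path.join(d, name + ".csv"), "w") as fh:
            fh.write((",".join([name] * 3) + "\n") * 3)
    out = os.path.join(d, "final.csv")
    merge_csvs(d, out)
    with open(out) as fh:
        merged = fh.read().splitlines()

print(merged)
```

Collecting the frames in a list and calling concat once also avoids copying the growing result on every loop iteration.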

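The next solution is similar: I added the parameter header=None to read_csv and to_csv, and the parameter ignore_index=True to append. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this variant only runs on older pandas; on current pandas, use the concat solution above.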

import glob
import pandas as pd

all_data = pd.DataFrame() #initializes DF which will hold aggregated csv files

for f in sorted(glob.glob("*.csv")): #for all csv files in pwd, in file-name order
    df = pd.read_csv(f, header=None) #create dataframe for reading current csv
    all_data = all_data.append(df, ignore_index=True) #appends current csv to final DF (pandas < 2.0 only)
print(all_data)
#       0      1      2
#0  test1  test1  test1
#1  test1  test1  test1
#2  test1  test1  test1
#3  test2  test2  test2
#4  test2  test2  test2
#5  test2  test2  test2

all_data.to_csv("test/final.csv", index=None, header=None)

Answer 2

You can use concat. Let df1 be your first DataFrame and df2 the second; then you can do:

df = pd.concat([df1,df2],ignore_index=True)

ignore_index is optional; set it to True if you don't need to keep the original index of each individual DataFrame.
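A small sketch of what ignore_index changes, using two hypothetical two-row frames. Without it, each frame keeps its own row labels, so labels repeat; with it, the result gets a fresh 0..n-1 index:

```python
import pandas as pd

# Two small example frames, each with its own default index 0, 1.
df1 = pd.DataFrame([["test1"] * 3] * 2)
df2 = pd.DataFrame([["test2"] * 3] * 2)

kept = pd.concat([df1, df2])                           # original labels survive
renumbered = pd.concat([df1, df2], ignore_index=True)  # fresh 0..n-1 index

print(list(kept.index))        # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
```

For writing with to_csv(..., index=False) the index is not saved anyway, so ignore_index mainly matters if you keep working with the combined DataFrame (for example, duplicate labels make df.loc[0] return two rows).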

Answer 3

pandas is not the tool to use when all you want is to create a single csv file; you can simply write each csv to the new file as you go (the csv module works too, if you prefer):

import glob

with open("out.csv", "w") as out:
    for fle in sorted(glob.glob("*.csv")):
        if fle == "out.csv":
            continue  # don't read the output file back in as input
        with open(fle) as f:
            out.writelines(f)
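A hedged sketch of the csv-module variant mentioned above; the function name concat_csv_files and the temporary-directory usage are hypothetical. Because rows are parsed and re-written, every row gets a proper line terminator even if a source file lacks a trailing newline, a case where plain writelines would glue that last line onto the next file's first line. newline="" is what the csv module documentation recommends for files passed to csv.reader and csv.writer.

```python
import csv
import glob
import os
import tempfile


def concat_csv_files(src_dir, out_path):
    """Copy the rows of every *.csv in src_dir into out_path, one row at a time."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for fle in sorted(glob.glob(os.path.join(src_dir, "*.csv"))):
            if os.path.abspath(fle) == os.path.abspath(out_path):
                continue  # out_path already exists at this point; skip it
            with open(fle, newline="") as f:
                writer.writerows(csv.reader(f))


# Usage in a throwaway directory holding test1.csv and test2.csv.
with tempfile.TemporaryDirectory() as d:
    for name in ("test1", "test2"):
        with open(os.path.join(d, name + ".csv"), "w", newline="") as fh:
            fh.write((",".join([name] * 3) + "\n") * 3)
    out_path = os.path.join(d, "out.csv")
    concat_csv_files(d, out_path)
    with open(out_path, newline="") as fh:
        rows = list(csv.reader(fh))

print(len(rows))
```

Only one row is held in memory at a time, which keeps the streaming property of the writelines version.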

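Creating a large DataFrame just to write it to disk at the end makes no real sense, and if you had a lot of large files it might not even be possible.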
