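I have hundreds of large CSV files that I would like to merge into one. However, not all of the CSV files contain all of the columns, so I need to merge them by column name rather than by column position.
In the merged CSV, a cell should be empty when its row comes from a file that does not have that cell's column.
I cannot use the pandas module, because it makes me run out of memory.
Is there a module that can do this, or some simple code?
Below is code that generates two CSV files. What I want is to merge tempdf1.csv and tempdf2.csv in a way that gives me tempdf3.csv.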
import pandas as pd

# Two sample input files whose column sets differ (Latitude vs. Longitude).
df1 = pd.DataFrame([{"Location": "A", "Temperature": 20, "Weather": "Fair", "Wind": "", "Latitude": 44}, {"Location": "B", "Temperature": "", "Weather": "Bad", "Wind": "", "Latitude": 42}])
df2 = pd.DataFrame([{"Location": "C", "Temperature": 14, "Weather": "", "Longitude": 12, "Wind": 44}, {"Location": "D", "Temperature": "", "Weather": "", "Wind": 0, "Longitude": 11}])
df1.to_csv("C:/tempdf1.csv")
df2.to_csv("C:/tempdf2.csv")

# The desired merged result: every column present, missing cells left empty.
df3 = pd.DataFrame([{"Location": "A", "Longitude": "", "Temperature": 20, "Weather": "Fair", "Wind": "", "Latitude": 44}, {"Location": "B", "Longitude": "", "Temperature": "", "Weather": "Bad", "Wind": "", "Latitude": 42}, {"Location": "C", "Temperature": 14, "Weather": "", "Longitude": 12, "Wind": 44, "Latitude": ""}, {"Location": "D", "Temperature": "", "Weather": "", "Wind": 0, "Longitude": 11, "Latitude": ""}])
df3.to_csv("C:/tempdf3.csv")
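
For reference, here is a rough sketch of the kind of merge I mean, using only the standard library csv module instead of pandas. It is not a tested solution for my real data. The function name merge_csvs and its parameters are made up for illustration, and it assumes every file has a header row and fits the platform's default text encoding. It makes two passes: the first reads only the header of each file to build the union of column names, and the second streams rows one at a time, so memory use should not grow with file size.

```python
import csv


def merge_csvs(input_paths, output_path):
    """Merge CSV files by column name, streaming one row at a time.

    Hypothetical helper: assumes each input file starts with a header row.
    """
    # Pass 1: read only the header of each file and collect the union of
    # column names, keeping the order in which they are first seen.
    fieldnames = []
    seen = set()
    for path in input_paths:
        with open(path, newline="") as f:
            header = next(csv.reader(f), [])  # [] if the file is empty
        for name in header:
            if name not in seen:
                seen.add(name)
                fieldnames.append(name)

    # Pass 2: stream every row into the output file.
    # restval="" fills columns that a given input file does not have;
    # extrasaction="ignore" drops surplus fields from malformed rows
    # that are longer than their own header.
    with open(output_path, "w", newline="") as out:
        writer = csv.DictWriter(
            out, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        for path in input_paths:
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)
```

For the two sample files this would be called as merge_csvs(["C:/tempdf1.csv", "C:/tempdf2.csv"], "C:/merged.csv"). Note that the columns come out in first-seen order, which may differ from the column order in tempdf3.csv, and that the unnamed index column written by to_csv is carried over as an ordinary column.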