Question

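I have several hundred large CSV files that I want to merge into one. However, not all of the CSV files contain all of the columns, so I need to merge them by column name rather than by column position.

In the merged CSV, a cell should be empty when its row comes from a file that does not have that column.

I can't use the pandas module because it makes me run out of memory.

Is there a module, or some simple code, that can do this?

The code below generates two CSV files. I want to merge tempdf1.csv and tempdf2.csv in a way that gives me tempdf3.csv.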

import pandas as pd

# Two sample inputs whose column sets overlap but differ:
# df1 has Latitude but no Longitude, df2 has Longitude but no Latitude.
df1=pd.DataFrame([{"Location":"A","Temperature":20,"Weather":"Fair", "Wind":"", "Latitude":44},{"Location":"B","Temperature":"","Weather":"Bad","Wind":"","Latitude":42}])
df2=pd.DataFrame([{"Location":"C","Temperature":14,"Weather":"","Longitude":12, "Wind":44},{"Location":"D","Temperature":"","Weather":"","Wind":0,"Longitude":11}])
df1.to_csv("C:/tempdf1.csv")
df2.to_csv("C:/tempdf2.csv")

# Desired result: the union of all columns, with an empty cell wherever
# the source file did not have that column.
df3=pd.DataFrame([{"Location":"A","Longitude":"","Temperature":20,"Weather":"Fair", "Wind":"", "Latitude":44},{"Location":"B","Longitude":"","Temperature":"","Weather":"Bad","Wind":"","Latitude":42},{"Location":"C","Temperature":14,"Weather":"","Longitude":12, "Wind":44,"Latitude":""},{"Location":"D","Temperature":"","Weather":"","Wind":0,"Longitude":11, "Latitude":""}])
df3.to_csv("C:/tempdf3.csv")
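
For reference, one possible approach that avoids pandas is a two-pass merge using only the standard library csv module. This is a minimal sketch, not a tested solution for the real files: the function name merge_csvs is made up for illustration, and it assumes every file has a header row, is UTF-8 encoded, and has no rows longer than its header. The first pass reads only the header of each file to build the union of column names; the second pass streams the rows one at a time, so memory use should not grow with file size.

```python
import csv


def merge_csvs(paths, out_path):
    """Merge CSV files by column name (hypothetical helper, illustration only)."""
    # Pass 1: read only the header row of each file and collect the union
    # of column names, preserving first-seen order.
    fieldnames = []
    seen = set()
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            header = next(csv.reader(f), [])
            for name in header:
                if name not in seen:
                    seen.add(name)
                    fieldnames.append(name)

    # Pass 2: stream every row into the output file. DictWriter fills
    # columns that are missing from a row with restval (an empty string).
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)
    return fieldnames
```

It could be called as merge_csvs(["C:/tempdf1.csv", "C:/tempdf2.csv"], "C:/tempdf3.csv"), or with a list of paths built by glob.glob for the full set of files. The column order in the output follows the order in which column names are first encountered, which may differ from the order in the tempdf3.csv example above.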

Python: merging CSV files with different subsets of columns

0 answers: