Question

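I am trying to concatenate certain specific columns from all the CSV files in a directory. I am able to do all of this and produce a single CSV file. The problem is that I can't tell which columns came from which CSV file, so I want each column's header to be the name of the CSV file it came from.

For example: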

CSVFile1: 
Col1|Col2

CSVFile2: 
Col1|Col2

CSVMergeFile:
CSVFile1|CSVFile2|CSVFile1|CSVFile2
Col1    |Col1    |Col2    |Col2

Here is the code I use to concatenate the columns:

import pandas as pd
import glob
p = input("Enter folder path :")
n = int(input("Enter number of columns: "))
col = []
for i in range(0, n):
    ele = int(input())
    col.append(ele)
path = f'{p}'
all_files = glob.glob(path + "/*.csv")
li = []    
for filename in all_files:
    df = pd.read_csv(filename, usecols=col, index_col=False, header=0)
    li.append(df)    
frame = pd.concat(li, axis=1, ignore_index='False')

Any suggestions?

Answer 1

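IIUC, you can use each CSV file name as a key in a dict comprehension and pass the resulting dict to pd.concat. This creates an index level holding the file name.

from pathlib import Path
import pandas as pd

p = input("Enter folder path :")
# col is the list of column positions collected in the question's code
# change .stem to .name if you want to keep the `.csv` extension
dfs = {file.stem: pd.read_csv(file, usecols=col, header=0) for file in Path(p).glob('*.csv')}
df = pd.concat(dfs)

dfs will be a dictionary of dataframes keyed by file name.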
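The question asks for the file names across the column headers rather than in the row index. Here is a minimal sketch of one way to get that, assuming pandas is installed: pass the dict to pd.concat with axis=1, so the keys become the top level of a column MultiIndex. The helper name merge_csv_columns and the two demo files are hypothetical, made up only for illustration.

```python
from pathlib import Path
import tempfile

import pandas as pd


def merge_csv_columns(folder, usecols):
    """Place the selected columns of every CSV in `folder` side by side.

    The file name (without extension) becomes the top level of the column
    MultiIndex, so each column records which file it came from.
    """
    dfs = {
        file.stem: pd.read_csv(file, usecols=usecols, header=0)
        for file in sorted(Path(folder).glob("*.csv"))
    }
    # axis=1 puts the dict keys on the columns instead of the row index
    return pd.concat(dfs, axis=1)


# Demo with two throwaway files (hypothetical names and data)
with tempfile.TemporaryDirectory() as tmp:
    for name in ("CSVFile1", "CSVFile2"):
        pd.DataFrame({"Col1": [1, 2], "Col2": [3, 4]}).to_csv(
            Path(tmp) / f"{name}.csv", index=False
        )
    merged = merge_csv_columns(tmp, usecols=[0, 1])

# Regroup so that the same column from every file sits together,
# matching the CSVFile1|CSVFile2|CSVFile1|CSVFile2 layout in the question
by_column = merged.sort_index(axis=1, level=1)

print(merged.columns.tolist())
print(by_column.columns.tolist())
```

Writing such a frame with to_csv emits one header row per column level, so the file names end up on the first line and the original column names on the second.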

How to add the CSV file name as a column header in a dataframe using pandas?
