Read multiple CSVs, with each column named after its CSV file

Time: 2020-02-11 00:06:31

Tags: python pandas csv dataframe glob

In short, I need each column to be named after the CSV file it came from.

Here is what I have done so far:

import glob
import os
import pandas as pd

path = r'C:\Users\dfgdfsgsfg\Untitled Folder\tickers' # use your path
all_files = glob.glob(os.path.join(path, "*.csv"))     # advisable to use os.path.join as this makes concatenation OS independent

df_from_each_file = (pd.read_csv(f, parse_dates=True, index_col="date") for f in all_files)

concat = pd.concat(df_from_each_file, axis=1)

df = concat['PriceUSD']

df.columns = [ ??????? ] #what do I put in here?

This is what I get when I don't name the columns:

2 answers:

Answer 0 (score: 0)

I also tried this, but it didn't quite give me the result I was hoping for:

all_files = glob.glob(os.path.join(path, "*.csv"))     # advisable to use os.path.join as this makes concatenation OS independent

df_from_each_file = (pd.read_csv(f , parse_dates=True, index_col="date").assign(filename = f) for f in all_files)

concat = pd.concat(df_from_each_file, axis=1)

df = concat['PriceUSD']

df.columns = all_files[:-2]

df

RESULT

Answer 1 (score: 0)

If you are really only interested in a single column from each CSV file, trim the data down to that column as you parse each file:

def getPriceUSD(filename):
    """reads csv file then returns dataframe with just the column 'PriceUSD'
    with the filename as the column title"""
    data = pd.read_csv(filename, parse_dates=True, index_col="date")
    # double brackets keep a one-column DataFrame, so .columns can be renamed
    data = data[["PriceUSD"]]
    data.columns = [filename]
    return data

Then concatenate all the parsed and formatted columns together:

df = pd.concat(map(getPriceUSD, all_files), axis=1)

Before you ask: if you don't want the full path as the column name, use os.path.basename(filename) instead of filename for the column.
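To make the basename suggestion concrete, here is a minimal, self-contained sketch. It is not the asker's data: the tickers (`btc`, `eth`), the prices, and the helper name `read_price_column` are all made up for illustration, and the extension is additionally stripped with `os.path.splitext`, which goes one step beyond what the answer proposes.

```python
import glob
import os
import tempfile

import pandas as pd


def read_price_column(filename, column="PriceUSD"):
    """Return one column of a CSV as a Series named after the file.

    The name is the file's basename without directory or extension.
    (Hypothetical helper, not part of the original answer.)
    """
    data = pd.read_csv(filename, parse_dates=True, index_col="date")
    label = os.path.splitext(os.path.basename(filename))[0]
    return data[column].rename(label)


# Write two tiny CSV files into a temporary directory (made-up values).
tmp_dir = tempfile.mkdtemp()
samples = {
    "btc": "date,PriceUSD,Other\n2020-01-01,7200.0,1\n2020-01-02,6985.0,2\n",
    "eth": "date,PriceUSD,Other\n2020-01-01,130.0,3\n2020-01-02,127.0,4\n",
}
for ticker, text in samples.items():
    with open(os.path.join(tmp_dir, ticker + ".csv"), "w") as handle:
        handle.write(text)

# sorted() makes the column order deterministic; glob gives no ordering guarantee.
all_files = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))

# Each Series carries its own name, which pd.concat uses as the column label.
df = pd.concat(map(read_price_column, all_files), axis=1)
print(df)
```

Returning a named Series instead of a one-column DataFrame is a design choice in this sketch: with `axis=1`, `pd.concat` takes each Series name as the column label, so no separate assignment to `df.columns` is needed.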