How to read multiple files into separate Pandas DataFrames in Python

Asked: 2019-02-19 12:52:48

Tags: python-3.x pandas csv

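I am trying to read 6 files into 7 different data frames, but I can't figure out how to do it. The file names can be completely arbitrary: I know which files they are, but they don't follow a pattern like data1.csv, data2.csv.

I tried something like this: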

import sys
import os
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
f1='Norway.csv'
f='Canada.csv'
f='Chile.csv'

Norway = pd.read_csv(Norway.csv)
Canada = pd.read_csv(Canada.csv)
Chile = pd.read_csv(Chile.csv )

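I need to read multiple files into different data frames. When I work with a single file, it works fine:

file='Norway.csv'
Norway = pd.read_csv(file)

With the multi-file code above, I get this error message: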

NameError: name 'norway' is not defined
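
The NameError comes from passing `Norway.csv` without quotes: Python reads that as the attribute `csv` of a variable named `Norway`, which has not been defined yet. A minimal sketch of the intended pattern follows, assuming each path is stored in its own variable (`f1`, `f2`, `f3`); the temporary folder and sample rows are made up so that the snippet runs on its own.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample files so the sketch is self-contained;
# with real data, point the paths at your own CSV files instead.
tmp_dir = tempfile.mkdtemp()
for name in ("Norway", "Canada", "Chile"):
    with open(os.path.join(tmp_dir, name + ".csv"), "w") as fh:
        fh.write("year,value\n2018,1\n2019,2\n")

# Each path gets its own variable (the question reused `f` twice).
f1 = os.path.join(tmp_dir, "Norway.csv")
f2 = os.path.join(tmp_dir, "Canada.csv")
f3 = os.path.join(tmp_dir, "Chile.csv")

# Pass the string variables (or quoted literals), not bare names.
Norway = pd.read_csv(f1)
Canada = pd.read_csv(f2)
Chile = pd.read_csv(f3)
```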

2 Answers:

Answer 0: (score: 1)

You can read all the .csv files into a single data frame.

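import pandas as pd

# file names taken from the question; replace with your own
all_files = ['Norway.csv', 'Canada.csv', 'Chile.csv']

dfs = []
for file_ in all_files:
    df = pd.read_csv(file_, index_col=None, header=0)
    dfs.append(df)

# concatenate all dfs into one
big_df = pd.concat(dfs, ignore_index=True)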

Then split the large data frame into several (7 in your case). For example:

import numpy as np
num_chunks = 3  
df1,df2,df3 = np.array_split(big_df,num_chunks)
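
Note that np.array_split cuts big_df into chunks of nearly equal row count, so the pieces line up with the original files only if every file has the same number of rows. Below is a minimal sketch of one way to keep the file boundaries, assuming each frame is labelled with its source when concatenating. The in-memory CSV text and the level names `country` and `row` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the real CSV files, with different row counts.
sources = {
    "Norway": io.StringIO("year,value\n2017,1\n2018,2\n2019,3\n"),
    "Canada": io.StringIO("year,value\n2018,4\n"),
    "Chile": io.StringIO("year,value\n2017,5\n2019,6\n"),
}

frames = {name: pd.read_csv(src) for name, src in sources.items()}

# Concatenating a dict uses its keys as the outer level of the row index.
big_df = pd.concat(frames, names=["country", "row"])

# Recover the rows of one file by its key ...
norway = big_df.loc["Norway"]

# ... or rebuild every per-file frame from the combined one.
per_country = {
    name: group.droplevel("country")
    for name, group in big_df.groupby(level="country")
}
```

With this layout the combined frame is still available for analysis across all files, and each original table can be pulled back out by name.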

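Hope this helps.

Answer 1: (score: 0)

After googling for a while, I decided to combine answers from several different questions into a solution for this one. It will not cover every possible case, so you will have to adapt it to your own situation.

Check out the solution to this question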

# import libraries
import pandas as pd
import numpy as np
import glob
import os
# Declare a function for extracting a string between two characters
def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""
path = '/path/to/folder/containing/your/data/sets' # use your path
all_files = glob.glob(path + "/*.csv")
list_of_dfs = [pd.read_csv(filename, encoding = "ISO-8859-1") for filename in all_files]
list_of_filenames = [find_between(filename, 'sets/', '.csv') for filename in all_files] # sets is the last word in your path
# Create a dictionary with table names as the keys and data frames as the values
dfnames_and_dfvalues = dict(zip(list_of_filenames, list_of_dfs))
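
The `find_between` helper above depends on the folder being named `sets`. A variant of the same idea is sketched below, assuming the file name without its extension is a good enough key. It uses `pathlib.Path.stem`. The helper name `read_csv_folder` and the temporary sample files are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_folder(folder):
    """Return a dict mapping each CSV file's stem to its DataFrame."""
    return {
        path.stem: pd.read_csv(path)
        for path in sorted(Path(folder).glob("*.csv"))
    }


# Demo on made-up files; with real data, pass your own folder instead.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Norway", "Canada", "Chile"):
        Path(tmp, name + ".csv").write_text("year,value\n2019,1\n")
    dfs_by_name = read_csv_folder(tmp)

print(sorted(dfs_by_name))
```

Each frame is then available as, for example, `dfs_by_name["Norway"]`, whatever the files happen to be called.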