Question

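I am trying to extract data from a directory containing 12 .txt files. Each file contains 3 columns of data (X, Y, Z) that I want to extract. I want to collect all the data in a single df (InfoDF), but so far I have only managed to create a df with all of the X, Y and Z data in the same column. This is my code: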

import pandas as pd
import numpy as np
import os
import fnmatch

path = os.getcwd()

file_list = os.listdir(path)

InfoDF = pd.DataFrame()

for file in file_list:
    try:
        if fnmatch.fnmatch(file, '*.txt'):
            filedata = open(file, 'r')
            df = pd.read_table(filedata, delim_whitespace=True, names={'X','Y','Z'})

    except Exception as e:
        print(e)

What am I doing wrong?

Answer 1

df = pd.read_table(filedata, delim_whitespace=True, names={'X','Y','Z'})

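This line replaces df on every iteration of the loop, which is why only the data from the last file read is left at the end of the program.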
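The overwriting can be reproduced without touching the disk. The following is a minimal sketch, assuming whitespace-separated files with three columns; the two in-memory "files" and the name rows_kept are made up for illustration and are not part of the original question.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two of the .txt files (whitespace-separated X Y Z).
fake_files = [io.StringIO("1 2 3\n4 5 6\n"), io.StringIO("7 8 9\n")]

for handle in fake_files:
    # Each pass rebinds df, so the DataFrame from the previous pass is discarded.
    df = pd.read_csv(handle, sep=r"\s+", names=["X", "Y", "Z"])

# Only the rows of the last "file" are still reachable through df.
rows_kept = len(df)
print(rows_kept)
```

The first "file" has two rows and the second has one, so a row count of one shows that the earlier data was dropped.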

What you can do instead is keep all the DataFrames in a list and concatenate them at the end:

df_list = []
for file in file_list:
    if fnmatch.fnmatch(file, '*.txt'):
        # names is a list so the column order is fixed (a set has no guaranteed order);
        # sep=r'\s+' splits on any run of whitespace
        df_list.append(pd.read_table(file, sep=r'\s+', names=['X', 'Y', 'Z']))
df = pd.concat(df_list)

Alternatively, you can write it as a single expression:

df = pd.concat([pd.read_table(file, sep=r'\s+', names=['X', 'Y', 'Z']) for file in file_list if fnmatch.fnmatch(file, '*.txt')])

Answer 2

I think you need glob to select all the files, a list comprehension to build a list of DataFrames (dfs), and then concat:

import glob
import pandas as pd

files = glob.glob('*.txt')
dfs = [pd.read_csv(fp, sep=r'\s+', names=['X','Y','Z']) for fp in files]

df = pd.concat(dfs, ignore_index=True)
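To try the glob-and-concat approach end to end, here is a small sketch that can be run anywhere. It assumes whitespace-separated files with exactly three columns; the helper name load_xyz_dir and the throwaway files a.txt and b.txt are hypothetical and only serve the demonstration.

```python
import glob
import os
import tempfile

import pandas as pd


def load_xyz_dir(directory, pattern="*.txt"):
    """Hypothetical helper: read every matching file into one DataFrame."""
    # Sorting makes the row order reproducible; glob itself gives no ordering guarantee.
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    frames = [pd.read_csv(p, sep=r"\s+", names=["X", "Y", "Z"]) for p in paths]
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    return pd.concat(frames, ignore_index=True)


# Demo on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "w") as fh:
        fh.write("1 2 3\n4 5 6\n")
    with open(os.path.join(tmp, "b.txt"), "w") as fh:
        fh.write("7 8 9\n")
    combined = load_xyz_dir(tmp)

print(combined)
```

Note that pd.concat raises a ValueError when given an empty list, so a directory without any matching files would need to be handled separately.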

Answer 3

As camilleri mentioned above, you are overwriting df inside your loop.
Also, there is no point in catching a general exception.

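Solution: create an empty DataFrame InfoDF before the loop, then use append or concat to fill it with the smaller dfs. Neither of them modifies InfoDF in place, so the result has to be assigned back to InfoDF.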

import pandas as pd
import numpy as np
import os
import fnmatch

path = os.getcwd()

file_list = os.listdir(path)

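InfoDF = pd.DataFrame(columns=['X', 'Y', 'Z'])  # create empty dataframe
for file in file_list:
    if fnmatch.fnmatch(file, '*.txt'):
        # pass the file name directly so pandas opens and closes the file itself
        df = pd.read_table(file, sep=r'\s+', names=['X', 'Y', 'Z'])
        # concat returns a new DataFrame, so reassign the result;
        # DataFrame.append was removed in pandas 2.0
        InfoDF = pd.concat([InfoDF, df], ignore_index=True)
print(InfoDF)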

Extracting data from multiple files using Python
