
Question

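First of all, my code may contain more mistakes than I am aware of; I am a beginner and do not yet understand everything. I am trying to index my table like the one in the picture: I read columns a and b step by step and append them together. In total I read 500 files with 15000 rows each. Now I need to give them a MultiIndex as shown in the picture, but I cannot find a way to do it inside the loop using pandas hierarchical indexing and MultiIndex. Is there a way to do this in a loop over all data points and numbers?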


import glob

import numpy as np
import pandas as pd

all_data = pd.DataFrame()

for f in glob.glob("path_in_dir"):
    df = pd.read_table(f, delim_whitespace=True,
                       names=('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                       dtype={'A': np.float32, 'B': np.float32, 'C': np.float32,
                              'D': np.float32, 'E': np.float32, 'F': np.float32,
                              'G': np.float32, 'H': np.float32})

    all_data = all_data.append(df, ignore_index=True)

all_data.index.names = ['numbers']

# show all data
print(all_data)

I am using append, but I read somewhere that it is less efficient than pd.concat, which matters a lot for speed and memory use. When I try it this way: all_data = pd.concat(df, ignore_index=True), I get the error:

first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
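
The message reflects that pd.concat takes a list (or another iterable) of DataFrames as its first argument, not a single DataFrame. A minimal sketch, assuming two small made-up frames stand in for the files read in the loop (the names `first`, `second` and `frames` are illustrative only):

```python
import pandas as pd

# Two small stand-in frames; in the question these would come from pd.read_table.
first = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
second = pd.DataFrame({"A": [5.0, 6.0], "B": [7.0, 8.0]})

# Collect the frames in a list inside the loop ...
frames = []
for df in (first, second):
    frames.append(df)

# ... and concatenate once at the end. The first argument is the list itself.
all_data = pd.concat(frames, ignore_index=True)
print(all_data)
```

Concatenating once at the end avoids copying the growing table on every iteration. DataFrame.append was also removed in pandas 2.0, so on current versions the list approach is the one that still runs.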

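At the moment I only get column d, but the count runs from 0 to the end of the rows, so up to 30000 for 2 files. The count is therefore not split per file's data points.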

When I extend the index to `all_data.index.names = [datapoints, numbers]`, I get the message ValueError: Length of new names must be 1, got 2
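
The ValueError arises because a flat index has a single level and so can carry only one name. One possible way to get two levels from an already concatenated table is to derive both from the running row position. This is only a sketch and assumes every file contributes the same number of rows; `rows_per_file` and `position` are hypothetical names, and 3 is used here instead of 15000:

```python
import numpy as np
import pandas as pd

rows_per_file = 3  # assumption: every file has the same number of rows

# Stand-in for the flat table built from two files (index runs 0..5).
all_data = pd.DataFrame({"A": np.arange(6, dtype=np.float32)})
assert all_data.index.nlevels == 1  # a single level accepts only one name

# Derive the file number (1-based) and the row number within the file.
position = np.arange(len(all_data))
all_data.index = pd.MultiIndex.from_arrays(
    [position // rows_per_file + 1, position % rows_per_file],
    names=["datapoints", "numbers"],
)
print(all_data)
```

If the files differ in length, this arithmetic no longer applies and the index has to be built while the files are read.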

Answer 1

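Try something like this. Note that you do not need to declare all_data beforehand, because you can do everything in the loop. The dictionary also helps create the MultiIndex you are looking for.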

import glob
import os

import numpy as np
import pandas as pd

# make a test txt file
with open('df1.txt', mode='w') as txt:
    txt.write('1 2 3 4 5 6 7 8\n2 4 6 8 10 12 14 16')

# make a dictionary for storing the dataframes, keyed by file number
dataframes = {}

# import files with a for-loop from the current working directory (otherwise use a different path)
# '*.txt' retrieves only .txt files; enumerate numbers them starting at 1
for i, path in enumerate(glob.glob(os.path.join(os.getcwd(), '*.txt')), start=1):
    dataframes[i] = pd.read_table(
        path,
        sep=r'\s+',  # whitespace-delimited; delim_whitespace=True is deprecated in recent pandas
        names=('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
        dtype=np.float32,  # same dtype for all eight columns
    )

# concat dataframes together; the dictionary keys become the outer index level
df = pd.concat(dataframes, axis=0)

# label indices to match wanted output
df.index.names = ['Datapoint', 'number']

print(df)
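
Once the two-level index exists, rows can be selected per file or per row number. A short sketch on a made-up frame that uses the same index names as the answer (the frame contents and the variable names `second_file`, `first_rows` and `value` are illustrative only):

```python
import numpy as np
import pandas as pd

# Stand-in for the concatenated result: two files with two rows each.
frames = {
    1: pd.DataFrame({"A": [1.0, 2.0], "B": [2.0, 4.0]}, dtype=np.float32),
    2: pd.DataFrame({"A": [3.0, 4.0], "B": [6.0, 8.0]}, dtype=np.float32),
}
df = pd.concat(frames, axis=0)
df.index.names = ["Datapoint", "number"]

# All rows that came from the second file (outer level).
second_file = df.loc[2]

# Row 0 of every file (inner level), taken as a cross-section.
first_rows = df.xs(0, level="number")

# A single value: file 1, row 1, column B.
value = df.loc[(1, 1), "B"]
print(second_file, first_rows, value, sep="\n\n")
```

Selecting on the outer level with df.loc drops that level from the result, so `second_file` is indexed by `number` alone.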

Hierarchical MultiIndex table with a Python loop
