python pandas: dynamically create multiple DataFrames and concatenate them into one

Time: 2020-05-26 09:50:57

Tags: python pandas

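I'm trying to create multiple DataFrames from .csv files that are each located in a different folder. The problem I run into is that when I try to create a dictionary for the data, Python parses my DataFrames like this: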

Printing Fruits Dictionairy
{'0':       Name      Date      Time  Open  High   Low  Close  Volume  VWAP  Trades
0   Orange  20200430  15:30:00  5.70  5.97  5.65   5.75    1000  5.60      55
1   Orange  20200430  17:00:00  5.65  5.95  5.50   5.80    1200  5.65      68
2   Orange  20200430  20:00:00  5.50  5.83  5.44   5.60    1300  5.73      71
3   Orange  20200430  22:00:00  5.55  5.58  5.35   5.57    1400  5.78      81
4   Orange  20200501  15:30:00  5.50  5.85  5.45   5.70    1500  5.73      95
5   Orange  20200501  17:00:00  5.65  5.70  5.50   5.60    1600  5.65      54
6   Orange  20200501  20:00:00  5.80  5.85  5.45   5.81    1700  5.73      41
7   Orange  20200501  22:00:00  5.60  5.84  5.45   5.65    1800  5.75      62
8   Orange  20200504  15:30:00  5.40  5.87  5.45   5.75    1900  5.83      84
9   Orange  20200504  17:00:00  5.50  5.75  5.40   5.60    2000  5.72      94
10  Orange  20200504  20:00:00  5.80  5.83  5.44   5.50    2100  5.40      55
11  Orange  20200504  22:00:00  5.40  5.58  5.37   5.80    2200  5.35      87, '1':      Name      Date      Time  Open  High   Low  Close  Volume  VWAP  Trades
0   Apple  20200504  10:00:00  3.70  3.97  3.65   3.75    1000  3.60      55
1   Apple  20200504  12:00:00  3.65  3.95  3.50   3.80    1200  3.65      68
2   Apple  20200504  14:00:00  3.50  3.83  3.44   3.60    1300  3.73      71
3   Apple  20200504  16:00:00  3.55  3.58  3.35   3.57    1400  3.78      81
4   Apple  20200505  10:00:00  3.50  3.85  3.45   3.70    1500  3.73      95
5   Apple  20200505  12:00:00  3.65  3.70  3.50   3.60    1600  3.65      54
6   Apple  20200505  14:00:00  3.80  3.85  3.45   3.81    1700  3.73      41
7   Apple  20200505  16:00:00  3.60  3.84  3.45   3.65    1800  3.75      62
8   Apple  20200506  10:00:00  3.40  3.87  3.45   3.75    1900  3.83      84
9   Apple  20200506  12:00:00  3.50  3.75  3.40   3.60    2000  3.72      94
10  Apple  20200506  14:00:00  3.80  3.83  3.44   3.50    2100  3.40      55
11  Apple  20200506  16:00:00  3.40  3.58  3.37   3.80    2200  3.35      87}

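So all of the data for each DataFrame appears to be stored in a single "cell" (rows and columns together), instead of being parsed like this: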

df(idx) = {{Name: Orange, Orange, Orange}, {Date: 20200430, 20200430, 20200430}, {Time: 15:30:00, 17:00:00, 20:00:00}}

df(idx) = {{Name: Apple, Apple, Apple}, {Date: 20200430, 20200430, 20200430}, {Time: 15:30:00, 17:00:00, 20:00:00}}
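Note that printing a dict of DataFrames shows each DataFrame's full text representation as the dict value, so the printout above does not by itself mean the data has been collapsed into one cell. A minimal check, using a small hand-built frame as a hypothetical stand-in for one entry of `Fruits` (values are illustrative only):

```python
import pandas as pd

# Hypothetical stand-in for one entry of the Fruits dictionary
orange = pd.DataFrame({
    "Name": ["Orange", "Orange", "Orange"],
    "Date": [20200430, 20200430, 20200430],
    "Time": ["15:30:00", "17:00:00", "20:00:00"],
})
fruits = {"0": orange}

df = fruits["0"]            # the dict value is an ordinary DataFrame
print(type(df))             # <class 'pandas.core.frame.DataFrame'>
print(list(df.columns))     # ['Name', 'Date', 'Time']
print(df["Name"].tolist())  # ['Orange', 'Orange', 'Orange']
```

Each column is still addressable separately, which is the structure sketched above.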

I created a sample file to illustrate my problem; that is also where the example output comes from.

import pandas as pd
import os

#Opening 'TEST Tracker.xlsx' to find the entities to load
#os.path.join avoids the unescaped backslash in "Trackers\TEST Tracker.xlsx"
TEST = pd.ExcelFile(os.path.join("Trackers", "TEST Tracker.xlsx"))
df1 = TEST.parse("Entries")

values1 = df1[['Name', 'Location', 'Date', 'TimeO', 'TimeC', 'Check_2',
           'Open', 'High', 'Low', 'Close', 'Volume', 'VWAP', '$Volume', 'Trades']]

#Searching for every row that contains the value 'X' in the column 'Check_2'
#na=False treats empty Check_2 cells as "no match" instead of NaN
rdf1 = values1[values1.Check_2.str.contains("X", na=False)]

#Printing dataframe to check
print("First Dataframe")
print(rdf1)

#creating a dictionary for the dataframes
Fruits = {}

#Generating dynamic dataframes for each row in rdf1
for idx, rows in rdf1.iterrows():
    fle = os.path.join('Entities', rows.Location, rows.Name, 'TwoHours.csv')
    col_list = ['Name', 'Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume', 'VWAP', 'Trades']
    df3 = pd.read_csv(fle, usecols=col_list, sep=";")
    Fruits[str(idx)] = df3[col_list]

print("Printing Fruits Dictionairy")
print(Fruits)
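To combine the per-fruit DataFrames into one, `pd.concat` can take the dictionary's values (or the dictionary itself). The sketch below is a stand-in under stated assumptions: it reads two small in-memory CSV strings through `io.StringIO` instead of the real `Entities/.../TwoHours.csv` files, and keeps only a few of the columns.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for two TwoHours.csv files (semicolon-separated)
csv_texts = {
    "0": "Name;Date;Time;Close\nOrange;20200430;15:30:00;5.75\nOrange;20200430;17:00:00;5.80\n",
    "1": "Name;Date;Time;Close\nApple;20200504;10:00:00;3.75\nApple;20200504;12:00:00;3.80\n",
}
col_list = ["Name", "Date", "Time", "Close"]

# Same pattern as the loop above: one DataFrame per key
Fruits = {}
for key, text in csv_texts.items():
    Fruits[key] = pd.read_csv(io.StringIO(text), usecols=col_list, sep=";")[col_list]

# Stack all frames vertically into one DataFrame with a fresh 0..n-1 index
combined = pd.concat(list(Fruits.values()), ignore_index=True)
print(combined)

# Alternative: pass the dict itself so its keys become an outer index level
combined_keyed = pd.concat(Fruits, names=["source", "row"])
print(combined_keyed.loc["1"])
```

`ignore_index=True` discards each file's own row labels; passing the dict keeps its keys (`'0'`, `'1'`) as the outer index level, which records which file each row came from.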

The first DataFrame, which the information is retrieved from, looks like this:

First Dataframe
     Name  Location      Date     TimeO  ... Volume VWAP  $Volume  Trades
0  Orange  New York  20200501  15:30:00  ...    NaN  NaN      NaN     NaN
1   Apple     Minsk  20200505  15:30:00  ...    NaN  NaN      NaN     NaN

Just in case anyone was wondering.
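If `Check_2` has empty cells, `.str.contains` returns NaN for them and using that as a boolean mask raises an error; `na=False` treats those rows as non-matching. A small sketch with a hand-built frame modeled on the tracker sheet above (the `Check_2` values and the extra "Pear" row are hypothetical, since the printout elides that column):

```python
import os
import pandas as pd

# Hypothetical tracker rows; Check_2 values are assumed for illustration
df1 = pd.DataFrame({
    "Name": ["Orange", "Apple", "Pear"],
    "Location": ["New York", "Minsk", "Paris"],
    "Check_2": ["X", "X", None],
})

# na=False makes the missing Check_2 cell count as "no match" instead of NaN
rdf1 = df1[df1["Check_2"].str.contains("X", na=False)]

# Build one CSV path per selected row, mirroring the folder layout in the question
paths = [os.path.join("Entities", row.Location, row.Name, "TwoHours.csv")
         for row in rdf1.itertuples(index=False)]
print(paths)
```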

I hope someone here can help me, because I've been struggling with this for a long time and everything I've tried has failed.
