Creating a pandas DataFrame from a dictionary

Time: 2021-06-25 18:36:51

Tags: python pandas dataframe dictionary

I have written the following Python code:

# Data to be formatted into pandas data frame: PDYN, DST, BYIMF, BZIMF, W1, W2, W3, W4, W5, W6.

# Import the necessary modules
import numpy as np
import datetime as dtm
import pandas as pd
import spacepy
import spacepy.time as spt
import spacepy.omni as spo

# Initial time: 2003-10-29T06:00:00 (DST = -10nT)
# Final time: 2003-10-30T17:00:00 (DST = -97nT)

# Extract the data during time interval using spacepy

start_time = dtm.datetime(2003, 10, 29, 6)                  # Initial time
end_time = dtm.datetime(2003, 10, 30, 17)               # Final time
dt = dtm.timedelta(hours = 1)                               # Time delta
ticks = spt.tickrange(start_time, end_time, dt, 'UTC')      # Range for time ticks
time_data = spo.get_omni(ticks)                                 # Create data dictionary

# Create data frame using Pandas
d = {'Time Stamp': time_data['ticks'], 'PDYN': time_data['Pdyn'], 'DST': time_data['Dst'], 'BYIMF': time_data['ByIMF'], 'BZIMF': time_data['BzIMF'], 'W1': time_data['W'][0], 'W2': time_data['W'][1], 'W3': time_data['W'][2], 'W4': time_data['W'][3], 'W5': time_data['W'][4], 'W6': time_data['W'][5]}
df = pd.DataFrame(data = d)
df

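I have extracted the data with spacepy, and I am trying to create a pandas DataFrame holding the parameters I need. As you can see, the variable time_data is a dictionary. When I try to create the DataFrame, I keep getting the following error: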

ValueError: arrays must all be same length.

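In the dictionary, the key W maps to an array made up of six other arrays, corresponding to the parameters W1-W6. For these, I tried indexing the W array in the dictionary. Is there a better way? While trying to diagnose the problem, I wanted to see whether a DataFrame would be produced for at least one parameter. For that, I have: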

# Import the necessary modules
import numpy as np
import datetime as dtm
import pandas as pd
import spacepy
import spacepy.time as spt
import spacepy.omni as spo

# Initial time: 2003-10-29T06:00:00 (DST = -10nT)
# Final time: 2003-10-30T17:00:00 (DST = -97nT)

# Extract the data during time interval using spacepy

start_time = dtm.datetime(2003, 10, 29, 6)                  # Initial time
end_time = dtm.datetime(2003, 10, 30, 17)               # Final time
dt = dtm.timedelta(hours = 1)                               # Time delta
ticks = spt.tickrange(start_time, end_time, dt, 'UTC')      # Range for time ticks
time_data = spo.get_omni(ticks)                                 # Create data dictionary

# Create data frame using Pandas
d = {'Time Stamp': time_data['ticks']}
df = pd.DataFrame(data = d)
df

From this I get two errors:

ValueError: maximum supported dimension for an ndarray is 32, found 33

ValueError: Shape of passed values is (1, 36, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), indices imply (36, 1)

Does anyone have a solution? Thanks.

1 answer:

Answer 0 (score: 0)

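I found that the solution is to pass a tuple containing a slice when indexing the value assigned to the W key. This is necessary because the value assigned to the W key holds an array at every time stamp. Each of these arrays has 6 elements, one for each of the parameters W1-W6. Extracting the values of a single parameter across all time stamps therefore requires slicing along the first axis, for example time_data['W'][:, 0] for W1. Here is some code that works.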
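Before the full script, a minimal sketch on synthetic data may make the shape issue concrete. It does not call spacepy. The arrays pdyn and w are made-up stand-ins, and the assumption (taken from the description above) is that the W array has one row per time stamp and six columns, i.e. shape (36, 6) for this interval.

```python
import numpy as np
import pandas as pd

n_hours = 36  # number of hourly time stamps in the interval from the question

# Synthetic stand-ins (assumptions, not real OMNI data):
# a 1-D parameter and a 2-D W array with one row per time stamp
pdyn = np.arange(n_hours, dtype=float)
w = np.arange(n_hours * 6, dtype=float).reshape(n_hours, 6)

print(w[0].shape)     # (6,)  -> the six W values at the first time stamp
print(w[:, 0].shape)  # (36,) -> W1 at every time stamp

# Mixing a length-36 column with a length-6 column raises ValueError
try:
    pd.DataFrame({'PDYN': pdyn, 'W1': w[0]})
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)

# Slicing along the first axis gives a column of the right length
df_demo = pd.DataFrame({'PDYN': pdyn, 'W1': w[:, 0]})
print(df_demo.shape)  # (36, 2)
```

With plain integer indexing, w[0] picks out one time stamp rather than one parameter, which is why the column lengths disagree. The full script, which uses the real spacepy data, follows.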

# Data to be formatted into pandas data frame: PDYN, DST, BYIMF, BZIMF, W1, W2, W3, W4, W5, W6.

# Import the necessary modules
import numpy as np
import datetime as dtm
import pandas as pd
import spacepy
import spacepy.time as spt
import spacepy.omni as spo

# Initial time: 2003-10-29T06:00:00 (DST = -10nT)
# Final time: 2003-10-30T17:00:00 (DST = -97nT)

# Extract the data during time interval using spacepy
start_time = dtm.datetime(2003, 10, 29, 6)                      # Initial time
end_time = dtm.datetime(2003, 10, 30, 17)                       # Final time
dt = dtm.timedelta(hours = 1)                                   # Time delta
ticks = spt.tickrange(start_time, end_time, dt, 'UTC')          # Range for time ticks
time_data = spo.get_omni(ticks)                                 # Create data dictionary

# Build an hourly series of time stamps covering the same interval as ticks
datetime_series = pd.Series(pd.date_range(start=start_time, end=end_time, freq='h'))
# Create time dependent data frame using Pandas;
# time_data['W'][:, i] selects one W parameter at every time stamp
d = {
    'Date, Time': datetime_series,
    'PDYN': time_data['Pdyn'],
    'DST': time_data['Dst'],
    'BYIMF': time_data['ByIMF'],
    'BZIMF': time_data['BzIMF'],
    'W1': time_data['W'][:, 0],
    'W2': time_data['W'][:, 1],
    'W3': time_data['W'][:, 2],
    'W4': time_data['W'][:, 3],
    'W5': time_data['W'][:, 4],
    'W6': time_data['W'][:, 5],
}
df = pd.DataFrame(d)
print(df)
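
As a possible tidier variant, the six W columns can be generated in a loop instead of being written out by hand. The sketch below is only an illustration: fake_omni is a hypothetical stand-in for the dictionary returned by spo.get_omni(ticks), filled with synthetic numbers, and it assumes the W entry has shape (36, 6) as described above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dictionary returned by spo.get_omni(ticks);
# the values are synthetic, only the shapes mirror the real data
n_hours = 36
fake_omni = {
    'Pdyn': np.linspace(1.0, 5.0, n_hours),
    'Dst': np.linspace(-10.0, -97.0, n_hours),
    'W': np.arange(n_hours * 6, dtype=float).reshape(n_hours, 6),
}

# Hourly time stamps covering the same interval as in the answer
stamps = pd.date_range(start='2003-10-29 06:00', end='2003-10-30 17:00', freq='h')

columns = {
    'Date, Time': stamps,
    'PDYN': fake_omni['Pdyn'],
    'DST': fake_omni['Dst'],
}
# Add W1..W6 by slicing one column of the (36, 6) array per parameter
columns.update({f'W{i + 1}': fake_omni['W'][:, i] for i in range(6)})

df_compact = pd.DataFrame(columns)
print(df_compact.head())
```

With the real data, the same comprehension should work by replacing fake_omni with time_data, provided the W array has the shape assumed here.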