Subtracting columns in a pandas DataFrame with timestamps

Time: 2019-08-12 12:35:37

Tags: python pandas dataframe

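I am new to Python and have inherited code that loads data from a source based on specific IDs and then adds the relevant data to the columns of a DataFrame. The data is monthly and timestamped. I want to subtract one column of the DataFrame from another column of the same DataFrame, but I get an error.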

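In the code I have a load_and_get_levels() function that first calls make_df(), which loads the data I need and puts it into a DataFrame. That function in turn calls load_data(), which I believe returns a DataFrame object made up of two columns (a timestamp column and a column with the actual data).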

import pandas as pd

time_series_names = [
'Prod1',
'Prod2',
'Prod3',
'Prod4',
'Prod5',
'Prod6',
'Prod7',
]

def make_df(time_series_names):
    # Make a data frame with the first time series
    time_series_df = pd.DataFrame(load_data(time_series_names[0],
        start_time, end_time, time_zone))
    # Add remaining time series to the data frame.
    for ts_name in time_series_names[1:]:
        additional_ts = load_data(ts_name, start_time,
            end_time, time_zone)
        time_series_df = pd.merge(time_series_df,
            pd.DataFrame(additional_ts),
            left_index=True, right_index=True, how='outer')
    return time_series_df

def load_data(ts_name, start_time, end_time, time_zone='Europe/Oslo'):
    # get_ts_range() returns time series data with a certain time range
    ts_json = get_ts_range(ts_name,
        start_time.timestamp()*1000,
        end_time.timestamp()*1000)
    ts_points = ts_json['points']
    time_series_df = pd.DataFrame(ts_points,
        index=pd.to_datetime([point[0] for point in ts_points],
        unit='ms').tz_localize('UTC').tz_convert(time_zone),
        columns=['time', 'value'])
    series = time_series_df['value']
    series.name = ts_json['name']
    return series

def levels(time_series_df):
    # Estimates Prod1 - Prod7
    ts_levels = time_series_df['Prod1'] - time_series_df['Prod7']
    return ts_levels

def load_and_get_levels():
    ts_dframe = make_df(time_series_names)
    ts_df_levels = levels(ts_dframe)
    print(ts_df_levels)
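
To make the return value of load_data() concrete, here is a minimal sketch of its core steps run against a hypothetical payload. The dictionary below and the name "some.source.id" are assumptions for illustration only; the real get_ts_range() response may look different. Under these assumptions the function yields a single named Series rather than a two-column DataFrame, and the name comes from the payload, not from the entries in time_series_names.

```python
import pandas as pd

# Hypothetical payload in the shape load_data() appears to expect.
# The real get_ts_range() response may differ.
ts_json = {
    "name": "some.source.id",  # assumed: not necessarily equal to 'Prod1'
    "points": [[1546300800000, 1.0], [1548979200000, 2.0]],
}

ts_points = ts_json["points"]

# Same index construction as in load_data(): epoch milliseconds -> local time.
index = pd.to_datetime([p[0] for p in ts_points], unit="ms")
index = index.tz_localize("UTC").tz_convert("Europe/Oslo")
frame = pd.DataFrame(ts_points, index=index, columns=["time", "value"])

# Only the 'value' column is kept, and it is named after the payload.
series = frame["value"].rename(ts_json["name"])

print(type(series).__name__)
print(series.name)

# Wrapping the Series in a DataFrame turns its name into the column label.
print(list(pd.DataFrame(series).columns))
```

If the payload name differs from the requested ID, the merged DataFrame built in make_df() would carry the payload names as its column labels.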

Finally, I want to subtract two columns of the DataFrame created in make_df(). When I try to do this:

ts_levels = time_series_df['Prod1'] - time_series_df['Prod7']

...I get a KeyError:

  File "C:..../xxx.py", line 74, in levels
    ts_levels = time_series_df['Prod1'] - time_series_df['Prod7']
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Prod1'

Why can't I subtract the two columns by using their names?
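
A KeyError raised from DataFrame.__getitem__ means the label is not among the frame's columns. One possible explanation, which the post does not confirm, is that the columns are named after ts_json['name'] rather than 'Prod1' ... 'Prod7'. The sketch below uses made-up column names ("source.id.1", "source.id.7") and made-up values to show how to check the actual labels and rename them before subtracting.

```python
import pandas as pd

# Monthly, timezone-aware index similar to the one described in the question.
idx = pd.date_range("2019-01-01", periods=3, freq="MS", tz="Europe/Oslo")

# Hypothetical column names standing in for whatever ts_json['name'] holds.
df = pd.DataFrame(
    {"source.id.1": [10.0, 12.0, 11.0], "source.id.7": [4.0, 5.0, 6.0]},
    index=idx,
)

# Step 1: inspect the labels that actually exist.
print(list(df.columns))
has_prod1 = "Prod1" in df.columns  # False here, so df['Prod1'] would raise KeyError

# Step 2: map the real labels to the expected ones, then subtract.
renamed = df.rename(columns={"source.id.1": "Prod1", "source.id.7": "Prod7"})
ts_levels = renamed["Prod1"] - renamed["Prod7"]
print(ts_levels)
```

If that turns out to be the cause, an alternative is to set series.name = ts_name inside load_data(), so the columns match the IDs in time_series_names from the start.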

0 answers:

No answers