Question

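I'm having trouble reformatting a DataFrame.

My input has one row per day and one column per symbol (each symbol has values on different dates):

Code to generate the input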

import pandas as pd

data = [("01-01-2010", 15, 10), ("02-01-2010", 16, 11), ("03-01-2010", 16.5, 10.5)]
labels = ["date", "AAPL", "AMZN"]
df_input = pd.DataFrame.from_records(data, columns=labels)

The desired output is (monthly rows, with a new row for each month):

Needed output

Code to generate the output

data = [("01-01-2010","29-01-2010", "AAPL", 15, 20), ("01-01-2010","29-01-2010", "AMZN", 10, 15),("02-02-2010","30-02-2010", "AAPL", 20, 32)]
labels = ['bd start month', 'bd end month','stock', 'start_month_value', "end_month_value"]
df = pd.DataFrame.from_records(data, columns=labels)

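The logic (pseudocode):
1. For each row, take only the non-NaN values to build a new "row" (perhaps a dictionary, with the date as the key and [stock, value] as the value).
2. Keep only the rows that fall at the start or end of a month.
3. Write those rows to a new DataFrame.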
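Steps 1 and 3 of the pseudocode amount to a wide-to-long reshape, which could be sketched with `DataFrame.melt` instead of a hand-built dictionary. This is only an illustration, not the code from the question: the name `long_df` is made up here, and the sketch assumes the date strings are day-first (DD-MM-YYYY).

```python
import pandas as pd

# Sample input in the same shape as df_input from the question.
data = [("01-01-2010", 15, 10), ("02-01-2010", 16, 11), ("03-01-2010", 16.5, 10.5)]
df_input = pd.DataFrame.from_records(data, columns=["date", "AAPL", "AMZN"])

# Assumption: the dates are day-first strings (DD-MM-YYYY).
df_input["date"] = pd.to_datetime(df_input["date"], format="%d-%m-%Y")

# One row per (date, stock); rows whose value is NaN are dropped.
long_df = (
    df_input.melt(id_vars="date", var_name="stock", value_name="value")
    .dropna(subset=["value"])
    .sort_values(["stock", "date"])
    .reset_index(drop=True)
)
print(long_df)
```

Each stock now has its own rows, so a stock that has no value on a given date simply has no row for that date.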

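I have read several posts, such as this and this, and a few others. All of them keep the same "type" of DataFrame and only resample it, whereas I need to change the structure...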

My code so far

import collections

from pandas.tseries.offsets import CustomBusinessMonthBegin
from pandas.tseries.holiday import USFederalHolidayCalendar

# creating the new index with business days
df1 = pd.DataFrame(range(10000), index=pd.date_range(df.iloc[0].name, periods=10000, freq='D'))
bmth_us = CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())
df2 = df1.resample(bmth_us).mean()

# creating the new index by intersecting my old (daily) one with the monthly index
new_index = df.index.intersection(df2.index)

# selecting only the rows I want
df = df.loc[new_index]

# creating a dict that will be my new dataset
new_dict = collections.OrderedDict()
# iterating over the rows and adding to dictionary
for index, row in df.iterrows():
#     print index
    date = df.loc[index].name
    # values are the not none values
    values = df.loc[index][~df.loc[index].isnull().values]

    new_dict[date]=values


# from dict to list
data=[]
for key, values in new_dict.items():
    for i in range(0, len(values)):
        date = key
        stock_name = str(values.index[i])
        stock_value = values.iloc[i]
        row = (key, stock_name, stock_value)
        data.append(row)

# from the list to df
labels = ['date','stock', 'value']
df = pd.DataFrame.from_records(data, columns=labels)
df.to_excel("migdal_format.xls")

Current output I get

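One big problem:

I only get each stock's value at the start of the month. I need both the start and the end values so that I can compute the stock's return for that month.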
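To make the goal concrete, here is a minimal sketch of getting both the start and the end value per stock and month, and the return from them. It assumes the data is already in long format (one row per date and stock); the frame `long_df`, its values, and the `return_pct` column are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical long-format prices (illustrative values, one row per date and stock).
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2010-01-04", "2010-01-15", "2010-01-29",
                            "2010-02-01", "2010-02-26",
                            "2010-01-04", "2010-01-29"]),
    "stock": ["AAPL"] * 5 + ["AMZN"] * 2,
    "value": [15.0, 16.0, 20.0, 20.0, 32.0, 10.0, 15.0],
}).sort_values(["stock", "date"])

# Calendar month of each observation, used as a grouping key.
month = long_df["date"].dt.to_period("M").rename("month")

# First and last observation (date and value) per stock and month.
monthly = (
    long_df.groupby(["stock", month])
    .agg(
        start_date=("date", "first"),
        end_date=("date", "last"),
        start_month_value=("value", "first"),
        end_month_value=("value", "last"),
    )
    .reset_index()
)

# Monthly return in percent.
monthly["return_pct"] = (monthly["end_month_value"] / monthly["start_month_value"] - 1) * 100
print(monthly)
```

Here "start" and "end" mean the first and last dates that actually have data in each month, which may or may not coincide with the business-month boundaries used in the code above.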

A smaller problem:

I'm sure this is not the cleanest or fastest code :)

Thanks a lot!

Answer 1

So I found a way:

Iterate over each column
Group by month
Take the first and last value of the month
Compute the return

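import re

# Assumes df_input has a DatetimeIndex and one column per stock.
df_migdal = pd.DataFrame()
for col in df_input.columns[0:]:
    stock_position = df_input.loc[:, col]

    # keep only the letters of the column name, then drop its last four characters
    name = stock_position.name
    name = re.sub('[^a-zA-Z]+', '', name)
    name = name[0:-4]

    # first and last value of each month
    # (pd.Grouper(freq='M') replaces pd.TimeGrouper('M'), which was removed in pandas 1.0)
    stock_position = stock_position.groupby([pd.Grouper(freq='M')]).agg(['first', 'last'])

    stock_position["name"] = name
    stock_position["return"] = ((stock_position["last"] / stock_position["first"]) - 1) * 100
    stock_position.dropna(inplace=True)
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    df_migdal = pd.concat([df_migdal, stock_position])
df_migdal = df_migdal.round(decimals=2)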

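I also tried a neater approach, but I don't know how to handle the MultiIndex I get back... I need each column to hold two sub-columns, and then create a third sub-column with some lambda function.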

df_input.groupby([pd.Grouper(freq='M')]).agg(['first', 'last'])
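One possible way to deal with the resulting MultiIndex columns is to move the stock level into the rows with `stack`, so that `first` and `last` become ordinary columns and the return is a plain column expression. This is a sketch on made-up data, not a tested solution for the original dataset: the prices, the name `tidy`, and grouping by `index.to_period("M")` instead of a `Grouper` are all assumptions.

```python
import pandas as pd

# Hypothetical daily prices with a DatetimeIndex (illustrative values only).
idx = pd.to_datetime(["2010-01-04", "2010-01-29", "2010-02-01", "2010-02-26"])
df_input = pd.DataFrame(
    {"AAPL": [15.0, 20.0, 20.0, 32.0], "AMZN": [10.0, 15.0, 14.0, 21.0]},
    index=idx,
)

# Group by calendar month; the columns become a MultiIndex of (stock, first/last).
monthly = df_input.groupby(df_input.index.to_period("M")).agg(["first", "last"])

# Move the stock level from the columns into the row index.
tidy = monthly.stack(level=0)
tidy.index.names = ["month", "stock"]

# With 'first' and 'last' as plain columns, the return needs no lambda.
tidy["return"] = (tidy["last"] / tidy["first"] - 1) * 100
tidy = tidy.reset_index()
print(tidy)
```

The result has one row per month and stock, which is close to the row-oriented layout asked for in the question.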

Pandas and stocks: from daily values (in columns) to monthly values (in rows)
