I have the following DataFrame, df_qty:
| hold_date | day_count | qty | item | ccy |
+------------+-----------+------+----------+-----+
| 2015-01-01 | 1 | 1200 | CB04 box | USD |
| 2015-01-01 | 3 | 1500 | AB01 box | USD |
| 2015-01-02 | 2 | 550 | CB03 box | USD |
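I want to expand each row into day_count rows, incrementing hold_date by one day for every added row. For example, the item AB01 box has a day_count of 3, so two new rows are added for it, as shown below. The resulting df would look like this:
df_qty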
| hold_date | qty | item | ccy |
+------------+------+----------+-----+
| 2015-01-01 | 1200 | CB04 box | USD |
| 2015-01-01 | 1500 | AB01 box | USD |
| 2015-01-02 | 1500 | AB01 box | USD |
| 2015-01-03 | 1500 | AB01 box | USD |
| 2015-01-02 | 550 | CB03 box | USD |
| 2015-01-03 | 550 | CB03 box | USD |
Answer 0 (score: 2)
You need:
s=df.day_count
s1=[pd.Timedelta(x,'D') for x in sum(df.day_count.apply(lambda x : list(range(x))),[])]
df_new=df.reindex(df.index.repeat(s))
df_new['hold_date']=df_new.hold_date+s1
df_new
Out[642]:
hold_date day_count qty item ccy
0 2015-01-01 1 1200 CB04box USD
1 2015-01-01 3 1500 AB01box USD
1 2015-01-02 3 1500 AB01box USD
1 2015-01-03 3 1500 AB01box USD
2 2015-01-02 2 550 CB03box USD
2 2015-01-03 2 550 CB03box USD
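The same repeat-then-shift idea can be sketched without flattening nested lists through sum(..., []). This is only a sketch under a few assumptions: hold_date is already a datetime column, the index is unique, and the names expanded and offset are illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "hold_date": pd.to_datetime(["2015-01-01", "2015-01-01", "2015-01-02"]),
        "day_count": [1, 3, 2],
        "qty": [1200, 1500, 550],
        "item": ["CB04 box", "AB01 box", "CB03 box"],
        "ccy": ["USD", "USD", "USD"],
    }
)

# Repeat each row day_count times; repeated rows keep their original index label.
expanded = df.loc[df.index.repeat(df["day_count"])].copy()

# Number the copies of each original row 0, 1, 2, ... using the shared index label.
offset = expanded.groupby(level=0).cumcount()

# Shift hold_date by that many days (positional add, so duplicate labels do not matter).
expanded["hold_date"] = expanded["hold_date"] + pd.to_timedelta(offset.to_numpy(), unit="D")

expanded = expanded.drop(columns="day_count").reset_index(drop=True)
print(expanded)
```

The cumcount step relies on the repeated rows sharing an index label, so it would need a different grouping key if the original index contained duplicates.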
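Answer 1 (score: 1)
This solution avoids explicit for loops (note that apply with axis=1 still iterates over the rows internally, so it is not vectorized in the strict sense). The idea is to create a temporary column that holds the list of all dates for each row, and then expand that column into rows. The expand_column function is based on this answer (the source link is in its docstring).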
import pandas as pd

df = pd.DataFrame([['2015-01-01', 1, 1200, 'CB04 box', 'USD'],
                   ['2015-01-01', 3, 1500, 'AB01 box', 'USD'],
                   ['2015-01-02', 2, 550, 'CB03 box', 'USD'],
                   ], columns=['hold_date', 'day_count', 'qty', 'item', 'ccy'])
# For each row, build the list of day_count consecutive dates starting at hold_date.
range_col = lambda row: list(pd.date_range(start=pd.to_datetime(row.hold_date), periods=row.day_count))
df = df.assign(hold_date=df.apply(range_col, axis=1))
# expand_column is defined further down; it must be defined before this line runs.
expand_column(df, 'hold_date')[['hold_date', 'qty', 'item', 'ccy']]
hold_date qty item ccy
0 2015-01-01 1200 CB04 box USD
1 2015-01-01 1500 AB01 box USD
1 2015-01-02 1500 AB01 box USD
1 2015-01-03 1500 AB01 box USD
2 2015-01-02 550 CB03 box USD
2 2015-01-03 550 CB03 box USD
def expand_column(dataframe, column):
    """Transform iterable column values into multiple rows.

    Source: https://stackoverflow.com/a/27266225/304209.

    Args:
        dataframe: DataFrame to process.
        column: name of the column to expand.

    Returns:
        copy of the DataFrame with the following updates:
            * for rows where column contains only 1 value, keep them as is.
            * for rows where column contains a list of values, transform them
              into multiple rows, each of which contains one value from the list in column.
    """
    tmp_df = dataframe.apply(
        lambda row: pd.Series(row[column]), axis=1).stack().reset_index(level=1, drop=True)
    tmp_df.name = column
    return dataframe.drop(column, axis=1).join(tmp_df)
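On pandas 0.25 or later, the built-in DataFrame.explode does the "list column to rows" step that expand_column implements by hand. The following is a hedged sketch of the same idea using it, not the answer author's code; the names dates and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["2015-01-01", 1, 1200, "CB04 box", "USD"],
        ["2015-01-01", 3, 1500, "AB01 box", "USD"],
        ["2015-01-02", 2, 550, "CB03 box", "USD"],
    ],
    columns=["hold_date", "day_count", "qty", "item", "ccy"],
)

# One list of consecutive dates per row, starting at hold_date and spanning day_count days.
dates = [
    pd.date_range(start=start, periods=int(n), freq="D").tolist()
    for start, n in zip(df["hold_date"], df["day_count"])
]

result = (
    df.assign(hold_date=dates)   # temporary column of lists
    .explode("hold_date")        # one row per list element
    .drop(columns="day_count")
    .reset_index(drop=True)
)

# explode leaves an object column; convert it back to datetime64.
result["hold_date"] = pd.to_datetime(result["hold_date"])
print(result)
```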
Answer 2 (score: 0)
You can do this by creating a new DataFrame from df_qty and repeating each row qty times:
# .ix was removed in pandas 1.0; .loc is the label-based replacement.
df_qty = pd.DataFrame([df_qty.loc[idx]
                       for idx in df_qty.index
                       for _ in range(df_qty.loc[idx]['qty'])]).reset_index(drop=True)
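This builds a new list that contains each row repeated qty times. To repeat by day_count, as the question asks, substitute that column name; note that this approach only duplicates rows and does not adjust hold_date.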
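The row repetition itself can also be expressed without a Python-level loop by using Index.repeat. The sketch below uses a small hypothetical frame (the item and qty values are made up for illustration) so the repeated output stays short.

```python
import pandas as pd

# Hypothetical data: repeat each row as many times as its qty value.
df_qty = pd.DataFrame({"item": ["A", "B", "C"], "qty": [2, 0, 3]})

# Repeat every index label qty times, select those labels, then renumber the rows.
repeated = df_qty.loc[df_qty.index.repeat(df_qty["qty"])].reset_index(drop=True)
print(repeated)
```

Rows with a count of 0 disappear from the result, which may or may not be what is wanted.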
Answer 3 (score: 0)
This is ugly, but I'll leave it here anyway :)
df = pd.concat(pd.DataFrame([df.loc[i]]*df.loc[i]['day_count'])
               .assign(hold_date=pd.date_range(
                   df.loc[i]['hold_date'],
                   periods=df.loc[i]['day_count'],
                   freq='D'))
               for i in range(len(df)))
Full example:
import pandas as pd
df = pd.DataFrame({
'hold_date': pd.date_range('2015-01-01', '2015-01-02'),
'day_count': [2,3],
'qty': [1200,1500]
})
df = pd.concat(pd.DataFrame([df.loc[i]]*df.loc[i]['day_count'])
               .assign(hold_date=pd.date_range(
                   df.loc[i]['hold_date'],
                   periods=df.loc[i]['day_count'],
                   freq='D'))
               for i in range(len(df)))
print(df)
Returns:
day_count hold_date qty
0 2 2015-01-01 1200
0 2 2015-01-02 1200
1 3 2015-01-02 1500
1 3 2015-01-03 1500
1 3 2015-01-04 1500