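I am looking for a way to insert duplicate rows based on a value condition.
The input dataset contains customer prices by week, along with each price's validity period: 'price_start_week' and 'price_end_week'.
The idea is to expand the DataFrame by adding a new column with the actual week number and repeating each row once per week of validity.
Input: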
╔═══════════════╦══════════════════╦════════════════╦═════════════╗
║ customer_name ║ price_start_week ║ price_end_week ║ price_value ║
╠═══════════════╬══════════════════╬════════════════╬═════════════╣
║ A ║ 4 ║ 7 ║ 500 ║
║ B ║ 3 ║ 6 ║ 600 ║
║ C ║ 2 ║ 4 ║ 700 ║
╚═══════════════╩══════════════════╩════════════════╩═════════════╝
Output:
+---------------+------------------+----------------+-------------+-------------+
| customer_name | price_start_week | price_end_week | actual week | price_value |
+---------------+------------------+----------------+-------------+-------------+
| A | 4 | 7 | 4 | 500 |
| A | 4 | 7 | 5 | 500 |
| A | 4 | 7 | 6 | 500 |
| A | 4 | 7 | 7 | 500 |
| B | 3 | 6 | 3 | 600 |
| B | 3 | 6 | 4 | 600 |
| B | 3 | 6 | 5 | 600 |
| B | 3 | 6 | 6 | 600 |
| C | 2 | 4 | 2 | 700 |
| C | 2 | 4 | 3 | 700 |
| C | 2 | 4 | 4 | 700 |
+---------------+------------------+----------------+-------------+-------------+
What is the best way to do this?
I was thinking about using an apply function, for example:
def repeat(a):
    if a['price_start_week'] > a['price_end_week']:
        return a['price_start_week'] - a['price_end_week']
    ...
# axis=1 passes each row to repeat(); axis=0 would pass whole columns
df['actual_week'] = df.apply(repeat, axis=1)
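Answer 0: (score: 1)
Use Index.repeat with the number of weeks in each validity period (end week minus start week, plus one) to repeat the rows, then use GroupBy.cumcount as a per-group counter to derive the actual week: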
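# Number of weeks each price is valid for, counting both endpoints
a = df['price_end_week'] - df['price_start_week'] + 1
# Repeat each row that many times, then rebuild a clean 0..n-1 index
df = df.loc[df.index.repeat(a)].reset_index(drop=True)
# Per-customer running count (0, 1, 2, ...) added to the start week
df['actual week'] = df.groupby('customer_name').cumcount() + df['price_start_week']
print(df)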
customer_name price_start_week price_end_week price_value actual week
0 A 4 7 500 4
1 A 4 7 500 5
2 A 4 7 500 6
3 A 4 7 500 7
4 B 3 6 600 3
5 B 3 6 600 4
6 B 3 6 600 5
7 B 3 6 600 6
8 C 2 4 700 2
9 C 2 4 700 3
10 C 2 4 700 4
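One caveat worth noting: grouping by 'customer_name' gives the right counter only when each customer appears in a single input row. If a customer could have several price periods (an assumption that goes beyond the sample data), the counter would keep running across them. A minimal, self-contained sketch of a variant that counts per original row instead is shown below; the names `n_weeks`, `out`, and the column `actual_week` are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample input from the question.
df = pd.DataFrame({
    "customer_name": ["A", "B", "C"],
    "price_start_week": [4, 3, 2],
    "price_end_week": [7, 6, 4],
    "price_value": [500, 600, 700],
})

# Number of weeks each price is valid for, counting both endpoints.
n_weeks = df["price_end_week"] - df["price_start_week"] + 1

# Repeat each row n_weeks times; the original index labels are kept
# for now so they can identify which input row each copy came from.
out = df.loc[df.index.repeat(n_weeks)].copy()

# Running count within each original row (index level 0), so a customer
# with several price periods would get a fresh counter per period.
offset = out.groupby(level=0).cumcount().to_numpy()
out["actual_week"] = out["price_start_week"].to_numpy() + offset

# Replace the duplicated labels with a clean 0..n-1 index.
out = out.reset_index(drop=True)
print(out)
```

For the sample data this should give the same rows as the answer above; the only difference is which key the counter is grouped on.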