My invoice data looks like this:
+----------------+-----------+-------------+-----+-------+
| ID | Date | Description | QTY | Price |
+----------------+-----------+-------------+-----+-------+
| 1XpP1 | 08-Feb-19 | A | 1 | 8 |
| Total [INV001] | | | 8 | 8 |
| 1XpQ1 | 08-Feb-19 | A | 1 | 10 |
| 1XpQ1 | 08-Feb-19 | B | 1 | 10 |
| Total [INV002] | | | 2 | 20 |
| 1XpP1 | 08-Feb-19 | A | 1 | 12 |
| 1XpP1 | 08-Feb-19 | B | 1 | 12 |
| 1XpP1 | 08-Feb-19 | C | 1 | 12 |
| 1XpP1 | 08-Feb-19 | D | 1 | 12 |
| Total [INV003] | | | 4 | 48 |
+----------------+-----------+-------------+-----+-------+
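Note the Total row under each invoice, which contains the invoice number. I want to remove this row entirely and instead put the Total reference into a separate column on each line item. My desired output is as follows.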
+-------+-----------+-------------+-----+-------+----------------+
| ID | Date | Description | QTY | Price | ID Adjusted |
+-------+-----------+-------------+-----+-------+----------------+
| 1XpP1 | 08-Feb-19 | A | 1 | 8 | Total [INV001] |
| 1XpQ1 | 08-Feb-19 | A | 1 | 10 | Total [INV002] |
| 1XpQ1 | 08-Feb-19 | B | 1 | 10 | Total [INV002] |
| 1XpP1 | 08-Feb-19 | A | 1 | 12 | Total [INV003] |
| 1XpP1 | 08-Feb-19 | B | 1 | 12 | Total [INV003] |
| 1XpP1 | 08-Feb-19 | C | 1 | 12 | Total [INV003] |
| 1XpP1 | 08-Feb-19 | D | 1 | 12 | Total [INV003] |
+-------+-----------+-------------+-----+-------+----------------+
Please give me a starting point for solving this. I don't know how to group this data, because the number of lines per invoice varies.
Answer 0 (score: 1)
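The idea is to combine Series.where with Series.str.endswith so that rows which do not match become missing values, then back-fill them. After that, filter with the inverted mask using boolean indexing. Finally, DataFrame.copy can be added to avoid SettingWithCopyWarning (if the data will be processed later):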
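# Boolean mask: True for the Total rows, whose ID ends with ']'
m = df['ID'].str.endswith(']')
# Keep ID only on Total rows (NaN elsewhere), then back-fill upward
df['ID Adjusted'] = df['ID'].where(m).bfill()
# Drop the Total rows; copy() avoids SettingWithCopyWarning later
df = df[~m].copy()
print(df)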
ID Date Description QTY Price ID Adjusted
0 1XpP1 08-Feb-19 A 1 8 Total [INV001]
2 1XpQ1 08-Feb-19 A 1 10 Total [INV002]
3 1XpQ1 08-Feb-19 B 1 10 Total [INV002]
5 1XpR1 08-Feb-19 A 1 12 Total [INV003]
6 1XpR1 08-Feb-19 B 1 12 Total [INV003]
7 1XpR1 08-Feb-19 C 1 12 Total [INV003]
8 1XpR1 08-Feb-19 D 1 12 Total [INV003]
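For anyone who wants to run the approach end to end, here is a minimal self-contained sketch. It rebuilds the input from the question's table (so the IDs follow the question, and the empty Date and Description cells on the Total rows are assumed to be empty strings) and keeps each intermediate step in its own variable. The names is_total, labels and result are illustrative only and do not appear in the answer above.

```python
import pandas as pd

# Input rows copied from the question's table; Total rows are assumed
# to have empty strings in Date and Description.
rows = [
    ("1XpP1", "08-Feb-19", "A", 1, 8),
    ("Total [INV001]", "", "", 8, 8),
    ("1XpQ1", "08-Feb-19", "A", 1, 10),
    ("1XpQ1", "08-Feb-19", "B", 1, 10),
    ("Total [INV002]", "", "", 2, 20),
    ("1XpP1", "08-Feb-19", "A", 1, 12),
    ("1XpP1", "08-Feb-19", "B", 1, 12),
    ("1XpP1", "08-Feb-19", "C", 1, 12),
    ("1XpP1", "08-Feb-19", "D", 1, 12),
    ("Total [INV003]", "", "", 4, 48),
]
df = pd.DataFrame(rows, columns=["ID", "Date", "Description", "QTY", "Price"])

# True only on the Total rows, whose ID ends with "]".
is_total = df["ID"].str.endswith("]")

# Keep the ID on Total rows and NaN elsewhere, then back-fill so each
# line item picks up the Total label that follows it.
labels = df["ID"].where(is_total).bfill()

# Attach the label, drop the Total rows, and take an independent copy.
result = df.assign(**{"ID Adjusted": labels})[~is_total].copy()
print(result)
```

Because the labels are back-filled from the Total row that closes each invoice, this works for any number of line items per invoice, as long as every invoice ends with its Total row.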