I have imported a CSV file into pandas in Jupyter. Its data structure looks like this:

Handle   | Name | Vendor | Option1 Name | Option1 Value |
drive-gt | Aero | Bando  | Size         |               |
drive-gt |      |        |              | S             |
drive-gt |      |        |              | M             |
drive-gt |      |        |              | XL            |
drive-gt |      |        |              | XXL           |
There are more than 1000 different handles. For each handle, I want to move the Option1 Value from the second row up into the first row and then delete that second row, like this:

Handle   | Name | Vendor | Option1 Name | Option1 Value |
drive-gt | Aero | Bando  | Size         | S             |
drive-gt |      |        |              | M             |
drive-gt |      |        |              | XL            |
drive-gt |      |        |              | XXL           |

Any ideas on how to solve this? Thanks in advance.
Answer 0 (score: 2)
Assume a test DataFrame containing data for 2 handles, where the "empty" cells contain just an empty string (not NaN as in your data sample):
     Handle  Name  Vendor  Option1 Name  Option1 Value
0  drive-gt  Aero   Bando          Size
1  drive-gt                                          S
2  drive-gt                                          M
3  drive-gt                                         XL
4  drive-gt                                        XXL
5     abcde  Xxxx   Yyyyy         Width
6     abcde                                          A
7     abcde                                          B
8     abcde                                          C
I can see that your data is properly sorted, so the most intuitive solution is to process it one Handle group at a time.

The code to do this is:

Define a function to "reformat" the current group:
def myReformat(grp):
    rv = grp.copy()
    rv.iloc[1, 1:4] = rv.iloc[0, 1:4]
    return rv.iloc[1:]
Apply it to each group:
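A minimal sketch of this step (the exact call is an assumption, reconstructed from the reset_index note below):

df = df.groupby('Handle').apply(myReformat).reset_index(drop=True)

Note that groupby sorts the groups by key by default, so the handles come out in alphabetical order; pass sort=False to groupby to keep their original order.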
The final reset_index is needed to drop the additional index level introduced by groupby. Actually, it would be enough to drop only level 0 of the MultiIndex, but I think a better option is to also drop the second ("original") level of the MultiIndex and re-create the new index as a sequence of consecutive integers.
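If you only wanted to drop level 0 (the Handle level) as mentioned above, a minimal sketch of that variant would be:

df.groupby('Handle').apply(myReformat).reset_index(level=0, drop=True)

This keeps the original row labels instead of renumbering them.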
The result for the above data is:
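(a reconstruction, obtained by running the sketch above on the sample frame; with the default sort=True the abcde group comes first)

     Handle  Name  Vendor  Option1 Name  Option1 Value
0     abcde  Xxxx   Yyyyy         Width              A
1     abcde                                          B
2     abcde                                          C
3  drive-gt  Aero   Bando          Size              S
4  drive-gt                                          M
5  drive-gt                                         XL
6  drive-gt                                        XXL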
Answer 1 (score: 0)
Hope this helps!
import pandas as pd
import numpy as np

# Sample data with two handles; the "empty" cells hold empty strings
df = pd.DataFrame({
    'Handle': ['drive-gt', 'drive-gt', 'drive-gt', 'drive-gt', 'drive-gt',
               'drive-gt1', 'drive-gt1', 'drive-gt1', 'drive-gt1', 'drive-gt1'],
    'Name': ['Aero', '', '', '', '', 'Aero', '', '', '', ''],
    'Vendor': ['Bando', '', '', '', '', 'Bando', '', '', '', ''],
    'Option1 Name': ['Size', '', '', '', '', 'Size', '', '', '', ''],
    'Option1 Value': ['', 'S', 'M', 'XL', 'XXL', '', 'S', 'M', 'XL', 'XXL']
})

# Replace blank / whitespace-only cells with NaN
df = df.replace(r'^\s*$', np.nan, regex=True)

# Forward-fill within each Handle group; groupby().ffill() does not return the
# grouping column, so assign the filled columns back to keep 'Handle'
value_cols = ['Name', 'Vendor', 'Option1 Name', 'Option1 Value']
df[value_cols] = df.groupby('Handle')[value_cols].ffill()

# Drop the rows whose Option1 Value is still NaN (the first row of each group)
df = df[~df['Option1 Value'].isnull()]
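As a quick check (a sketch of the expected behaviour on the sample data above):

print(df)
# Each handle now keeps only its value rows (S, M, XL, XXL), with Name, Vendor
# and 'Option1 Name' filled down from the first row of the group, and the
# header-only rows (NaN Option1 Value) removed.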