Validating pandas DataFrame columns against a hierarchy

Time: 2018-11-22 17:24:21

Tags: pandas dataframe

I have a DataFrame like this:

import pandas as pd

df1 = pd.DataFrame({'Site': ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9"],
                    'Sitelink': [" ", "S1", "S2", "S6", "S4", " ", "S8", " ", "S7"],
                    'level': ["R", "T", "P", "T", "P", "R", "T", "R", "P"],
                    'Weight': ["55", "55", "55", "85", "85", "80", "150", "190", "200"]})

The 'Site' column will always be unique.

The 'Sitelink' column records, for each site, the site at the next lower level.

The 'level' column has 3 values (R, T, P), where the hierarchy is R

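The 'Weight' column can hold any value.

The output should satisfy the following condition: the weight of a higher-level site should always be less than or equal to the weight of its lower-level site. The expected result DataFrame should look like this: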

Output Dataframe

I am trying to loop over the DataFrame and compare each site with the one at the next level. Is there a better way?

1 Answer:

Answer 0: (score: 0)

If I understand correctly, you want to check whether each site's weight is less than or equal to the weight of the site named in its Sitelink.

The code for a single row would be:

def is_error(row):
    # A site with no Sitelink has nothing to be compared against
    if row['Sitelink'] == " ":
        return 'No Error'
    # Look up the linked site in the question's frame (df1); 'Site' is unique
    site_link = df1.loc[df1['Site'] == row['Sitelink']]
    # Weights are stored as strings, so convert before comparing
    if int(row['Weight']) <= int(site_link['Weight'].iloc[0]):
        return 'No Error'
    else:
        return 'Higher than lower'

So we can use the apply function to run this check on every row.
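The answer's final call did not survive on this page, so the following is only a minimal sketch of how the row-wise check might be wired up with `DataFrame.apply(..., axis=1)`. The result column names `Error` and `Error_vectorized` are assumptions chosen for illustration, and the second half (a lookup via `Series.map`) is an alternative that avoids the per-row function; it is not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Site': ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9"],
    'Sitelink': [" ", "S1", "S2", "S6", "S4", " ", "S8", " ", "S7"],
    'level': ["R", "T", "P", "T", "P", "R", "T", "R", "P"],
    'Weight': ["55", "55", "55", "85", "85", "80", "150", "190", "200"],
})


def is_error(row):
    # A site with no Sitelink has nothing to be compared against.
    if row['Sitelink'] == " ":
        return 'No Error'
    # 'Site' is unique, so this selects at most one row.
    site_link = df1.loc[df1['Site'] == row['Sitelink']]
    if int(row['Weight']) <= int(site_link['Weight'].iloc[0]):
        return 'No Error'
    return 'Higher than lower'


# Row-wise version: axis=1 passes each row to is_error as a Series.
# 'Error' is a hypothetical column name.
df1['Error'] = df1.apply(is_error, axis=1)

# Alternative without a per-row function (an assumption, not from the answer):
# build a Site -> Weight lookup and map every Sitelink through it.
weights = df1.set_index('Site')['Weight'].astype(int)
linked_weight = df1['Sitelink'].map(weights)  # NaN where Sitelink is " "
# Comparisons against NaN are False, so unlinked sites are never flagged.
violates = df1['Weight'].astype(int) > linked_weight
df1['Error_vectorized'] = violates.map(
    {True: 'Higher than lower', False: 'No Error'}
)

print(df1[['Site', 'Sitelink', 'Weight', 'Error', 'Error_vectorized']])
```

With the sample data from the question, both columns flag S4 (85 against S6's 80) and S9 (200 against S7's 150) and report 'No Error' for every other site.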