I have a dataset with about 70 columns, including an ID_number column and columns Meeting1 through Meeting5.
I want to create an additional column (meeting_count) that holds, for each ID_number, the count of non-null values across the Meeting1-Meeting5 columns.
Normally I would do this in SQL, but if there is a reasonably easy way to do it in Python, I would rather do that.
Answer 0 (score: 2)
Try this:
df['meeting_count'] = df.filter(regex=r'^Meeting').notnull().sum(axis=1)
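The one-liner chains three steps, which may be easier to follow when split apart. Below is a minimal sketch on a small hand-built frame; the sample values are copied from the demo, and the intermediate names `meeting_cols` and `present` are only illustrative:

```python
import numpy as np
import pandas as pd

# Small frame mirroring the demo data (only a few of the ~70 columns).
df = pd.DataFrame({
    "ID_number": [123456789, 987654321, 456789123],
    "Meeting1": ["9/15/2015", "9/22/2016", "10/1/2015"],
    "Meeting2": ["1/8/2016", np.nan, "11/30/2015"],
    "Meeting3": ["4/27/2016", "2/25/2017", np.nan],
    "Meeting4": [np.nan, np.nan, np.nan],
    "Meeting5": [np.nan, np.nan, np.nan],
    "Comments": ["text text"] * 3,
})

# Step 1: keep only the columns whose name starts with "Meeting".
meeting_cols = df.filter(regex=r"^Meeting")

# Step 2: boolean frame, True where a cell is not NaN/None.
present = meeting_cols.notnull()

# Step 3: sum across each row (axis=1); True counts as 1, False as 0.
df["meeting_count"] = present.sum(axis=1)

print(df[["ID_number", "meeting_count"]])
```

Note that `notnull()` only treats NaN/None as missing; an empty string in a Meeting column would still be counted.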
Demo:
In [8]: df
Out[8]:
ID_number  Meeting1   Meeting2    Meeting3   Meeting4  Meeting5  Comments
123456789  9/15/2015  1/8/2016    4/27/2016  NaN       NaN       text text
987654321  9/22/2016  NaN         2/25/2017  NaN       NaN       text text
456789123  10/1/2015  11/30/2015  NaN        NaN       NaN       text text
In [9]: df['meeting_count'] = df.filter(regex=r'^Meeting').notnull().sum(axis=1)
In [10]: df
Out[10]:
ID_number  Meeting1   Meeting2    Meeting3   Meeting4  Meeting5  Comments   meeting_count
123456789  9/15/2015  1/8/2016    4/27/2016  NaN       NaN       text text  3
987654321  9/22/2016  NaN         2/25/2017  NaN       NaN       text text  2
456789123  10/1/2015  11/30/2015  NaN        NaN       NaN       text text  2
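One thing to watch with about 70 columns: the `^Meeting` pattern also picks up any other column whose name starts with `Meeting`. If that could happen in your data, a possible alternative (a sketch, not part of the answer above) is to name the five columns explicitly and use `DataFrame.count(axis=1)`, which counts non-NA cells per row, or to tighten the regex. The `Meeting_notes` column below is hypothetical, added only to show the difference:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID_number": [123456789, 987654321],
    "Meeting1": ["9/15/2015", "9/22/2016"],
    "Meeting2": ["1/8/2016", np.nan],
    "Meeting3": ["4/27/2016", "2/25/2017"],
    "Meeting4": [np.nan, np.nan],
    "Meeting5": [np.nan, np.nan],
    "Meeting_notes": ["follow up", "none"],  # hypothetical extra column
})

# The loose prefix match also counts Meeting_notes.
loose_count = df.filter(regex=r"^Meeting").notnull().sum(axis=1)

# Option A: list the five columns explicitly and count non-NA cells per row.
meeting_cols = [f"Meeting{i}" for i in range(1, 6)]
df["meeting_count"] = df[meeting_cols].count(axis=1)

# Option B: tighten the regex to "Meeting" followed only by digits.
strict_count = df.filter(regex=r"^Meeting\d+$").notnull().sum(axis=1)

print(loose_count.tolist(), df["meeting_count"].tolist())
```

Both options give the same result here; the explicit list is the more defensive choice if column names might change.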