How do I count the number of columns with a value in each row in Python?

Time: 2018-08-21 00:47:09

Tags: python pandas

I have a dataframe like this:

IndividualID Trip1 Trip2 Trip3 Trip4 Trip5 Trip6 Trip7 Trip8 Trip9
200100001    23    1     2     4     4      1    5     5     5
200100002    21    1     12    3     1      55   7     7
200100003    12    3     3     6     3     
200100004    4   
200100005    6     5     3     9     3      5    6  
200100005    23    4     4     2     4      3    6     5  

I want to know how many trips each individual made, so I would like to add a new column so that the table looks like this:

IndividualID Trip1  Trip2  Trip3  Trip4  Trip5  Trip6  Trip7  Trip8  Trip9 Chains
200100001     23     1      2      4      4      1     5       5     5      9
200100002     21     1      12     3      1      55    7       7            8
200100003     12     3      3      6      3                                 5
200100004     4                                                             1
200100005     6      5      3      9      3      5     6                    7
200100005     23     4      4      2      4      3     6       5            8

Is there a way to do this? I would appreciate any help. Thanks in advance!

6 answers:

Answer 0: (score: 5)

Use iloc and count; count ignores NaN by default:

df.iloc[:, 1:].count(1)

0    9
1    8
2    5
3    1
4    7
5    8
dtype: int64

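If the missing values are empty strings rather than NaN, replace the empty strings with NaN first (this needs import numpy as np):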

df.iloc[:, 1:].replace('', np.nan).count(1)
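As a minimal sketch of both variants in this answer, the frame below is a hypothetical, shortened version of the question's data (three trip columns, four rows). It assumes the ID is the first column, which is what `iloc[:, 1:]` relies on.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened version of the question's data; missing trips are NaN.
df = pd.DataFrame({
    "IndividualID": [200100001, 200100002, 200100003, 200100004],
    "Trip1": [23, 21, 12, 4],
    "Trip2": [1, 1, 3, np.nan],
    "Trip3": [2, 12, np.nan, np.nan],
})

# iloc[:, 1:] skips the ID column; count(axis=1) counts non-NaN cells per row.
chains_nan = df.iloc[:, 1:].count(axis=1)

# Same data, but with empty strings standing in for the missing trips.
df_blank = df.astype(object).fillna("")

# Empty strings are ordinary values to count(), so turn them into NaN first.
chains_blank = df_blank.iloc[:, 1:].replace("", np.nan).count(axis=1)

print(chains_nan.tolist())    # [3, 3, 2, 1]
print(chains_blank.tolist())  # [3, 3, 2, 1]
```

Without the `replace` step, `count` would report 3 for every row of `df_blank`, because an empty string is not treated as missing.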

Answer 1: (score: 5)

If the missing values are empty strings, use ne and sum, subtracting 1 for the ID column:

df.ne('').sum(1)-1
Out[287]: 
0    9
1    8
2    5
3    1
4    7
5    8
dtype: int64

If the missing values are NaN, you can use info on the transposed trip columns:

df.iloc[:,1:].T.info()
<class 'pandas.core.frame.DataFrame'>
Index: 9 entries, Trip1 to Trip9
Data columns (total 6 columns):
0    9 non-null float64
1    8 non-null float64
2    5 non-null float64
3    1 non-null float64
4    7 non-null float64
5    8 non-null float64
dtypes: float64(6)
memory usage: 504.0+ bytes
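The `ne('')` variant in this answer only makes sense when the blanks really are empty strings. Here is a small sketch under that assumption, using a hypothetical frame whose columns were all read as text.

```python
import pandas as pd

# Hypothetical data read as strings, so missing trips are '' rather than NaN.
df = pd.DataFrame({
    "IndividualID": ["200100001", "200100003", "200100004"],
    "Trip1": ["23", "12", "4"],
    "Trip2": ["1", "3", ""],
    "Trip3": ["2", "", ""],
})

# ne('') marks every non-empty cell as True; summing the booleans counts them.
# The ID column is never empty, so subtract 1 to count only the trip columns.
chains = df.ne("").sum(axis=1) - 1
print(chains.tolist())  # [3, 2, 1]

# Caveat: NaN is "not equal to ''", so ne('') would count a NaN cell as filled.
nan_counted = bool(pd.DataFrame({"a": [float("nan")]}).ne("").iloc[0, 0])
print(nan_counted)  # True
```

If the frame mixes NaN and empty strings, converting the empty strings to NaN and using `count` is the safer route.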

Answer 2: (score: 3)

Just find the non-null items and sum across each row:

df['Chains'] = df.notnull().sum(axis=1) - 1

I had to subtract one to account for your IndividualID column. Here is the result I get:

   IndividualID  Trip1  Trip2  Trip3  Trip4  Trip5  Trip6  Trip7  Trip8  Trip9  Chains
0     200100001     23    1.0    2.0    4.0    4.0    1.0    5.0    5.0    5.0       9
1     200100002     21    1.0   12.0    3.0    1.0   55.0    7.0    7.0    NaN       8
2     200100003     12    3.0    3.0    6.0    3.0    NaN    NaN    NaN    NaN       5
3     200100004      4    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN       1
4     200100005      6    5.0    3.0    9.0    3.0    5.0    6.0    NaN    NaN       7
5     200100005     23    4.0    4.0    2.0    4.0    3.0    6.0    5.0    NaN       8
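One possible variation, not part of the answer itself, is to select the trip columns by name so that no `- 1` correction is needed. The sketch below assumes every trip column has "Trip" in its name, as in the question, and uses a hypothetical shortened frame.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened version of the question's data.
df = pd.DataFrame({
    "IndividualID": [200100001, 200100003, 200100004],
    "Trip1": [23, 12, 4],
    "Trip2": [1.0, 3.0, np.nan],
    "Trip3": [2.0, np.nan, np.nan],
})

# Keep only the columns whose label contains "Trip" (assumed naming scheme).
trip_cols = df.filter(like="Trip")

# notna() gives a boolean frame; summing along axis=1 counts filled cells per row.
df["Chains"] = trip_cols.notna().sum(axis=1)
print(df["Chains"].tolist())  # [3, 2, 1]
```

Selecting by name also keeps the count stable if the line is run again, since the new Chains column is not included in the selection.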

Answer 3: (score: 3)

Replace all blank values with NaN, then use notnull and sum(1) to count the values in each row:

df['Chains'] = df.iloc[:,1:].replace('',np.nan).notnull().sum(1)

>>> df
   IndividualID  Trip1  Trip2  Trip3  Trip4  Trip5  Trip6  Trip7  Trip8  \
0     200100001     23    1.0    2.0    4.0    4.0    1.0    5.0    5.0   
1     200100002     21    1.0   12.0    3.0    1.0   55.0    7.0    7.0   
2     200100003     12    3.0    3.0    6.0    3.0    NaN    NaN    NaN   
3     200100004      4    NaN    NaN    NaN    NaN    NaN    NaN    NaN   
4     200100005      6    5.0    3.0    9.0    3.0    5.0    6.0    NaN   
5     200100005     23    4.0    4.0    2.0    4.0    3.0    6.0    5.0   

   Trip9  Chains  
0    5.0       9  
1    NaN       8  
2    NaN       5  
3    NaN       1  
4    NaN       7  
5    NaN       8  

Answer 4: (score: 2)

Maybe:

>>> import numpy as np  # pd.np was removed in pandas 2.0
>>> df.replace('', np.nan).count(axis=1)-1
0    9
1    8
2    5
3    1
4    7
5    8
dtype: int64

Or, if the missing values are already NaN, do:

>>> df.count(axis=1)-1
0    9
1    8
2    5
3    1
4    7
5    8
dtype: int64

Then do:

df['Chains'] = ...

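to assign the result to a column.

Answer 5: (score: 2)

As long as we are giving other options, here is one for when the missing values are NaN: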

df['cat'] = (~np.isnan(df.set_index('IndividualID').values)).sum(1)

IndividualID
200100001    9
200100002    8
200100003    5
200100004    1
200100005    7
200100005    8
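A self-contained sketch of this NumPy approach follows, on a hypothetical shortened frame. It assumes every trip column is numeric, because `np.isnan` raises on object arrays; the `Chains` and `counts` names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened version of the question's data; trip columns are numeric.
df = pd.DataFrame({
    "IndividualID": [200100001, 200100003, 200100004],
    "Trip1": [23, 12, 4],
    "Trip2": [1.0, 3.0, np.nan],
    "Trip3": [2.0, np.nan, np.nan],
})

# Move the ID into the index so only trip values remain in the array.
trips = df.set_index("IndividualID")

# ~np.isnan(...) is True where a trip is present; sum(axis=1) counts per row.
mask = ~np.isnan(trips.to_numpy(dtype=float))

# The raw result is a plain ndarray; wrap it to get the ID-labelled display.
counts = pd.Series(mask.sum(axis=1), index=trips.index, name="Chains")
print(counts.tolist())  # [3, 2, 1]

# Assigning the ndarray is positional, which matters when IDs repeat.
df["Chains"] = mask.sum(axis=1)
```

The positional assignment is used for `df["Chains"]` because the question's data contains a duplicated IndividualID, and aligning on a non-unique index would be ambiguous.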