I have a data frame like this:
IndividualID Trip1 Trip2 Trip3 Trip4 Trip5 Trip6 Trip7 Trip8 Trip9
200100001 23 1 2 4 4 1 5 5 5
200100002 21 1 12 3 1 55 7 7
200100003 12 3 3 6 3
200100004 4
200100005 6 5 3 9 3 5 6
200100005 23 4 4 2 4 3 6 5
I want to know how many trips each person made, so I would like to create a new column so that the new table looks like this:
IndividualID Trip1 Trip2 Trip3 Trip4 Trip5 Trip6 Trip7 Trip8 Trip9 Chains
200100001 23 1 2 4 4 1 5 5 5 9
200100002 21 1 12 3 1 55 7 7 8
200100003 12 3 3 6 3 5
200100004 4 1
200100005 6 5 3 9 3 5 6 7
200100005 23 4 4 2 4 3 6 5 8
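Is there a possible solution? I would be grateful if anyone can help. Thanks in advance!
Answer 0 (score: 5)
Use iloc to select the trip columns and count, which ignores NaN by default: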
df.iloc[:, 1:].count(1)
0 9
1 8
2 5
3 1
4 7
5 8
dtype: int64
If the missing values are not NaN but empty strings, just replace the empty strings with NaN first:
df.iloc[:, 1:].replace('', np.nan).count(1)
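For anyone who wants to try both variants, here is a minimal sketch that rebuilds the question's table. The padding with NaN and the simulated empty-string frame (df_blank) are assumptions for illustration; how the blanks actually look depends on how the data was loaded.

```python
import numpy as np
import pandas as pd

# Rebuild the question's table; rows are padded with NaN up to nine trips.
trips = [
    [23, 1, 2, 4, 4, 1, 5, 5, 5],
    [21, 1, 12, 3, 1, 55, 7, 7],
    [12, 3, 3, 6, 3],
    [4],
    [6, 5, 3, 9, 3, 5, 6],
    [23, 4, 4, 2, 4, 3, 6, 5],
]
ids = [200100001, 200100002, 200100003, 200100004, 200100005, 200100005]
trip_cols = [f"Trip{i}" for i in range(1, 10)]
padded = [row + [np.nan] * (9 - len(row)) for row in trips]
df = pd.DataFrame(padded, columns=trip_cols)
df.insert(0, "IndividualID", ids)

# Case 1: missing trips are NaN, so count() can be used directly.
chains_nan = df.iloc[:, 1:].count(axis=1)

# Case 2: missing trips are empty strings (simulated here), so convert them first.
df_blank = df.astype(object).where(df.notna(), "")
chains_blank = df_blank.iloc[:, 1:].replace("", np.nan).count(axis=1)

print(chains_nan.tolist())    # [9, 8, 5, 1, 7, 8]
print(chains_blank.tolist())  # [9, 8, 5, 1, 7, 8]
```

Either result can then be assigned with df['Chains'] = chains_nan.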
Answer 1 (score: 5)
Use:
df.ne('').sum(1)-1
Out[287]:
0 9
1 8
2 5
3 1
4 7
5 8
dtype: int64
If the missing values are NaN, you can use info:
df.iloc[:,1:].T.info()
<class 'pandas.core.frame.DataFrame'>
Index: 9 entries, Trip1 to Trip9
Data columns (total 6 columns):
0 9 non-null float64
1 8 non-null float64
2 5 non-null float64
3 1 non-null float64
4 7 non-null float64
5 8 non-null float64
dtypes: float64(6)
memory usage: 504.0+ bytes
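One thing worth checking before relying on df.ne(''): NaN is not equal to '', so that expression treats NaN cells as filled. The sketch below uses a small hypothetical three-person table (not the question's data), and the helper name count_trips is made up for illustration; it counts a cell only if it is neither NaN nor an empty string.

```python
import numpy as np
import pandas as pd

# Hypothetical table stored two ways: blanks as "" and blanks as NaN.
blank = pd.DataFrame(
    {"IndividualID": [1, 2, 3], "Trip1": [23, 12, 4], "Trip2": [1, 3, ""], "Trip3": [2, "", ""]}
)
missing = pd.DataFrame(
    {"IndividualID": [1, 2, 3], "Trip1": [23, 12, 4], "Trip2": [1, 3, np.nan], "Trip3": [2, np.nan, np.nan]}
)

def count_trips(frame):
    """Count cells that are neither NaN nor an empty string, excluding the ID column."""
    trips = frame.drop(columns="IndividualID")
    return (trips.notna() & trips.ne("")).sum(axis=1)

print((blank.ne("").sum(axis=1) - 1).tolist())    # [3, 2, 1]
print((missing.ne("").sum(axis=1) - 1).tolist())  # [3, 3, 3]: NaN cells are counted as trips
print(count_trips(blank).tolist())                # [3, 2, 1]
print(count_trips(missing).tolist())              # [3, 2, 1]
```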
Answer 2 (score: 3)
Just find the non-null items and sum across each row:
df['Chains'] = df.notnull().sum(axis=1) - 1
I had to subtract one to account for your IndividualID column. Here is the result I get:
IndividualID Trip1 Trip2 Trip3 Trip4 Trip5 Trip6 Trip7 Trip8 Trip9 Chains
0 200100001 23 1.0 2.0 4.0 4.0 1.0 5.0 5.0 5.0 9
1 200100002 21 1.0 12.0 3.0 1.0 55.0 7.0 7.0 NaN 8
2 200100003 12 3.0 3.0 6.0 3.0 NaN NaN NaN NaN 5
3 200100004 4 NaN NaN NaN NaN NaN NaN NaN NaN 1
4 200100005 6 5.0 3.0 9.0 3.0 5.0 6.0 NaN NaN 7
5 200100005 23 4.0 4.0 2.0 4.0 3.0 6.0 5.0 NaN 8
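A possible variation, in case the "- 1" feels fragile: select the trip columns by name, so that neither IndividualID nor an existing Chains column is counted if the line is run a second time. This is only a sketch on a two-row excerpt; the helper name add_chains and the regex are assumptions based on the column names shown in the question.

```python
import numpy as np
import pandas as pd

# Two-row excerpt of the question's data, with NaN for missing trips.
df = pd.DataFrame(
    {
        "IndividualID": [200100003, 200100004],
        "Trip1": [12, 4],
        "Trip2": [3.0, np.nan],
        "Trip3": [3.0, np.nan],
    }
)

def add_chains(frame):
    """Store the number of non-null Trip columns per row in a Chains column."""
    trip_cols = frame.filter(regex=r"^Trip\d+$")
    frame["Chains"] = trip_cols.notnull().sum(axis=1)
    return frame

add_chains(df)
add_chains(df)  # running it again does not change the result
print(df["Chains"].tolist())  # [3, 1]
```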
Answer 3 (score: 3)
Replace all blank values with NaN, then use notnull together with sum(1) to count the non-null values in each row:
df['Chains'] = df.iloc[:,1:].replace('',np.nan).notnull().sum(1)
>>> df
IndividualID Trip1 Trip2 Trip3 Trip4 Trip5 Trip6 Trip7 Trip8 \
0 200100001 23 1.0 2.0 4.0 4.0 1.0 5.0 5.0
1 200100002 21 1.0 12.0 3.0 1.0 55.0 7.0 7.0
2 200100003 12 3.0 3.0 6.0 3.0 NaN NaN NaN
3 200100004 4 NaN NaN NaN NaN NaN NaN NaN
4 200100005 6 5.0 3.0 9.0 3.0 5.0 6.0 NaN
5 200100005 23 4.0 4.0 2.0 4.0 3.0 6.0 5.0
Trip9 Chains
0 5.0 9
1 NaN 8
2 NaN 5
3 NaN 1
4 NaN 7
5 NaN 8
Answer 4 (score: 2)
Maybe:
>>> import numpy as np
>>> df.replace('', np.nan).count(axis=1)-1
0 9
1 8
2 5
3 1
4 7
5 8
dtype: int64
Or, if the missing values are already NaN, do:
>>> df.count(axis=1)-1
0 9
1 8
2 5
3 1
4 7
5 8
dtype: int64
Then do:
df['Chains'] = ...
to assign it to a column.
Answer 5 (score: 2)
Since we are offering alternatives, here is another one for the case where the missing values are NaN:
df['cat'] = (~np.isnan(df.set_index('IndividualID').values)).sum(1)
IndividualID
200100001 9
200100002 8
200100003 5
200100004 1
200100005 7
200100005 8
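A small self-contained sketch of the NumPy route, on a hypothetical three-row excerpt. It assumes the trip columns can be converted to float; np.isnan raises a TypeError on object arrays, so empty strings would have to be replaced first. The column name Chains is taken from the question.

```python
import numpy as np
import pandas as pd

# Three-row excerpt with a duplicated IndividualID, as in the question.
df = pd.DataFrame(
    {
        "IndividualID": [200100004, 200100005, 200100005],
        "Trip1": [4, 6, 23],
        "Trip2": [np.nan, 5.0, 4.0],
        "Trip3": [np.nan, 3.0, 4.0],
    }
)

# Move the ID into the index so only trip values remain, then get a float array.
values = df.set_index("IndividualID").to_numpy(dtype=float)
counts = (~np.isnan(values)).sum(axis=1)

# counts is a plain array, so the assignment is positional and
# the duplicated IDs do not cause any index-alignment problems.
df["Chains"] = counts
print(df["Chains"].tolist())  # [1, 3, 3]
```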