Question

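How can I slice a DataFrame at the indexes of its fully non-null rows and concatenate the values in each slice, without iterating over those indexes in a for loop afterwards?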

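I can already do this, but only by running a for loop after slicing df on the non-null indexes. I would like to get the same result without iterating over the indexes separately.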

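import numpy as np
import pandas as pd

# five rows, matching the sample DF shown below
df=pd.DataFrame([["a",2,1],["b",np.nan,np.nan],["c",np.nan,np.nan],["d",3,4],["e",np.nan,np.nan]])
list1=[]
# indexes of the rows without any NaN, plus the end of the frame
indexes=(df.dropna().index.values).tolist()
indexes.append(df.shape[0])
for i in range(len(indexes)-1):
    # join without a separator so the result is 'abc', not 'a b c'
    list1.append("".join(df[0][indexes[i]:indexes[i+1]].tolist()))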

# list1 becomes ['abc', 'de']

Here is the sample DF:

    0   1     2
0   a   2.0  1.0
1   b   NaN  NaN
2   c   NaN  NaN
3   d   3.0  4.0
4   e   NaN  NaN

The expected output is a list like ['abc', 'de'].

Explanation:

First string

a: not null (start picking)
b: null
c: null

Second string

d: not null (second not-null row encountered, so it starts the second string)
e: null

Answer 1

This is a use case for cumsum:

# change all(axis=1) to any(axis=1) if a single non-NaN value in a row is enough to start a new group
s = df.iloc[:,1:].notnull().all(axis=1)

df[0].groupby(s.cumsum()).apply(''.join)

Output:

1    abc
2     de
Name: 0, dtype: object
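
To make the cumsum step easier to follow, here is a minimal self-contained sketch of the same idea that also converts the result into the plain list the question asks for. The variable names `starts`, `group_ids` and `result` are illustrative and not part of the original answer, and the frame is rebuilt with the fifth row "e" from the sample DF.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, including the fifth row "e".
df = pd.DataFrame([
    ["a", 2, 1],
    ["b", np.nan, np.nan],
    ["c", np.nan, np.nan],
    ["d", 3, 4],
    ["e", np.nan, np.nan],
])

# True where every value column is non-null, i.e. where a new string starts.
starts = df.iloc[:, 1:].notnull().all(axis=1)

# Running count of the True values: rows 0-2 get label 1, rows 3-4 get label 2.
group_ids = starts.cumsum()

# Join column 0 within each label and convert the result to a plain list.
result = df[0].groupby(group_ids).apply("".join).tolist()
print(result)  # ['abc', 'de']
```

One difference from the loop in the question: if the frame begins with rows that contain NaN, those rows get label 0 and form their own group here, whereas the loop skips everything before the first fully non-null row.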

An optimized way to turn a DataFrame column into a list based on a condition, without a for loop?
