Question

I have a pandas DataFrame, for example:

>>> import pandas as pd
>>> df = pd.DataFrame([[0,1,1,1,0,0,1,1], [0,0,1,1,1,1,0,1]], index=['A', 'B'])
>>> df = df.add_prefix('q')
>>> df
   q0  q1  q2  q3  q4  q5  q6  q7
A   0   1   1   1   0   0   1   1
B   0   0   1   1   1   1   0   1

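The value 1 appears at different positions in each row. I want to create an additional column, max_length_of_1_appears, whose value for each row is the maximum number of times 1 appears consecutively in that row. For the example above, the resulting DataFrame should be: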

>>> df
   q0  q1  q2  q3  q4  q5  q6  q7  max_length_of_1_appears
A   0   1   1   1   0   0   1   1                        3
B   0   0   1   1   1   1   0   1                        4

This is because in row A the longest run of 1s spans columns q1 to q3, while in row B the longest run spans columns q2 to q5.

Answer 1

You can convert each Series to a list and then pass it to a function that answers your question.


If you have a large dataset, there is surely a better way to do this. Either way, if you only have two Series to worry about, this should get the job done.
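A minimal sketch of the list-based approach this answer describes, assuming a hypothetical helper named `longest_run_of_ones` built on `itertools.groupby`. It is an illustration under that assumption, not necessarily the function the answer had in mind:

```python
from itertools import groupby

import pandas as pd


def longest_run_of_ones(values):
    """Return the length of the longest run of consecutive 1s in `values`."""
    # groupby yields one (key, group) pair per run of equal adjacent values;
    # only runs whose value is 1 are measured.
    run_lengths = (sum(1 for _ in group) for key, group in groupby(values) if key == 1)
    # default=0 covers rows that contain no 1 at all.
    return max(run_lengths, default=0)


df = pd.DataFrame(
    [[0, 1, 1, 1, 0, 0, 1, 1], [0, 0, 1, 1, 1, 1, 0, 1]], index=['A', 'B']
).add_prefix('q')

# Convert each row to a plain list and pass it to the helper.
df['max_length_of_1_appears'] = [
    longest_run_of_ones(row.tolist()) for _, row in df.iterrows()
]
```

This loops over rows in Python, so it is simple to read but will be slow on large frames.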

Answer 2

If you can use NumPy, you could do the following:

import numpy as np

arr = df.to_numpy()

# Add columns of zeros to the left and right.
padded = np.pad(arr, [(0,0), (1,1)], mode='constant')

# Get indices in each row where transitions between 0's and 1's occur.
diffs = np.diff(padded)
rows, wheres = np.where(diffs)

# Compute the length of each patch of 1's.
rows, lengths = rows[::2], np.diff(wheres)[::2]

# Compute the maximal length for each row.
rows, split_at = np.unique(rows, return_index=True)
maxima = np.maximum.reduceat(lengths, split_at)

# Store the computed maxima into a new column of df.
df['max_length_of_1_appears'] = 0
# rows holds integer positions, so map them to index labels before using .loc.
df.loc[df.index[rows], 'max_length_of_1_appears'] = maxima

If you try hard enough, there is probably a pandas equivalent for each of these steps.
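One possible pandas-only version, as a sketch rather than a step-by-step translation of the NumPy code: it labels each run of equal values with a cumulative sum and then sums within each run, so runs of 0 contribute 0 and runs of 1 contribute their length. The helper name `longest_run` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 1, 1, 0, 0, 1, 1], [0, 0, 1, 1, 1, 1, 0, 1]], index=['A', 'B']
).add_prefix('q')


def longest_run(row):
    # run_id increases by one each time the value differs from its left neighbour,
    # so every run of equal values shares one id.
    run_id = (row != row.shift()).cumsum()
    # Summing within a run gives its length for runs of 1 and 0 for runs of 0.
    return row.groupby(run_id).sum().max()


df['max_length_of_1_appears'] = df.apply(longest_run, axis=1)
```

Because `apply` with `axis=1` calls the helper once per row, this is likely to be slower than the NumPy version on wide or long frames.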

How do I find the length of the longest consecutive run of 1s in each row of a DataFrame?
