Question

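I have a pandas DataFrame and need to create a new column based on an existing one (not hard in itself), but the value at position i has to depend on the value at position i-1 of that column. Example series: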

data = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1])

I want element i to be 1 if it is the start of a run of 1s, for example:

array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0])

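I would also like to be able to do other operations, but just understanding how to do this without iterating would be very helpful. Apologies if this has been asked before; I did not know how to search for it.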

Answer 1

np.where

# [0 0 0 1 1 1 0 1 0 0 0 1 1 1] <- data
# [0 0 0 0 1 1 1 0 1 0 0 0 1 1] <- np.append(0, data[:-1])
#  ^ \___shifted data[:-1]___/
#  |
# appended zero
# [1 1 1 1 0 0 0 1 0 1 1 1 0 0] <- ~np.append(0, data[:-1])
# [0 0 0 1 0 0 0 1 0 0 0 1 0 0] <- result

np.where(data & ~np.append(0, data[:-1]).astype(bool), 1, 0)

array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0])

Using array multiplication

data * (1 - np.append(0, data[:-1]))

array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0])

np.diff

(np.diff(np.append(0, data)) == 1).astype(int)

array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0])
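For reference, here is a self-contained sketch that runs the three NumPy variants above on the example data and checks that they agree. The names `prev`, `via_where`, `via_product` and `via_diff` are introduced here for illustration only. The `np.where` variant casts both operands to bool explicitly, which is a small change from the one-liner above.

```python
import numpy as np

data = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1])

# Previous element for each position; position 0 has no predecessor,
# so a 0 is used there (same convention as the answer above).
prev = np.append(0, data[:-1])

# 1) current element is 1 AND previous element is not 1
via_where = np.where(data.astype(bool) & ~prev.astype(bool), 1, 0)

# 2) same logic as arithmetic on 0/1 values
via_product = data * (1 - prev)

# 3) a 0 -> 1 transition shows up as a difference of exactly +1
via_diff = (np.diff(np.append(0, data)) == 1).astype(int)

assert np.array_equal(via_where, via_product)
assert np.array_equal(via_product, via_diff)
print(via_diff)
```

All three rely on the data containing only 0s and 1s; with other values the multiplication and `np.diff` variants would need adjusting.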

Answer 2

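A 1 is the start of a group if it is 1 and the previous element is not 1. This is easier in pandas than in pure numpy, because "the previous element is not 1" can be expressed with shift, which shifts all the data (by 1 position by default).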

In [15]: s = pd.Series([0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1])

In [16]: ((s == 1) & (s.shift() != 1)).astype(int)
Out[16]: 
0     0
1     0
2     0
3     1
4     0
5     0
6     0
7     1
8     0
9     0
10    0
11    1
12    0
13    0
dtype: int64
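Since the question is about a DataFrame column, the same expression can be assigned straight to a new column. A minimal sketch, assuming a DataFrame with a hypothetical column named `flag`; the column names `flag` and `run_start` are not from the original post.

```python
import pandas as pd

# Hypothetical DataFrame; "flag" stands in for the existing column.
df = pd.DataFrame({"flag": [0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1]})

# 1 where the current value is 1 and the previous row's value is not 1.
df["run_start"] = ((df["flag"] == 1) & (df["flag"].shift() != 1)).astype(int)

print(df["run_start"].tolist())
```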

This works even when a 1 is the first element: there is no element before it, so we get NaN after shifting, and NaN != 1:

In [18]: s.shift().head()
Out[18]: 
0    NaN
1    0.0
2    0.0
3    0.0
4    1.0
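As the output shows, the NaN introduced by shift turns the values into floats. If that is unwanted, one option is to fill the gap explicitly. The sketch below assumes a pandas version whose `Series.shift` accepts the `fill_value` argument; the small series and the names `shifted` and `starts` are made up for illustration.

```python
import pandas as pd

# Small example that begins with a 1, so the first element is a run start.
s = pd.Series([1, 1, 0, 1])

# Fill the vacated first position with 0 instead of NaN;
# the shifted series then keeps its integer dtype.
shifted = s.shift(fill_value=0)

starts = ((s == 1) & (shifted != 1)).astype(int)
print(starts.tolist())
```

The result is the same as with the NaN-based comparison; the only difference is that no float conversion takes place along the way.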

Pandas operations on a column based on other entries

2 answers: