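I am trying to look up a value in the rows of a pandas DataFrame and create a new column that flags whether the next row matches. For the following example: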
import pandas as pd
rng = pd.DataFrame({'test_1': ['A', 'A', 'A', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']}, index=pd.date_range('4/2/2014', periods=14, freq='BH'))
rng
The rows at 2014-04-02 13:00:00 and 2014-04-02 14:00:00 both equal B, so there is a match:
test_1
2014-04-02 09:00:00 A
2014-04-02 10:00:00 A
2014-04-02 11:00:00 A
2014-04-02 12:00:00 A
2014-04-02 13:00:00 B
2014-04-02 14:00:00 B
2014-04-02 15:00:00 A
2014-04-02 16:00:00 A
2014-04-03 09:00:00 A
2014-04-03 10:00:00 A
2014-04-03 11:00:00 C
2014-04-03 12:00:00 A
2014-04-03 13:00:00 D
2014-04-03 14:00:00 D
So the new column should look like this:
B_Matches
2014-04-02 09:00:00 0
2014-04-02 10:00:00 0
2014-04-02 11:00:00 0
2014-04-02 12:00:00 0
2014-04-02 13:00:00 0
2014-04-02 14:00:00 1
2014-04-02 15:00:00 0
2014-04-02 16:00:00 0
2014-04-03 09:00:00 0
2014-04-03 10:00:00 0
2014-04-03 11:00:00 0
2014-04-03 12:00:00 0
2014-04-03 13:00:00 0
2014-04-03 14:00:00 0
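I will do the same for C, D, and so on in other columns. Essentially, I am trying to find the times at which a condition holds and the next period is the same, and I will then run count() on this column to see how often the next period matches. Please also show any other approaches.
Thanks for your help.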
Answer 0 (score: 2)
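You can define a function that takes your value and returns, for each row, whether it meets your condition; this works for any value you pass in. shift() moves the column down by one row, so each row is compared with the row before it, and the flag lands on the second row of a matching pair, as in your desired output. The boolean Series is then cast to int so that True and False become 1 and 0 respectively: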
In [220]:
def func(val):
    # 1 where this row and the previous row both equal val, otherwise 0
    return ((rng['test_1'] == val) & (rng['test_1'].shift() == val)).astype(int)

func('B')
Out[220]:
2014-04-02 09:00:00 0
2014-04-02 10:00:00 0
2014-04-02 11:00:00 0
2014-04-02 12:00:00 0
2014-04-02 13:00:00 0
2014-04-02 14:00:00 1
2014-04-02 15:00:00 0
2014-04-02 16:00:00 0
2014-04-03 09:00:00 0
2014-04-03 10:00:00 0
2014-04-03 11:00:00 0
2014-04-03 12:00:00 0
2014-04-03 13:00:00 0
2014-04-03 14:00:00 0
Freq: BH, Name: test_1, dtype: int32
In [222]:
func('A')
Out[222]:
2014-04-02 09:00:00 0
2014-04-02 10:00:00 1
2014-04-02 11:00:00 1
2014-04-02 12:00:00 1
2014-04-02 13:00:00 0
2014-04-02 14:00:00 0
2014-04-02 15:00:00 0
2014-04-02 16:00:00 1
2014-04-03 09:00:00 1
2014-04-03 10:00:00 1
2014-04-03 11:00:00 1
2014-04-03 12:00:00 1
2014-04-03 13:00:00 1
2014-04-03 14:00:00 1
Freq: BH, Name: test_1, dtype: int32
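The question also asks about repeating this for several values and counting the matches. A minimal sketch of one way to do that, building on the function above: the helper name consecutive_matches and the "<value>_Matches" column naming are illustrative assumptions, and a default integer index is used instead of the business-hour index only to keep the example short (the index does not affect the logic).

```python
import pandas as pd

# Same values as the constructor in the question; a default integer index
# stands in for the business-hour DatetimeIndex (assumption, for brevity).
rng = pd.DataFrame({'test_1': ['A', 'A', 'A', 'A', 'B', 'B', 'A',
                               'A', 'A', 'A', 'A', 'A', 'A', 'A']})


def consecutive_matches(series, val):
    """Return 1 where a row and the row before it both equal val, else 0."""
    return ((series == val) & (series.shift() == val)).astype(int)


# One flag column per value of interest (hypothetical column naming).
for val in ['A', 'B']:
    rng[f'{val}_Matches'] = consecutive_matches(rng['test_1'], val)

# The flags are 0/1, so sum() gives the number of matching rows per value.
match_counts = rng[['A_Matches', 'B_Matches']].sum()
print(match_counts)
```

Note that count() on a flag column returns the number of non-null rows (14 here) rather than the number of matches, so sum() is probably what is wanted for the frequency.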