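Checking for a repeating sequence in a pandas DataFrame

Time: 2018-01-28 02:57:34

Tags: python pandas

I have a pandas DataFrame in which one column holds an almost-repeating sequence of values, like this: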

      Cell
0      x_a
1      x_b
2      x_c
3      x_a
4      x_b
5      x_c
6      x_a
7      x_b
8      x_b
9      x_c
10     x_c
11     x_b
12     x_a

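I need to check the whole column to see whether the repeating sequence "x_a, x_b, x_c" is maintained in exactly that order, i.e. "x_c" follows "x_b", which follows "x_a".

Whenever this order is broken, for example at indices 7 and 8, where "x_b" is repeated twice, or at 10, 11 and 12, where the order is wrong, I need to be able to find out which value is the offender.

Any pointers on how to do this?

I have been scratching my head over df.loc, to no avail, and I am fairly sure df.loc is not the right approach.

Thanks in advance.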

1 Answer:

Answer 0: (score: 0)

I wrote this solution using a predefined order rule:

import pandas as pd

# Create a dummy DataFrame
dummy_frame = pd.DataFrame(columns=["dummy"])

# Add dummy values to the DataFrame
dummy_frame["dummy"] = ["x_a","x_b","x_c","x_a","x_b","x_c","x_a","x_a","x_b"]

# Predefine the order to check for in the DataFrame
correct_order = ["x_a","x_b","x_c"]
n = len(correct_order)

# Step through the column in chunks the length of the order (triplets in this case)
for i in range(0, len(dummy_frame), n):
    chunk = dummy_frame["dummy"].iloc[i:i+n].tolist()

    # Check whether the chunk matches the expected order
    if chunk != correct_order:

        # Find the incorrect values in the chunk; zip stops at the
        # shorter list, so an incomplete final chunk does not raise
        for j, (expected, actual) in enumerate(zip(correct_order, chunk)):
            if expected != actual:
                print("Value at index:", i+j, "is incorrect.")
                print("Current Value:", actual, "Correct Value is:", expected)

Sample output:

Value at index: 7 is incorrect.
Current Value: x_a Correct Value is: x_b
Value at index: 8 is incorrect.
Current Value: x_b Correct Value is: x_c
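
The loop above compares fixed positions, so a single extra or missing value shifts every later triplet out of alignment. As a rough alternative sketch (not part of the original solution), each value can instead be checked against the one directly before it. The names `successor`, `expected` and `violations` are made up for illustration, and the sketch assumes the column is supposed to begin with the first item of the order.

```python
import pandas as pd

# The column from the question
df = pd.DataFrame({"Cell": ["x_a", "x_b", "x_c", "x_a", "x_b", "x_c", "x_a",
                            "x_b", "x_b", "x_c", "x_c", "x_b", "x_a"]})

correct_order = ["x_a", "x_b", "x_c"]

# Map every value to the value expected to follow it (x_c wraps around to x_a)
successor = dict(zip(correct_order, correct_order[1:] + correct_order[:1]))

# Expected value of each row, derived from the row above it
expected = df["Cell"].shift().map(successor)

# Assumption: the column should start with the first item of the order
expected.iloc[0] = correct_order[0]

# Rows whose value differs from what the previous row calls for
violations = df.loc[df["Cell"] != expected].assign(expected=expected)
print(violations)
```

On the question's data this flags rows 8, 10, 11 and 12. Row 7 is not flagged, because "x_b" after "x_a" is still valid; the repetition only shows up at row 8.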

Hope this helps :)