I have a pandas DataFrame in which one column holds an almost perfectly repeating sequence of values, as shown below:
Cell
0 x_a
1 x_b
2 x_c
3 x_a
4 x_b
5 x_c
6 x_a
7 x_b
8 x_b
9 x_c
10 x_c
11 x_b
12 x_a
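I need to check the entire column to confirm that the repeating sequence "x_a, x_b, x_c" is kept in exactly that order, i.e. "x_b" follows "x_a" and "x_c" follows "x_b".
Wherever this order is broken, for example at indices 7 and 8, where "x_b" appears twice in a row, or at indices 10, 11 and 12, where the order is wrong, I need to be able to find out which value is the offending one.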
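Any pointers on how to do this?
I have been scratching my head over df.loc, but to no avail, and I am fairly sure df.loc is not the right approach.
Thanks in advance.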
Answer 0 (score: 0)
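I wrote this solution using a predefined order rule. It walks the column in fixed chunks the length of the order (triplets here) and compares each chunk against the rule: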
import pandas as pd

# Create a dummy DataFrame
dummy_frame = pd.DataFrame(columns=["dummy"])
# Add dummy values to the DataFrame
dummy_frame["dummy"] = ["x_a", "x_b", "x_c", "x_a", "x_b", "x_c", "x_a", "x_a", "x_b"]
# Predefine the order to check for in the DataFrame
correct_order = ["x_a", "x_b", "x_c"]
n = len(correct_order)

# Step through the column in chunks the length of the order (triplets in this case)
for i in range(0, len(dummy_frame), n):
    chunk = dummy_frame["dummy"][i:i + n].tolist()
    # Check whether the chunk matches the order
    if correct_order != chunk:
        # Find the incorrect value(s) in the chunk; zip stops early on a short final chunk
        for j, (expected, actual) in enumerate(zip(correct_order, chunk)):
            if expected != actual:
                print("Value at index:", i + j, "is incorrect.")
                print("Current Value:", actual, "Correct Value is:", expected)
Sample output:
Value at index: 7 is incorrect.
Current Value: x_a Correct Value is: x_b
Value at index: 8 is incorrect.
Current Value: x_b Correct Value is: x_c
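One caveat with the fixed-triplet loop above: once a value is duplicated or dropped, every later triplet is shifted, so it can also report values that are merely out of phase. A minimal sketch of an alternative follows, assuming each value should be judged only against the one directly before it, and assuming the column is expected to start with x_a (neither is stated in the question). The names `successor`, `expected` and `offending` are illustrative only.

```python
import pandas as pd

# The column from the question (13 values, indices 0-12).
df = pd.DataFrame({"Cell": ["x_a", "x_b", "x_c", "x_a", "x_b", "x_c", "x_a",
                            "x_b", "x_b", "x_c", "x_c", "x_b", "x_a"]})

correct_order = ["x_a", "x_b", "x_c"]
# Map each value to the one that must come right after it, wrapping around:
# x_a -> x_b, x_b -> x_c, x_c -> x_a.
successor = dict(zip(correct_order, correct_order[1:] + correct_order[:1]))

# What each row should hold, judged only from the row directly above it.
expected = df["Cell"].shift().map(successor)
# The first row has no predecessor; assumption: the column should start at x_a.
expected.iloc[0] = correct_order[0]

# Keep only the rows whose value differs from what its predecessor implies.
offending = df[df["Cell"] != expected].assign(expected=expected)
print(offending)
```

On the question's data this flags indices 8, 10, 11 and 12: the second x_b, the second x_c, and the two values in the reversed run at the end. Index 9 is not flagged, because x_c is a valid successor of the x_b before it.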
Hope this helps :)