Question

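I have a DataFrame with 384 rows (plus one extra dummy row at the beginning). Each row has 4 variables that I entered by hand, 3 calculated fields based on those 4 variables, and 3 more fields that compare each calculated variable with the previous row. Each field can take one of two values (essentially True/False).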

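Final goal: I want to order the DataFrame so that each of the 64 possible combinations of the 6 calculated fields (2^6) appears exactly 6 times (2^6 * 6 = 384). Each iteration builds a frequency table (pivot); if any combination has a count other than 6, the iteration breaks and the order is randomized again.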
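The frequency check described above could be sketched roughly as follows. This is only an illustration on toy data: the helper name `is_balanced`, the toy columns `a` and `b`, and the assumption that every field is strictly binary are mine, not part of the script in this question.

```python
import pandas as pd

def is_balanced(frame, fields, target=6):
    """Return True if every combination of `fields` occurs exactly `target` times.

    Assumes each field is binary, so 2 ** len(fields) combinations are expected.
    """
    counts = frame.groupby(fields).size()
    expected_combos = 2 ** len(fields)
    # A combination that never occurs is absent from `counts`,
    # so the number of groups has to be checked as well.
    return len(counts) == expected_combos and bool((counts == target).all())

# Toy data: 2 binary fields, each of the 4 combinations appearing twice.
toy = pd.DataFrame({
    "a": [True, True, False, False, True, True, False, False],
    "b": [True, False, True, False, True, False, True, False],
})

print(is_balanced(toy, ["a", "b"], target=2))            # all 4 combinations occur twice
print(is_balanced(toy.iloc[:-1], ["a", "b"], target=2))  # last row removed
```

With the real data the call would use the 6 calculated fields and `target=6`.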

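There are 384! - 12 * 6! possible orderings, and my computer has been running the following script for more than 4 days without finding a solution.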

import pandas as pd
from numpy import random

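# a function that calculates whether a row is congruent or incongruent
def set_cong(row):
    # congruent: the side with the larger number also has the larger size
    left_bigger = row["left"] > row["right"] and row["left_size"] > row["right_size"]
    right_bigger = row["left"] < row["right"] and row["left_size"] < row["right_size"]
    if left_bigger or right_bigger:
        return "Cong"
    else:
        return "InC"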

# open file and calculate the basic fields
DF = pd.read_csv("generator.csv")
DF["distance"] = abs(DF.right-DF.left)
DF["CR"] = DF.left > DF.right
DF["Cong"] = DF.apply(set_cong, axis=1)
again = 1

# main loop to try and find optimal order
while again == 1:
    # make a copy of the DF to not have to load it each iteration
    df = DF.copy()
    again = 0
    # one random sort key per row (a flat integer array rather than a list of one-element lists)
    df["rand"] = random.randint(low=1, high=100000, size=df.shape[0])
    # as 3 of the fields are calculated based on the previous row, the first row is a dummy and must stay first when sorted
    df.loc[0, "rand"] = 0
    Sorted = df.sort_values(['rand'])

    Sorted["Cong_n1"] = Sorted.Cong.eq(Sorted.Cong.shift())
    Sorted["Side_n1"] = Sorted.CR.eq(Sorted.CR.shift())
    Sorted["Dist_n1"] = Sorted.distance.eq(Sorted.distance.shift())
    # here the dummy is deleted
    Sorted = Sorted.drop(0, axis=0)
    grouped = Sorted.groupby(['distance', 'CR', 'Cong', 'Cong_n1', 'Dist_n1', "Side_n1"])

    for name, group in grouped:
        if group.shape[0] != 6:
            again = 1
            break

Sorted.to_csv("Edos.csv", sep="\t", index=False)
print("bye")

The DataFrame looks like this:

left  right  size_left  size_right  distance  cong  CR  distance_n1  cong_n1  side_n1
1     6      22         44          5         T     F   dummy        dummy    dummy
5     4      44         22          1         T     T   F            T        F
2     3      44         22          1         F     F   T            F        F
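For reference, here is a minimal sketch of how the calculated fields for the three sample rows above can be reproduced. It uses the column names from the table (`size_left`, `size_right`, `cong`), and the names `sample`, `result` and `n1_names` are only illustrative.

```python
import pandas as pd

# The three sample rows from the table; the first one is the dummy row.
sample = pd.DataFrame({
    "left": [1, 5, 2],
    "right": [6, 4, 3],
    "size_left": [22, 44, 44],
    "size_right": [44, 22, 22],
})

# Fields calculated from the row itself.
sample["distance"] = (sample["right"] - sample["left"]).abs()
sample["CR"] = sample["left"] > sample["right"]
left_bigger = (sample["left"] > sample["right"]) & (sample["size_left"] > sample["size_right"])
right_bigger = (sample["left"] < sample["right"]) & (sample["size_left"] < sample["size_right"])
sample["cong"] = left_bigger | right_bigger

# Fields that compare each row with the previous row.
n1_names = {"distance": "distance_n1", "cong": "cong_n1", "CR": "side_n1"}
for source_col, n1_col in n1_names.items():
    sample[n1_col] = sample[source_col].eq(sample[source_col].shift())

# The dummy row only exists to give the first real row a predecessor.
result = sample.drop(index=0)
print(result)
```

The two remaining rows should match the T/F values shown in the table.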

Any suggestions on how to solve the infinite loop problem (python-pandas)?

0 answers: