Question

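I am trying to use a csv file and read it in as a Pandas DataFrame.
This DataFrame contains 4 columns of numbers.
I want to pick a specific row of data from the DataFrame.
In a while loop, I want to select a random row from the DataFrame and compare it to the row I picked.
I want the loop to keep running until the random row is a 100% match for the row I picked beforehand.
Then I want the while loop to break, and I want it to report how many tries it took to match the random row.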

Here is what I have so far:

Here is an example of the DataFrame:

    A  B  C  D
1   2  7  12 14
2   4  5  11 23
3   4  6  14 20
4   4  7  13 50
5   9  6  14 35

Here is an example of my attempt:

import time
import pandas as pd

then = time.time()

count = 0

df = pd.read_csv('Get_Numbers.csv')
df.columns = ['A', 'B', 'C', 'D']

while True:
    df_elements = df.sample(n=1)
    random_row = df_elements
    print(random_row)
    find_this_row = df['A','B','C','D' == '4','7','13,'50']
    print(find_this_row)
    if find_this_row != random_row:
        count += 1
    else:
        break

print("You found the correct numbers! And it only took " + str(count) + " tries to get there! Your numbers were: " + str(find_this_row))

now = time.time()

print("It took: ", now-then, " seconds")

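The code above gives an obvious error... but I have tried so many different ways of building find_this_row by now that I no longer know which way to go, so I gave up on that attempt.

What I would like to avoid is using a specific index for the row I am looking for; I would rather find the row using only its values.

I am using df_elements = df.sample(n=1) to select a row at random. I did this to avoid random.choice, because I was not sure whether that would work or which approach is more time/memory efficient, but I am open to suggestions on that as well.

In my mind it seems simple: select a random row of data and, if it does not match the row I want, keep selecting random rows until it does. But I cannot seem to get it to work.

Any help is greatly appreciated!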

Answer 1

You can use values, which returns an np.ndarray of shape=(1, 2); use values[0] to get just the 1-D array.

Then compare the arrays with any()

import time
import pandas as pd

then = time.time()

df = pd.DataFrame(data={'A': [1, 2, 3],
                        'B': [8, 9, 10]})

find_this_row = [2, 9]
print("Looking for: {}".format(find_this_row))

count = 0
while True:
    random_row = df.sample(n=1).values[0]
    print(random_row)

    if any(find_this_row != random_row):
        count += 1
    else:
        break

print("You found the correct numbers! And it only took " + str(count) + " tries to get there! Your numbers were: " + str(find_this_row))

now = time.time()

print("It took: ", now-then, " seconds")

Answer 2

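How about using values?

values returns the row's values as a NumPy array, so you can easily compare two rows.

array1 == array2 returns an array of True and False values, comparing the elements at corresponding positions. You can then check whether all of the returned values are True.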
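A minimal sketch of that comparison, assuming a small made-up DataFrame (the column names and numbers below are illustrative, not the asker's CSV):

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({'A': [2, 4, 4], 'B': [7, 5, 7]})

target = np.array([4, 7])            # the row we are looking for
candidate = df.iloc[1].values        # second row as a 1-D NumPy array

elementwise = candidate == target    # one boolean per column
is_match = bool(elementwise.all())   # True only if every column matches
print(elementwise, is_match)
```

Note that this relies on both sides being NumPy arrays; comparing two plain Python lists with == gives a single True or False instead of one value per element.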

Answer 3

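Here is a method that tests one row at a time. We check whether the values of the chosen row are equal to the values of the sampled DataFrame. We require that they all match.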

row = df.sample(1)

counter = 0
not_a_match = True

while not_a_match:
    not_a_match = ~(df.sample(n=1).values == row.values).all()
    counter+=1

print(f'It took {counter} tries and the numbers were\n{row}')
#It took 9 tries and the numbers were
#   A  B   C   D
#4  4  7  13  50

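If you want this to be a bit faster, pick one row and then sample the DataFrame many times with replacement. You can then find the first position at which a sampled row equals the row you picked. That gives you the number of "tries" the one-at-a-time while loop would have needed, in much less time. Because the sampling is done with replacement, the loop guards against the unlikely case in which a whole batch contains no match.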

row = df.sample(1)

n = 0
none_match = True
k = 10  # Increase to check more matches at once.

while none_match:
    matches = (df.sample(n=len(df)*k, replace=True).values == row.values).all(1)
    none_match = ~matches.any()  # Determine if none still match
    n += k*len(df)*none_match  # Only increment if none match
n = n + matches.argmax() + 1

print(f'It took {n} tries and the numbers were\n{row}')
#It took 3 tries and the numbers were
#   A  B   C   D
#4  4  7  13  50
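As a possible variation on the same idea, the match can be precomputed once as a boolean mask, after which only row positions need to be drawn at random. This is a sketch under assumptions, not part of the answer above: the function name tries_until_match and its seed parameter are hypothetical, and the data is copied from the question's example table.

```python
import numpy as np
import pandas as pd

def tries_until_match(df, target, seed=None):
    """Count random draws (with replacement) until a drawn row equals target."""
    rng = np.random.default_rng(seed)
    rows = df.to_numpy()
    # Boolean mask: True for every row that equals the target in all columns.
    is_target = (rows == np.asarray(target)).all(axis=1)
    if not is_target.any():
        # Without this guard the loop below would never terminate.
        raise ValueError("target row is not present in the DataFrame")
    tries = 0
    while True:
        tries += 1
        idx = rng.integers(len(rows))  # uniform random row position
        if is_target[idx]:
            return tries

# Data copied from the question's example table.
df = pd.DataFrame({'A': [2, 4, 4, 4, 9],
                   'B': [7, 5, 6, 7, 6],
                   'C': [12, 11, 14, 13, 14],
                   'D': [14, 23, 20, 50, 35]})

print(tries_until_match(df, [4, 7, 13, 50]))
```

The guard also covers the case the question worries about implicitly: if the wanted values are not in the data at all, the loop would otherwise run forever.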

Answer 4

First, a few hints. This line does not work for me:

find_this_row = df['A','B','C','D' == '4','7','13,'50']

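There are two reasons:

A closing ' is missing after '13
df is a DataFrame(), so it does not support a key like the following

df['A','B','C','D'...

Use either a list of keys, which returns a DataFrame():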

df[['A','B','C','D']]

or a single key, which returns a Series():

df['A']
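A short sketch of the difference between these key styles, using a made-up two-row DataFrame (the data is an assumption for illustration):

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({'A': [2, 4], 'B': [7, 5], 'C': [12, 11], 'D': [14, 23]})

subset = df[['A', 'B', 'C', 'D']]  # list of keys -> DataFrame
column = df['A']                   # single key   -> Series

# Comma-separated keys are read as one tuple key, which is not a column here.
try:
    df['A', 'B']
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True

print(type(subset).__name__, type(column).__name__, tuple_key_failed)
```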

Since you need a whole row spanning several columns, do this:

df2.iloc[4].values

array(['4', '7', '13', '50'], dtype=object)

Do the same for the sampled row:

df2.sample(n=1).values

The comparison between the rows has to hold for all() elements/columns:

df2.sample(n=1).values == df2.iloc[4].values

array([[True, False, False, False]])

Adding .all() as follows:

(df2.sample(n=1).values == df2.iloc[4].values).all()

returns

True/False

All together:

import time
import pandas as pd

then = time.time()
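count = 0
find_this_row = df2.iloc[4].values  # target row; look it up once, outside the loop
while True:
    random_row = df2.sample(n=1).values  # shape (1, n_columns); broadcasts against the 1-D target
    if not (random_row == find_this_row).all():
        count += 1
    else:
        break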

print("You found the correct numbers! And it only took " + str(count) + " tries to get there! Your numbers were: " + str(find_this_row))

now = time.time()

print("It took: ", now-then, " seconds")
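Since the question asks to find the row by its values rather than by a fixed index, the iloc lookup could be swapped for a boolean mask. The following is a sketch under assumptions: the data is copied from the question's example table instead of being read from Get_Numbers.csv, and the name wanted is introduced here for illustration.

```python
import pandas as pd

# Data copied from the question's example table (not read from the CSV).
df = pd.DataFrame({'A': [2, 4, 4, 4, 9],
                   'B': [7, 5, 6, 7, 6],
                   'C': [12, 11, 14, 13, 14],
                   'D': [14, 23, 20, 50, 35]},
                  index=[1, 2, 3, 4, 5])

wanted = [4, 7, 13, 50]
# True for each row whose four values all equal the wanted values.
mask = (df.values == wanted).all(axis=1)
find_this_row = df[mask]
print(find_this_row)

count = 0
while True:
    random_row = df.sample(n=1)
    if (random_row.values == find_this_row.values).all():
        break
    count += 1  # counts the failed tries, as in the question's loop

print("Matched after " + str(count) + " failed tries")
```

This assumes the wanted values occur in exactly one row. If they occur in no row, find_this_row is empty and the loop would stop on its first draw, so checking mask.any() first is a sensible guard.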

Finding a specific row of data in a Pandas DataFrame inside a while loop