Here is what I have so far:
Here is a sample of the DataFrame:
   A  B   C   D
1  2  7  12  14
2  4  5  11  23
3  4  6  14  20
4  4  7  13  50
5  9  6  14  35
Here is an example of my attempt:
import time
import pandas as pd
then = time.time()
count = 0
df = pd.read_csv('Get_Numbers.csv')
df.columns = ['A', 'B', 'C', 'D']
while True:
    df_elements = df.sample(n=1)
    random_row = df_elements
    print(random_row)
    find_this_row = df['A','B','C','D' == '4','7','13,'50']
    print(find_this_row)
    if find_this_row != random_row:
        count += 1
    else:
        break
print("You found the correct numbers! And it only took " + str(count) + " tries to get there! Your numbers were: " + str(find_this_row))
now = time.time()
print("It took: ", now-then, " seconds")
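The code above gives an obvious error... but I have tried so many different ways of building find_this_row that I no longer know what to do, so I gave up on that attempt.
What I want to avoid is using a specific index for the row I am looking for; I would rather find the row using only its values.
I am using df_elements = df.sample(n=1) to select a row at random. I did this to avoid random.choice, because I was not sure whether it would work or which approach is more time/memory efficient, but I am open to suggestions on that as well.
In my mind it seems simple: randomly select a row of data, and if it does not match the row I want, keep randomly selecting rows until it does. But I cannot seem to implement it.
Any help is greatly appreciated!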
Answer 0 (score: 1)
You can use values, which here returns an np.ndarray of shape=(1, 2); use values[0] to get just a 1-D array.
Then compare the arrays with any():
import time
import pandas as pd
then = time.time()
df = pd.DataFrame(data={'A': [1, 2, 3],
                        'B': [8, 9, 10]})
find_this_row = [2, 9]
print("Looking for: {}".format(find_this_row))
count = 0
while True:
    random_row = df.sample(n=1).values[0]
    print(random_row)
    # Element-wise comparison; any() is True if at least one column differs
    if any(find_this_row != random_row):
        count += 1
    else:
        break
print("You found the correct numbers! And it only took " + str(count) + " tries to get there! Your numbers were: " + str(find_this_row))
now = time.time()
print("It took: ", now-then, " seconds")
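The same loop can be wrapped in a function so that it is repeatable and cannot spin forever when the target row is not in the data. This is only a sketch: the name tries_until_match and the seed and max_tries parameters are hypothetical additions, not part of the answer above.

```python
import numpy as np
import pandas as pd


def tries_until_match(df, target, seed=None, max_tries=100_000):
    """Return how many failed draws happened before `target` was sampled.

    Hypothetical helper: `seed` and `max_tries` are illustrative parameters.
    """
    rng = np.random.RandomState(seed)  # one generator, advanced on every draw
    target = np.asarray(target)
    for count in range(max_tries):
        # Draw one row and flatten it to a 1-D array
        random_row = df.sample(n=1, random_state=rng).values[0]
        if (random_row == target).all():
            return count
    raise RuntimeError("target row was not sampled within max_tries draws")


df = pd.DataFrame(data={'A': [1, 2, 3],
                        'B': [8, 9, 10]})
print(tries_until_match(df, [2, 9], seed=0))
```

Passing a RandomState object rather than a fixed integer matters here: an integer seed would make every call to sample return the same row.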
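Answer 1 (score: 0)
How about using values?
values returns a NumPy array of the row's values, so you can easily compare two rows.
array1 == array2 returns an array of True and False values, comparing the elements at corresponding positions. You can then check whether all of the returned values are True, for example with .all().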
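A minimal sketch of that comparison, using two rows taken from the question's sample data (the variable names are only illustrative):

```python
import numpy as np
import pandas as pd

# Two rows from the question's sample DataFrame
df = pd.DataFrame({'A': [2, 4], 'B': [7, 7], 'C': [12, 13], 'D': [14, 50]})
target = np.array([4, 7, 13, 50])

row0 = df.iloc[0].values
row1 = df.iloc[1].values

# Element-wise comparison gives one boolean per column
elementwise = row0 == target

# A row matches only when every column matches
first_matches = bool((row0 == target).all())
second_matches = bool((row1 == target).all())
print(elementwise, first_matches, second_matches)
```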
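Answer 2 (score: 0)
Here is a way to test one row at a time. We check whether the values of the chosen row are equal to the values of the sampled DataFrame, and we require that they all match.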
row = df.sample(1)
counter = 0
not_a_match = True
while not_a_match:
    not_a_match = ~(df.sample(n=1).values == row.values).all()
    counter += 1
print(f'It took {counter} tries and the numbers were\n{row}')
#It took 9 tries and the numbers were
#   A  B   C   D
#4  4  7  13  50
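If you want it to be a bit faster, choose a row and then sample the DataFrame many times with replacement. You can then find the first position at which a sampled row equals the row you chose, which gives you the number of "tries" a one-at-a-time while loop would have needed, but in much less time. Because the sampling is with replacement, the loop guards against the unlikely case in which none of the sampled rows match.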
row = df.sample(1)
n = 0
none_match = True
k = 10  # Increase to check more matches at once.
while none_match:
    matches = (df.sample(n=len(df)*k, replace=True).values == row.values).all(1)
    none_match = ~matches.any()  # Determine if none still match
    n += k*len(df)*none_match  # Only increment if none match
n = n + matches.argmax() + 1
print(f'It took {n} tries and the numbers were\n{row}')
#It took 3 tries and the numbers were
#   A  B   C   D
#4  4  7  13  50
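The final count relies on how argmax behaves on a boolean array, which the following small sketch illustrates with made-up match results:

```python
import numpy as np

# Made-up result of comparing five sampled rows against the chosen row
matches = np.array([False, False, True, False, True])

# argmax returns the position of the first True (the first maximum)
first_hit = int(matches.argmax())
tries = first_hit + 1  # positions are 0-based, tries are 1-based

# With no True at all, argmax still returns 0, so any() must be checked first
no_hits = np.array([False, False, False])
misleading_position = int(no_hits.argmax())
print(tries, misleading_position, bool(no_hits.any()))
```

This is why the batched loop tests matches.any() before using matches.argmax().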
Answer 3 (score: 0)
First, a few hints. This line does not work for me:
find_this_row = df['A','B','C','D' == '4','7','13,'50']
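for two reasons: the closing quote after '13 is missing, and a DataFrame() does not support this form of indexing:
df['A','B','C','D'...
Either use a list of keys to get back a DataFrame():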
df[['A','B','C','D']]
or a single key to get back a Series():
df['A']
Since you need a whole row with multiple columns, do this:
df2.iloc[4].values
array(['4','7','13','50'],dtype = object)
Do the same for the sampled row:
df2.sample(n=1).values
The comparison between the rows has to hold for all() elements/columns:
df2.sample(n=1).values == df2.iloc[4].values
array([[True,False,False,False]])
Add .all() as follows:
(df2.sample(n=1).values == df2.iloc[4].values).all()
which returns
True/False
All together:
import time
import pandas as pd
then = time.time()
count = 0
while True:
    random_row = df2.sample(n=1).values
    find_this_row = df2.iloc[4].values
    if (random_row == find_this_row).all() == False:
        count += 1
    else:
        break
print("You found the correct numbers! And it only took " + str(count) + " tries to get there! Your numbers were: " + str(find_this_row))
now = time.time()
print("It took: ", now-then, " seconds")
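The question asks to locate the target row by its values rather than by a positional index such as iloc[4]. One possible way to do that, sketched here with the question's sample data and integer values (an assumption; the df2 above appears to hold strings), is a boolean mask over the rows:

```python
import numpy as np
import pandas as pd

# The question's sample DataFrame, rebuilt by hand for illustration
df = pd.DataFrame({'A': [2, 4, 4, 4, 9],
                   'B': [7, 5, 6, 7, 6],
                   'C': [12, 11, 14, 13, 14],
                   'D': [14, 23, 20, 50, 35]},
                  index=[1, 2, 3, 4, 5])

target = np.array([4, 7, 13, 50])

# Compare every row with the target; keep rows where all columns match
mask = (df.values == target).all(axis=1)
found = df[mask]
print(found)
```

If found is empty, the target is not in the data and a sampling loop would never terminate, so checking it first is a cheap safeguard.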