Python: checking the numbers in a list against a CSV

Asked: 2017-05-23 20:57:35

Tags: python csv pandas dataframe

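Hello, I'm new to Python and I'm trying to build up my knowledge by writing a working function. The function creates a list of 6 random numbers taken from the set of numbers 1 to 59. I've cracked that part; the next part is the tricky one. I now want to check the numbers from the random set against a CSV file and print a notification if two or more numbers from the set are found. I tried print (df[df[0:].isin(luckyDip)]) with some success: it checks the numbers in the DataFrame and shows the ones that match, but it also shows the rest of the DataFrame as NaN, which isn't pleasant to look at and isn't what I want.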

I'm just looking for some pointers on what to do next, or even just what to search Google for. Below is the code I've been messing around with.

import random
import pandas as pd

url ='https://www.national-lottery.co.uk/results/euromillions/draw-history/csv'
df = pd.read_csv(url,   sep=',', na_values=".")

lottoNumbers = [1,2,3,4,5,6,7,8,9,10,
           11,12,13,14,15,16,17,18,19,20,
           21,22,23,24,25,26,27,28,29,30,
           31,32,33,34,35,36,37,38,39,40,
           41,42,43,44,45,46,47,48,49,50,
           51,52,53,54,55,56,57,58,59]
luckyDip = random.sample(lottoNumbers, k=6) #Picks 6 numbers at random
print (sorted(luckyDip))    
print  (df[df[0:].isin(luckyDip)])

3 answers:

Answer 0 (score: 1)

Not as elegant as @ayhan's solution, but this works:

import random
import pandas as pd

url ='https://www.national-lottery.co.uk/results/euromillions/draw-history/csv'
df = pd.read_csv(url, index_col=0,  sep=',')

lottoNumbers = range(1, 60)

tries = 0
while True:
    tries+=1
    luckyDip = random.sample(lottoNumbers, k=6) #Picks 6 numbers at random

    # subset of balls
    draws = df.iloc[:,0:7]

    # True where there is match
    matches = draws.isin(luckyDip)

    # Gives the sum of Trues
    sum_of_trues = matches.sum(1)

    # you are looking for matches where sum_of_trues is 6
    final = sum_of_trues[sum_of_trues == 6]
    if len(final) > 0:
        print("Took", tries)
        print(final)
        break

The result looks like this:

Took 15545
DrawDate
16-May-2017    6
dtype: int64
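
The same isin-then-sum idea can be tried offline on a small table. The sketch below is only an illustration: the column names, dates, and numbers in `draws` are made up rather than taken from the real draw history, `lucky_dip` is fixed instead of random so the result is reproducible, and the threshold is lowered to "two or more" as the question asks.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded draw history (made-up numbers).
draws = pd.DataFrame(
    {
        "Ball 1": [3, 10, 7],
        "Ball 2": [12, 19, 8],
        "Ball 3": [25, 23, 30],
        "Ball 4": [31, 41, 44],
        "Ball 5": [48, 50, 59],
    },
    index=["02-May-2017", "09-May-2017", "16-May-2017"],
)

# Fixed pick (an assumption for the demo) instead of random.sample
lucky_dip = [7, 19, 23, 30, 44, 59]

# True wherever a drawn ball is one of the chosen numbers
matches = draws.isin(lucky_dip)

# Count the Trues in each row (axis=1 sums across the columns)
match_counts = matches.sum(axis=1)

# Keep only the draws with two or more matching numbers
two_or_more = match_counts[match_counts >= 2]
print(two_or_more)
```

Because the counts come from a boolean frame, no NaN cells are ever produced, so nothing has to be filtered out before printing.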

Answer 1 (score: 0)

If you just want to flatten the array and drop the NaN values, you can add this to the end of your code:

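import numpy as np  # needed for np.float64 and np.isnan

# Mask non-matching cells to NaN, flatten to 1-D, then keep only the matches
matches = df[df[0:].isin(luckyDip)].values.flatten().astype(np.float64)
print(matches[~np.isnan(matches)])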
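
To see what each step does, here is a minimal sketch on a tiny made-up frame. The names `df_demo` and `lucky` and all the numbers are assumptions for illustration, not the real CSV.

```python
import numpy as np
import pandas as pd

# Made-up two-row frame standing in for the draw history
df_demo = pd.DataFrame({"Ball 1": [3, 10], "Ball 2": [19, 23], "Ball 3": [30, 41]})
lucky = [19, 30, 41]  # fixed pick so the output is predictable

# Indexing with the boolean frame keeps matches and turns everything else into NaN
masked = df_demo[df_demo.isin(lucky)]

# Flatten row by row into a 1-D float array, then drop the NaN entries
flat = masked.values.flatten().astype(np.float64)
found = flat[~np.isnan(flat)]
print(found)
```

Note that flattening discards which draw each number came from, so this only suits the case where a plain list of matched numbers is enough.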

Answer 2 (score: 0)

You can build on what you already have by counting the non-null values in each row, then showing the rows that have two or more matches.

match_count = df[df[0:].isin(luckyDip)].notnull().sum(axis=1)
print(match_count[match_count >= 2])

This gives you the index value of each matching row and its number of matches.

Sample output:

6     2
26    2
40    3
51    2

If you also want the matching values for those rows, you can add:

index = match_count[match_count >= 2].index
matches = [tuple(x[~pd.isnull(x)]) for x in df.loc[index][df[0:].isin(luckyDip)].values]
print(matches)

Sample output:

[(19.0, 23.0), (19.0, 41.0), (19.0, 23.0, 34.0), (23.0, 28.0)]
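
The matched values come back as floats because masking introduces NaN. One possible alternative, sketched below, is a plain set intersection per row, which keeps the numbers as integers. The helper name `find_matching_draws`, its `min_matches` parameter, and the demo data are all hypothetical, and the sketch assumes the frame holds only ball columns with no missing values.

```python
import pandas as pd

def find_matching_draws(draws, numbers, min_matches=2):
    """Return {row label: sorted matching numbers} for rows with enough matches."""
    chosen = set(numbers)
    result = {}
    for label, row in draws.iterrows():
        # Intersect the chosen numbers with this draw's balls
        hits = sorted(chosen.intersection(int(v) for v in row))
        if len(hits) >= min_matches:
            result[label] = hits
    return result

# Made-up draws for the demonstration
demo = pd.DataFrame(
    {
        "Ball 1": [3, 10, 7],
        "Ball 2": [12, 19, 8],
        "Ball 3": [25, 23, 30],
        "Ball 4": [31, 41, 44],
        "Ball 5": [48, 50, 59],
    }
)

result = find_matching_draws(demo, [7, 19, 23, 30, 44, 59])
print(result)
```

Looping with iterrows is slower than the vectorised isin approach, but for a draw history of a few hundred rows the difference is unlikely to matter.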