Question

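I am working on a coding project to determine whether water is polluted. For one type of pollution, the water is considered polluted if more than 10% of the samples in a 5-year window exceed a given standard. To solve this, I wrote the following code: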

def testLocationForConv(overDict):
    impairedList=[]
    for pollutant in overDict:
        dateList=overDict[pollutant]
        for date in dateList:
            total=0
            over=0
            for compDate in dateList:
                if int(date[0])+1825>int(compDate[0]) and int(date[0])-1825<int(compDate[0]):
                    total=total+1
                    if  date[1]:
                        over=over+1

            if total!=0:
                if over/total>=.1:
                    if pollutant not in impairedList:
                        impairedList.append(pollutant)
    return impairedList

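The code takes a dictionary and produces a list of the pollutants for which the water body is impaired. The dictionary's keys are strings holding the pollutant names, and each value is a dateList: a list of tuples in which the first item is the test date and the second is a boolean indicating whether the value measured that day was over the acceptable limit.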

Here is an example of the "overDict" the code takes as input:

{'Escherichia coli': [('40283', False), ('40317', False), ('40350', False),
('40374', False), ('40408', True), ('40437', True), ('40465', False),
('40505', False), ('40521', False), ('40569', False), ('40597', False),
('40619', False), ('40647', False), ('40681', False), ('40710', False),
('40738', False), ('40772', False), ('40801', True), ('40822', False),
('40980', False), ('41011', False), ('41045', False), ('41067', False),
('41228', False), ('41388', False), ('41409', False), ('41438', False),
('41466', False), ('41557', False), ('41592', False), ('41710', False),
('41743', False), ('41773', False), ('41802', False), ('41834', False)]}

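For this example, the code says this is an exceedance, but it should not be, because fewer than 10% of the tests are True and all of the tests were done within a 5-year period. What is wrong here?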

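Update: when I use the following dictionary as overDict, the code decides this data is not an exceedance, even though in the window starting at 40745, 2 of the 11 values are over the limit.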

{'copper': [('38834', False), ('38867', False), ('38897', False),
('40745', False), ('40764', False), ('40799', False), ('41024', True),
('41047', False), ('41072', True), ('41200', False), ('41411', False),
('41442', False), ('41477', False), ('41502', False)]}

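To troubleshoot, I printed sliding_windows under the "for tuple" and "for window" lines of code, and I got the following instead of a list in which each distinct start date appears once.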

[[38834, 0, 1]]
[[38834, 0, 1]]
[[38834, 0, 1]]
[[38834, 0, 1]]
[[38834, 0, 1]]
[[38834, 0, 1]]
[[38834, 0, 1]]

Answer 1

Does this logic do what you want?

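def give5yrSlice(your_list, your_date):
    # Samples dated within 1825 days (about 5 years) on either side of your_date
    return [(dat, val) for dat, val in your_list if your_date - 1825 < int(dat) < your_date + 1825]


def testAllSingle5yrFrame(your_list):
    # Assumption: every sample date is used as a window anchor
    # (the anchor dates were left as placeholders originally)
    anchor_dates = [int(dat) for dat, val in your_list]

    # True when no window goes over the 10% threshold
    return not any(testSingleSampleSet(give5yrSlice(your_list, d)) for d in anchor_dates)


def testSingleSampleSet(your_list):
    # A True flag means the sample exceeded the standard
    exceedances = [over for date, over in your_list if over]

    # True when more than 10% of the samples in this slice exceeded it
    return len(exceedances) / float(len(your_list)) > 0.1


def testLocationForConv(overDict):
    # True only if every pollutant stays within the threshold in every window
    return all(testAllSingle5yrFrame(your_list) for your_list in overDict.values())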

You call it as testLocationForConv(your_dict_with_data).
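As a side note on why the function in the question flags the E. coli data: its inner loop tests date[1], the flag of the sample that anchors the window, rather than compDate[1], the flag of the sample being counted. Whenever the anchor sample is True, every sample in its window is counted as over, so the ratio becomes 1. Below is a minimal sketch of the same nested-loop idea with that one change. The names impaired_pollutants and WINDOW_DAYS are made up for illustration, and the window is kept at 1825 days on either side of each sample, as in the question.

```python
WINDOW_DAYS = 1825  # about 5 years, same constant as in the question


def impaired_pollutants(over_dict):
    """Return the pollutants with at least one window at or above 10% exceedances."""
    impaired = []
    for pollutant, date_list in over_dict.items():
        for anchor_date, _ in date_list:
            anchor = int(anchor_date)
            # Flags of every sample within WINDOW_DAYS on either side of the anchor
            flags = [over for comp_date, over in date_list
                     if anchor - WINDOW_DAYS < int(comp_date) < anchor + WINDOW_DAYS]
            # Count the compared samples' flags, not the anchor's flag.
            # flags is never empty because the anchor is always in its own window.
            if sum(flags) / float(len(flags)) >= 0.1:
                impaired.append(pollutant)
                break
    return impaired


# The copper data from the question's update
copper = {'copper': [('38834', False), ('38867', False), ('38897', False),
                     ('40745', False), ('40764', False), ('40799', False),
                     ('41024', True), ('41047', False), ('41072', True),
                     ('41200', False), ('41411', False), ('41442', False),
                     ('41477', False), ('41502', False)]}

print(impaired_pollutants(copper))  # ['copper']
```

With the E. coli data every window contains all 35 samples, 3 of which are True, so the ratio stays below 0.1 and the pollutant is not listed.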

Answer 2

results = {}
range = 1825  # window length in days (5 x 365); note that this name shadows the built-in range()
for name, value in overDict.items():
    sliding_windows = []
    good = True
    for tuple in value:
        # Add this sample's information to any windows it falls into
        for window in sliding_windows:
            if window[0] > int(tuple[0]) - range:
                window[1] += tuple[1]
                window[2] += 1
        # start a new window with this date
        sliding_windows.append([int(tuple[0]), tuple[1], 1])
    for window in sliding_windows:
        if window[1]/float(window[2]) > .1:
            good = False
    results[name] = good

This builds a list, sliding_windows, with one entry per start date:

[[40283, 3, 35], [40317, 3, 34], [40350, 3, 33], [40374, 3, 32], 
 [40408, 3, 31], [40437, 2, 30], [40465, 1, 29], [40505, 1, 28], 
 [40521, 1, 27], [40569, 1, 26], [40597, 1, 25], [40619, 1, 24], 
 [40647, 1, 23], [40681, 1, 22], [40710, 1, 21], [40738, 1, 20], 
 [40772, 1, 19], [40801, 1, 18], [40822, 0, 17], [40980, 0, 16], 
 [41011, 0, 15], [41045, 0, 14], [41067, 0, 13], [41228, 0, 12], 
 [41388, 0, 11], [41409, 0, 10], [41438, 0, 9], [41466, 0, 8], 
 [41557, 0, 7], [41592, 0, 6], [41710, 0, 5], [41743, 0, 4], 
 [41773, 0, 3], [41802, 0, 2], [41834, False, 1]]

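and computes the rate for each window, storing True or False in the results dictionary depending on whether the rate is below or above the threshold. It may be worth excluding windows that do not cover enough time, because as written any hit among the last 10 measurements counts as a failure. I would probably work back from the last measurement and discard all windows shorter than 5 years (except perhaps the first one, so that you still get a partial result when there is less than 5 years of data), like this: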

cutoff = int(value[-1][0]) - range
for tuple in value:
    ...
    if int(tuple[0]) < cutoff or len(sliding_windows) == 0:
        sliding_windows.append([int(tuple[0]), tuple[1], 1])

This then produces:

sliding_windows:

[[40283, 3, 35]]

Note that the result is True if the pollutant is good and False if it is bad:

{'Escherichia coli': True}

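Note: this implicitly converts the booleans True / False to 1 / 0 when adding them together in window[1] += tuple[1]. That is why the last entry is [41834, False, 1], which for our purposes is equivalent to [41834, 0, 1].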
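To see what the cutoff changes, the loop above can be wrapped in a function and run with and without it. This is only a sketch: the names window_counts, is_good, window_days and drop_short_windows are invented here, the samples are assumed to be sorted by date, and the flags are converted with int() so the counts are plain integers. It uses the copper data from the question's update.

```python
def window_counts(samples, window_days=1825, drop_short_windows=False):
    """Return [start_date, exceedances, total] for each forward-looking window."""
    # Windows starting at or after this date cannot span the full window length
    cutoff = int(samples[-1][0]) - window_days
    windows = []
    for date_str, over in samples:
        day = int(date_str)
        # Count this sample in every window that still covers it
        for window in windows:
            if window[0] > day - window_days:
                window[1] += int(over)
                window[2] += 1
        # Start a new window at this date (always, unless short windows are dropped)
        if not drop_short_windows or day < cutoff or not windows:
            windows.append([day, int(over), 1])
    return windows


def is_good(samples, **kwargs):
    """True when no window has more than 10% exceedances."""
    return all(over / float(total) <= 0.1
               for _, over, total in window_counts(samples, **kwargs))


copper = [('38834', False), ('38867', False), ('38897', False),
          ('40745', False), ('40764', False), ('40799', False),
          ('41024', True), ('41047', False), ('41072', True),
          ('41200', False), ('41411', False), ('41442', False),
          ('41477', False), ('41502', False)]

print(window_counts(copper)[3])                         # [40745, 2, 11]
print(is_good(copper))                                  # False
print(window_counts(copper, drop_short_windows=True))   # [[38834, 0, 3], [38867, 0, 2], [38897, 0, 1]]
print(is_good(copper, drop_short_windows=True))         # True
```

With the cutoff, the window starting at 40745 is never created because it covers less than 5 years of data, which may be why the copper data is not flagged in the update. Whether such short windows should count is a decision about the rule itself rather than about the code.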
