How to find the closest value to an input number in a pandas column/Series with conditions?

Time: 2020-10-28 17:35:24

Tags: python pandas pandas-groupby

I have a data frame, shown in full below.

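For each group in column 'A', I need to find the value in column C that is closest to 5, subject to two additional conditions:

If column B = 1 for the first row whose C value is over 5, that row should be selected, even if its value is not the closest to 5.

If the value in column C closest to 5 is < 4, the selected row should be the first one with C > 4, even if that means it is no longer the value closest to 5.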

The data frame looks like this; the rows I expect to be selected from it are explained under Logic:

pd.DataFrame({'A': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C2', 'C3', 'C3', 'C3', 'C3'],
              'B': [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0],
              'C': [1, 4, 8, 9, 1, 3, 8, 9, 1, 4, 7, 0]})
Out[5]: 
     A  B  C
0   C1  1  1
1   C1  0  4
2   C1  1  8
3   C1  1  9
4   C2  0  1
5   C2  1  3
6   C2  0  8
7   C2  1  9
8   C3  0  1
9   C3  1  4
10  C3  0  7
11  C3  0  0

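Logic

In group C1, 4 is the closest to 5, but the first number over 5 is 8 and its value in column B is 1, so 8 is selected.

In group C2, 3 is the closest to 5, but it is less than 4; the first value over 4 is 8, so 8 is selected.

In group C3, 4 is the closest to 5, it is >= 4, and the number over 5 (7) has a value of 0 in column B, so 4 is selected.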
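To make the rules concrete, here is a minimal sketch of the selection logic as read from the three cases above. It is an illustration under assumptions, not a definitive solution: the helper name `pick_row` and its `target` and `floor` parameters are hypothetical, "first" is taken to mean first in row order within the group, and the B = 1 rule is checked before the "< 4" rule. It uses a plain per-group function rather than a vectorized approach.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C2', 'C3', 'C3', 'C3', 'C3'],
    'B': [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0],
    'C': [1, 4, 8, 9, 1, 3, 8, 9, 1, 4, 7, 0],
})


def pick_row(group, target=5, floor=4):
    """Hypothetical helper: return the selected row of one group."""
    # Rule 1 (assumed to take precedence): if the first row with
    # C > target has B == 1, select that row.
    above = group[group['C'] > target]
    if not above.empty and above.iloc[0]['B'] == 1:
        return above.iloc[0]

    # Default: the row whose C is closest to target (first one on ties).
    closest = group.loc[(group['C'] - target).abs().idxmin()]

    # Rule 2: if the closest value is below floor, select the first row
    # with C > floor instead (when such a row exists).
    if closest['C'] < floor:
        over_floor = group[group['C'] > floor]
        if not over_floor.empty:
            return over_floor.iloc[0]
    return closest


# One selected row per group, keeping the original row labels.
result = pd.DataFrame([pick_row(g) for _, g in df.groupby('A')])
print(result)
# Selected C values per group: C1 -> 8, C2 -> 8, C3 -> 4
```

On the sample data this picks the rows labelled 2, 6 and 9, which matches the three cases described above. If the two rules should be applied in a different order, or "first" should mean the smallest qualifying value rather than the earliest row, the function would need to change accordingly.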

What is a good way to solve this?

0 answers:

No answers