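I have a data frame that contains several datetime columns and some other categorical/continuous columns. To keep the description simple, I have posted a snippet of the data frame and removed the actual date values to avoid clutter.
I am trying to create a column where the rows of the data frame must be checked against a criterion before deciding what to fill into the new column.
In this case:
If a row's SECTOR and BASE values match those of some other rows, and the END date of that row matches the START date of a row with the same SECTOR and BASE that appears later in the data frame, then fill in 1, otherwise 0. So, basically, I am looking for something like this: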
BASE SECTOR START END CHECK
S DHHJJ 12/2/2018 13/3/2018 0
B DJH 12/3/2018 13/3/2018 0
S FHJDFJK 12/4/2018 13/3/2020 0
B FHJDG 12/5/2018 13/3/2021 0
T XYZ 23/03/2018 25/03/2018 1
T ABCD 12/1/2017 13/2/2017 0
T ABCD 1/2/2018 1/3/2018 1
T ABCD 1/3/2018 15/3/2018 1
T XYZ 12/1/2015 12/2/2015 0
B XYZ 15/5/2017 15/7/2017 1
T XYZ 12/2/2014 12/3/2014 0
B XYZ 15/7/2017 20/7/2017 0
T SFJUTEUI 12/2/2018 13/3/2018 0
T RUTI 12/3/2018 13/3/2019 0
T FDJTK 12/4/2018 13/3/2020 0
B FJURTUI 12/5/2018 13/3/2021 0
T RYURTI 12/6/2018 13/3/2022 0
T SFJUI 12/7/2018 13/3/2023 0
T XYZ 25/03/2018 30/03/2018 0
T XYZ 12/4/2018 12/4/2018 0
T XYZ 1/4/2016 1/5/2016 1
T XYZ 1/5/2016 5/5/2016 0
T ABCD 15/3/2018 31/3/2018 0
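To make the rule concrete, here is a minimal sketch that applies it row by row to a few rows copied from the table above. The names `sample` and `check_naive` are made up for illustration, the dates are assumed to be day/month/year, and row order is ignored. It is a slow reference version for checking expectations, not a recommended solution.

```python
import pandas as pd

# A few rows copied from the table above (dates assumed to be day/month/year).
rows = [
    ("T", "ABCD", "12/1/2017", "13/2/2017"),
    ("T", "ABCD", "1/2/2018", "1/3/2018"),
    ("T", "ABCD", "1/3/2018", "15/3/2018"),
    ("B", "XYZ", "15/5/2017", "15/7/2017"),
    ("B", "XYZ", "15/7/2017", "20/7/2017"),
    ("T", "XYZ", "12/4/2018", "12/4/2018"),
    ("T", "ABCD", "15/3/2018", "31/3/2018"),
]
sample = pd.DataFrame(rows, columns=["BASE", "SECTOR", "START", "END"])
for col in ("START", "END"):
    sample[col] = pd.to_datetime(sample[col], format="%d/%m/%Y")


def check_naive(frame):
    """Hypothetical helper: 1 if another row in the same BASE/SECTOR group
    has a START equal to this row's END, else 0 (row order is ignored)."""
    flags = []
    for idx, row in frame.iterrows():
        others = frame.drop(index=idx)  # never compare a row with itself
        same_group = others[
            (others["BASE"] == row["BASE"]) & (others["SECTOR"] == row["SECTOR"])
        ]
        flags.append(int((same_group["START"] == row["END"]).any()))
    return pd.Series(flags, index=frame.index, name="CHECK")


sample["CHECK"] = check_naive(sample)
print(sample)
```

For these seven rows the computed flags agree with the CHECK values listed in the table.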
Added data, modified specifically for the BASE condition:
BASE SECTOR START END CHECK
S DHHJJ 12/2/2018 13/3/2018 0
B DJH 12/3/2018 13/3/2018 0
S FHJDFJK 12/4/2018 13/3/2020 0
B FHJDG 12/5/2018 13/3/2021 0
T XYZ 23/03/2018 25/03/2018 1
T ABCD 12/1/2017 13/2/2017 0
B ABCD 1/2/2018 1/3/2018 1
T ABCD 1/3/2018 15/3/2018 1
T XYZ 12/1/2015 12/2/2015 0
B XYZ 15/5/2017 15/7/2017 1
T XYZ 12/2/2014 12/3/2014 0
T XYZ 15/7/2017 20/7/2017 0
T SFJUTEUI 12/2/2018 13/3/2018 0
T RUTI 12/3/2018 13/3/2019 0
T FDJTK 12/4/2018 13/3/2020 0
B FJURTUI 12/5/2018 13/3/2021 0
T RYURTI 12/6/2018 13/3/2022 0
T SFJUI 12/7/2018 13/3/2023 0
T XYZ 25/03/2018 30/03/2018 0
T XYZ 12/4/2018 12/4/2018 0
T XYZ 1/4/2016 1/5/2016 1
B XYZ 1/5/2016 5/5/2016 0
B ABCD 15/3/2018 31/3/2018 0
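Answer 0 (score: 1)
Use a custom function with groupby to check membership, and exclude rows that have the same START and END dates. To get 0, 1 values, convert the boolean to integer.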
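import pandas as pd

df[['START','END']] = df[['START','END']].apply(pd.to_datetime)

def f(x):
    # test against all start datetimes in the group; row order is not important
    x['Check1'] = (x['END'].isin(x['START']) & (x['END'] != x['START'])).astype(int)
    return x

# group_keys=False keeps the original row index (pandas 2.0 and later add the group keys otherwise)
df = df.groupby(['BASE','SECTOR'], group_keys=False).apply(f)
print (df)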
BASE SECTOR START END CHECK Check1
0 S DHHJJ 2018-12-02 2018-03-13 0 0
1 B DJH 2018-12-03 2018-03-13 0 0
2 S FHJDFJK 2018-12-04 2020-03-13 0 0
3 B FHJDG 2018-12-05 2021-03-13 0 0
4 T XYZ 2018-03-23 2018-03-25 1 1
5 T ABCD 2017-12-01 2017-02-13 0 0
6 T ABCD 2018-01-02 2018-01-03 1 1
7 T ABCD 2018-01-03 2018-03-15 1 1
8 T XYZ 2015-12-01 2015-12-02 0 0
9 B XYZ 2017-05-15 2017-07-15 1 1
10 T XYZ 2014-12-02 2014-12-03 0 0
11 B XYZ 2017-07-15 2017-07-20 0 0
12 T SFJUTEUI 2018-12-02 2018-03-13 0 0
13 T RUTI 2018-12-03 2019-03-13 0 0
14 T FDJTK 2018-12-04 2020-03-13 0 0
15 B FJURTUI 2018-12-05 2021-03-13 0 0
16 T RYURTI 2018-12-06 2022-03-13 0 0
17 T SFJUI 2018-12-07 2023-03-13 0 0
18 T XYZ 2018-03-25 2018-03-30 0 0
19 T XYZ 2018-12-04 2018-12-04 0 0
20 T XYZ 2016-01-04 2016-01-05 1 1
21 T XYZ 2016-01-05 2016-05-05 0 0
22 T ABCD 2018-03-15 2018-03-31 0 0
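One thing worth double-checking in the output above: `12/2/2018` appears as `2018-12-02` while `13/3/2018` appears as `2018-03-13`, which suggests day and month were not read consistently. Assuming the source dates are day/month/year, a sketch of parsing them with an explicit format might look like this (the `raw` series is illustrative only):

```python
import pandas as pd

# Illustrative strings in the same style as the question's data.
raw = pd.Series(["12/2/2018", "1/3/2018", "23/03/2018"])

# Without a format, an ambiguous string such as 12/2/2018 is read month-first.
default_guess = pd.to_datetime("12/2/2018")

# An explicit day/month/year format removes the ambiguity.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

print(default_guess)
print(parsed)
```

If the dates really are day-first, the membership checks would then compare the intended calendar dates rather than a mix of the two readings.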
If the order of the datetimes matters for the membership check:
def f1(x):
    e = x['END']
    s = x['START']
    # for each end datetime, test the start datetimes of all following rows
    m = {j[0]: (s.iloc[i+1:] == j[1]).any() for i,j in enumerate(e.items())}
    x['Check2'] = pd.Series(m).astype(int)
    return x

df = df.groupby(['BASE','SECTOR'], group_keys=False).apply(f1)
print (df)
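The same order-sensitive idea can also be sketched without a per-group function, by pairing END and START values through a merge and keeping only pairs where the START row comes later. The names `check_later_rows`, `pos`, and `demo` are hypothetical, and the demo frame is made up; this is an alternative under those assumptions, not the method used in this answer.

```python
import pandas as pd


def check_later_rows(frame):
    """Hypothetical helper: 1 if a LATER row in the same BASE/SECTOR group
    has a START equal to this row's END, else 0."""
    work = frame[["BASE", "SECTOR", "START", "END"]].copy()
    work["pos"] = range(len(work))  # row position in the original order
    ends = work[["BASE", "SECTOR", "END", "pos"]].rename(columns={"END": "date"})
    starts = work[["BASE", "SECTOR", "START", "pos"]].rename(columns={"START": "date"})
    # Every (END row, START row) pair sharing the same group and the same date.
    pairs = ends.merge(starts, on=["BASE", "SECTOR", "date"],
                       suffixes=("_end", "_start"))
    # Keep only pairs where the matching START sits further down the frame.
    hits = pairs.loc[pairs["pos_start"] > pairs["pos_end"], "pos_end"].unique()
    return work["pos"].isin(hits).astype(int)


demo = pd.DataFrame({
    "BASE": ["T", "T", "T"],
    "SECTOR": ["XYZ", "XYZ", "XYZ"],
    "START": pd.to_datetime(["2016-05-05", "2016-04-01", "2016-05-01"]),
    "END": pd.to_datetime(["2018-04-12", "2016-05-01", "2016-05-05"]),
})
demo["Check2"] = check_later_rows(demo)
print(demo)
```

In the demo, the last row's END matches a START that appears earlier in the frame, so it is not flagged; only the middle row, whose END matches the START of the row after it, gets a 1.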
To show the difference more clearly, one value has been changed:
print (df.tail())
BASE SECTOR START END CHECK
18 T XYZ 25/03/2018 30/03/2018 0
19 T XYZ 5/5/2016 12/4/2018 0 <-changed value to 5/5/2016
20 T XYZ 1/4/2016 1/5/2016 1
21 T XYZ 1/5/2016 5/5/2016 0
22 T ABCD 15/3/2018 31/3/2018 0
df = df.groupby(['BASE','SECTOR'], group_keys=False).apply(f)
df = df.groupby(['BASE','SECTOR'], group_keys=False).apply(f1)
print (df.tail())
BASE SECTOR START END CHECK Check1 Check2
18 T XYZ 2018-03-25 2018-03-30 0 0 0
19 T XYZ 2016-05-05 2018-12-04 0 0 0
20 T XYZ 2016-01-04 2016-01-05 1 1 1
21 T XYZ 2016-01-05 2016-05-05 0 1 0
22 T ABCD 2018-03-15 2018-03-31 0 0 0
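For comparison, the order-insensitive check (Check1) can be approximated with a plain set lookup instead of groupby. This is a sketch only: `check_any_order` and `demo_any` are hypothetical names, the demo rows are made up, and like `f` above it deliberately skips rows whose START equals their END.

```python
import pandas as pd


def check_any_order(frame):
    """Hypothetical helper: 1 if the row's END equals some START within the
    same BASE/SECTOR group (any position), excluding rows where START == END."""
    start_keys = set(zip(frame["BASE"], frame["SECTOR"], frame["START"]))
    end_keys = zip(frame["BASE"], frame["SECTOR"], frame["END"])
    has_match = pd.Series([key in start_keys for key in end_keys],
                          index=frame.index)
    return (has_match & (frame["END"] != frame["START"])).astype(int)


demo_any = pd.DataFrame({
    "BASE": ["T", "T", "B", "T"],
    "SECTOR": ["XYZ", "XYZ", "XYZ", "XYZ"],
    "START": pd.to_datetime(["2018-03-23", "2018-03-25", "2018-03-30", "2018-04-12"]),
    "END": pd.to_datetime(["2018-03-25", "2018-03-30", "2018-04-02", "2018-04-12"]),
})
demo_any["Check1"] = check_any_order(demo_any)
print(demo_any)
```

The second row is not flagged because the only START equal to its END belongs to a different BASE, and the last row is not flagged because its START and END are the same date.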
Answer 1 (score: 1)
Thanks @Jezrael. To summarize, here is the solution:
import subprocess
import sys

add: str = sys.argv[1]
commit: str = sys.argv[2]
branch: str = sys.argv[3]


def run_command(command: str):
    print(command)
    # capture stderr too; otherwise communicate() returns None for error
    process = subprocess.Popen(command.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print(str(process.args))
    output, error = process.communicate()
    output = output.decode()
    error = error.decode()
    # print each stream only when it actually contains something
    if output:
        print("output: " + output)
    if error:
        print("error: " + error)


def main():
    global add
    global commit
    global branch
    if add == "" or add == " ":
        add = "."
    if branch == "":
        branch = "master"
    print("add: '" + add + "' commit: '" + commit + "' branch: '" + branch + "'")
    command = "git add " + add
    run_command(command)
    commit = commit.replace(" ", "''")
    command = 'git commit -m "' + commit + '"'
    run_command(command)
    command = "git push origin " + branch
    run_command(command)


if __name__ == '__main__':
    main()