I am trying to skip the operation on rows where the End_time column has the value "Failed".
Here is my actual file.
check_time.log
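Done  City        Start_time  End_time
Yes   Chicago     18          10
Yes   Atlanta     208         11
No    Minnetonka  57          Failed
Yes   Hopkins     112         80
No    Marietta    2018        Failed
This is what I have so far.
awk 'BEGIN { OFS = "\t" } NR == 1 { $5 = "Time_diff" } NR >= 2 { $5 = $3 - $4 } 1' < files |column -t
Output:
Done  City        Start_time  End_time  Time_diff
Yes   Chicago     18          10        8
Yes   Atlanta     208         11        197
No    Minnetonka  57          Failed    57
Yes   Hopkins     112         80        32
No    Marietta    2018        Failed    2018
The desired output should leave Time_diff empty on the Failed rows, like this:
Done  City        Start_time  End_time  Time_diff
Yes   Chicago     18          10        8
Yes   Atlanta     208         11        197
No    Minnetonka  57          Failed
Yes   Hopkins     112         80        32
No    Marietta    2018        Failed
So how do I skip those rows?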
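Answer 0 (score: 3)
You should be able to change:
$5 = $3 - $4
into:
if ($4 != "Failed") { $5 = $3 - $4 }
This will:
leave $5 empty, rather than setting it to the calculated value, on lines where End_time is Failed; and
correctly calculate the difference on all other lines.
I say "correctly" because it appears you want the start time minus the end time in those cases, despite the fact that a duration is usually the end time minus the start time. I have written it to match your desired output rather than the "sane" expectation.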
A transcript follows so you can see it in action:
pax$ awk 'BEGIN{OFS="\t"}NR==1{$5="Time_diff"}NR>=2{if($4!="Failed"){$5=$3-$4}}1' <inputFile.txt |column -t
Done  City        Start_time  End_time  Time_diff
Yes   Chicago     18          10        8
Yes   Atlanta     208         11        197
No    Minnetonka  57          Failed
Yes   Hopkins     112         80        32
No    Marietta    2018        Failed
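As a side note on why the guard is needed at all: awk treats a string that does not look like a number as 0 in arithmetic, so an unguarded `$5 = $3 - $4` quietly computes `57 - 0` and `2018 - 0` on the Failed rows instead of raising an error. A minimal sketch of that behaviour (the function name `coerce_demo` is made up for illustration):

```shell
# Hypothetical helper: shows awk's string-to-number coercion in arithmetic.
coerce_demo() {
    # "Failed" has no numeric prefix, so it counts as 0 in the subtraction.
    awk 'BEGIN { print 57 - "Failed"; print 2018 - "Failed" }'
}

coerce_demo   # prints 57, then 2018
```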
By the way, you may want to consider what happens when you start getting information from New York, San Antonio, Salt Lake City or, even worse, Maccagno con Pino e Veddasca :-)
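One possible way around multi-word city names, sketched under the assumption that you can get the file with single tab characters between fields (the sample as posted does not show this): set both FS and OFS to a tab so that a space inside a city name no longer splits the field. The function name `time_diff_tsv` and the sample rows are illustrative only.

```shell
# Assumption: the input is tab-separated; time_diff_tsv is a hypothetical name.
time_diff_tsv() {
    awk 'BEGIN { FS = OFS = "\t" }                  # split and join on tabs only
         NR == 1 { print $0, "Time_diff"; next }    # header: append the new heading
         $4 != "Failed" { $5 = $3 - $4 }            # skip rows whose End_time is Failed
         1' "$@"                                    # print every row, edited or not
}

# Illustrative input with spaces inside the City field.
printf 'Done\tCity\tStart_time\tEnd_time\nYes\tSalt Lake City\t208\t11\nNo\tNew York\t57\tFailed\n' |
    time_diff_tsv
```

If you still want aligned output, column would also need to be told to split on tabs only; otherwise it breaks the city names apart again.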
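Answer 1 (score: 2)
Please try the following. (This assumes that the last two fields of Input_file are always Start_time and End_time, in that order, with no additional fields after them; if there are any, you may need to adjust the field numbers. The fields are counted from the end because, if a city name contains spaces, field numbers counted from the start would differ from line to line and could not be used to pick out the same values on every line.)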
awk '
FNR==1{
print $0,"Time_Diff"
next
}
$NF!="Failed"{
$(NF+1)=$(NF-1)-$NF
}
1
' Input_file | column -t
The output is as follows.
Done  City        Start_time  End_time  Time_Diff
Yes   Chicago     18          10        8
Yes   Atlanta     208         11        197
No    Minnetonka  57          Failed
Yes   Hopkins     112         80        32
No    Marietta    2018        Failed
Explanation: a line-by-line explanation of the above code.
awk ' ##Start the awk program.
FNR==1{ ##If this is the very first line of the file, do the following.
print $0,"Time_Diff" ##Print the header line followed by the new Time_Diff heading.
next ##next skips all remaining statements for the current line.
}
$NF!="Failed"{ ##If the last field ($NF, where NF is the number of fields) is not Failed, do the following.
$(NF+1)=$(NF-1)-$NF ##Append a new column whose value is the second-to-last field minus the last field, as in the samples.
} ##Close this condition block.
1 ##1 is an always-true pattern, so every line, edited or not, is printed.
' Input_file | ##Read Input_file and pass the awk output to the next command through the pipe (|).
column -t ##column -t aligns the whitespace-separated output into a table.
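A possible variation on the same idea, sketched here as an assumption rather than something the question asks for: instead of comparing against the literal string Failed, subtract only when both of the last two fields look like unsigned integers. That would also skip any other non-numeric status (the "Pending" row below is invented for illustration), and because the fields are counted from the end, a city name with a space does not disturb it. The function name `time_diff_numeric` is hypothetical.

```shell
# Hypothetical variant: guard on "looks numeric" instead of the literal Failed.
time_diff_numeric() {
    awk 'FNR == 1 { print $0, "Time_Diff"; next }
         # Subtract only when the last two fields are both unsigned integers.
         $NF ~ /^[0-9]+$/ && $(NF-1) ~ /^[0-9]+$/ { $(NF+1) = $(NF-1) - $NF }
         1' "$@"
}

# Illustrative input: a two-word city and an invented "Pending" status.
printf '%s\n' 'Done City Start_time End_time' \
              'Yes New York 18 10' \
              'No Marietta 2018 Failed' \
              'No Hopkins 112 Pending' | time_diff_numeric
```

The output is deliberately not piped through column -t here, since that would split "New York" into two columns.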
Answer 2 (score: 0)
If you are considering Perl:
> cat kwa.in
Done  City        Start_time  End_time
Yes   Chicago     18          10
Yes   Atlanta     208         11
No    Minnetonka  57          Failed
Yes   Hopkins     112         80
No    Marietta    2018        Failed
> perl -lane ' print join(" ",@F,"Time_Diff") if $.==1; if($.>1 ) { $F[4]=$F[2]-$F[3] if not $F[3]=~/Failed/; print join(" ",@F) } ' kwa.in | column -t
Done  City        Start_time  End_time  Time_Diff
Yes   Chicago     18          10        8
Yes   Atlanta     208         11        197
No    Minnetonka  57          Failed
Yes   Hopkins     112         80        32
No    Marietta    2018        Failed
>