Question

I am trying to skip an operation on rows where the End_time column has the value "Failed".

Here is my actual file, check_time.log:

Done  City                               Start_time  End_time
  Yes   Chicago                            18          10
  Yes   Atlanta                            208         11
   No   Minnetonka                          57        Failed
  Yes   Hopkins                           112         80
   No   Marietta                          2018        Failed

This is what I have so far:

awk 'BEGIN { OFS = "\t" } NR == 1 { $5 = "Time_diff" } NR >= 2 { $5 = $3 - $4 } 1' < files | column -t

It produces the output below, but the Failed rows should not get a Time_diff value:

  Done  City                               Start_time  End_time  Time_diff
  Yes   Chicago                            18          10        8
  Yes   Atlanta                            208         11        197
   No   Minnetonka                          57        Failed     57
  Yes   Hopkins                           112         80        32
   No   Marietta                          2018        Failed    2018

So how can I skip those rows?

Answer 1

You should be able to change:

$5 = $3 - $4

to:

if ($4 != "Failed") { $5 = $3 - $4 }

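This will:

refuse to change $5 from empty to the calculated value on lines where the end time is Failed; and
correctly calculate all other lines.

I say "correctly" because in those cases you seem to want start time minus end time, despite the fact that durations tend to be end time minus start time. I've changed it to match your desired output rather than the "sane" expectation.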
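The literal comparison is enough here because Failed is the only non-numeric End_time in the sample. awk treats a non-numeric string as 0 in arithmetic, which is why the unguarded version reports 57 and 2018 for those rows. For the more general case of an End_time that contains letters, a rough sketch is to test that the field is all digits instead. The input below is made up for illustration, including the Duluth row and its n/a value:

```shell
# Hypothetical sample input, space-separated like check_time.log.
input=$(mktemp)
cat > "$input" <<'EOF'
Done City Start_time End_time
Yes Chicago 18 10
No Minnetonka 57 Failed
No Duluth 40 n/a
EOF

# Add Time_diff only when End_time ($4) consists solely of digits.
result=$(awk '
  NR == 1         { print $0, "Time_diff"; next }  # header row
  $4 ~ /^[0-9]+$/ { $5 = $3 - $4 }                 # numeric End_time only
  { print }                                        # print every row
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Rows whose End_time fails the digit test are printed unchanged, so their Time_diff stays empty.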

A transcript follows so you can see it in action:

pax$ awk 'BEGIN{OFS="\t"}NR==1{$5="Time_diff"}NR>=2{if($4!="Failed"){$5=$3-$4}}1' <inputFile.txt |column -t
Done  City        Start_time  End_time  Time_diff
Yes   Chicago     18          10        8
Yes   Atlanta     208         11        197
No    Minnetonka  57          Failed
Yes   Hopkins     112         80        32
No    Marietta    2018        Failed

As an aside, you may want to consider what happens when you start getting information from New York, San Antonio, Salt Lake City or, even worse, Maccagno con Pino e Veddasca :-)
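To make that concern concrete, here is a small sketch using a made-up row (the Salt Lake City line and its times are not from the original file). With default whitespace splitting, each word of the city name becomes its own field, so $3 and $4 no longer hold the times, while fields counted from the end still do:

```shell
# Hypothetical row with a three-word city name (not in the original file).
line='Yes Salt Lake City 112 80'

# Counting from the start: $3 and $4 are now parts of the city name.
from_start=$(printf '%s\n' "$line" | awk '{ print $3, $4 }')

# Counting from the end: the last two fields are still the times.
from_end=$(printf '%s\n' "$line" | awk '$NF != "Failed" { print $(NF-1) - $NF }')

printf 'from start: %s\nfrom end: %s\n' "$from_start" "$from_end"
```

Note that column -t would still split such a city name into several columns, so the aligned output would need separate handling.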

Answer 2

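Please try the following. (This assumes that Start_time and End_time are always the last two fields of Input_file, in that order, with nothing after them; if there are more fields, the field numbers may need adjusting. The fields are counted from the end because, if a city name contains spaces, field numbers counted from the start would differ from line to line.)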

awk '
FNR==1{
  print $0,"Time_Diff"
  next
}
$NF!="Failed"{
  $(NF+1)=$(NF-1)-$NF
}
1
'  Input_file | column -t

The output is as follows.

Done  City        Start_time  End_time  Time_Diff
Yes   Chicago     18          10        8
Yes   Atlanta     208         11        197
No    Minnetonka  57          Failed
Yes   Hopkins     112         80        32
No    Marietta    2018        Failed

Explanation: a full explanation of the above code follows.

awk '                      ## Start the awk program.
FNR==1{                    ## If this is the first line of the file, do the following.
  print $0,"Time_Diff"     ## Print the header line with the Time_Diff heading appended.
  next                     ## next skips the remaining rules for this line and moves on to the next line.
}
$NF!="Failed"{             ## $NF is the last field (NF is the number of fields); run this block only when it is not Failed.
  $(NF+1)=$(NF-1)-$NF      ## Append a new column holding the second-to-last field minus the last field, as in the samples.
}                          ## Close the condition block.
1                          ## A pattern of 1 is always true, so the line is printed whether or not it was edited.
' Input_file   |           ## Read Input_file and pipe the awk output to the next command.
column -t                  ## column -t aligns the whitespace-separated fields into a table.

Answer 3

If you are considering Perl:

> cat kwa.in 
Done  City                               Start_time  End_time
  Yes   Chicago                            18          10
  Yes   Atlanta                            208         11
   No   Minnetonka                          57        Failed
  Yes   Hopkins                           112         80
   No   Marietta                          2018        Failed
> perl -lane ' print join(" ",@F,"Time_Diff") if $.==1; if($.>1 ) { $F[4]=$F[2]-$F[3] if not $F[3]=~/Failed/; print join(" ",@F) } ' kwa.in | column -t
Done  City        Start_time  End_time  Time_Diff
Yes   Chicago     18          10        8
Yes   Atlanta     208         11        197
No    Minnetonka  57          Failed
Yes   Hopkins     112         80        32
No    Marietta    2018        Failed
>

Skip the operation on a row if one of its columns contains letters - bash
