Tcl: check for duplicates in a file

Time: 2017-01-02 18:41:45

Tags: bash sorting tcl

I have a file, file.txt, and two variables.

Example variables:
$song RIDE
$artist TWENTY_ONE_PILOTS

Contents of file.txt:

1483379340 02.01.2017 18:49:00 GURU_JOSH_PROJECT INFINITY_08
1483379370 02.01.2017 18:49:30 LADY_GAGA MILLION_REASONS
1483379440 02.01.2017 18:50:40 GURU_JOSH_PROJECT INFINITY_08
1483379565 02.01.2017 18:52:45 GURU_JOSH_PROJECT INFINITY_08
1483379645 02.01.2017 18:54:05 POLO_HOFER ALPEROSE
1483380245 02.01.2017 19:04:05 WINCENT_WEISS MUSIK_SEIN
1483380485 02.01.2017 19:08:05 MR_PROBZ WAVES
1483380625 02.01.2017 19:10:25 ZARA_LARSSON LUSH_LIFE
1483380695 02.01.2017 19:11:35 MR_PROBZ WAVES
1483380725 02.01.2017 19:12:05 ZARA_LARSSON LUSH_LIFE
1483380765 02.01.2017 19:12:45 ARIANA_GRANDE SIDE_TO_SIDE
1483380835 02.01.2017 19:13:55 ZARA_LARSSON LUSH_LIFE
1483380975 02.01.2017 19:16:15 TWENTY_ONE_PILOTS RIDE
1483381216 02.01.2017 19:20:16 TAYLOR_SWIFT SHAKE_IT_OFF

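I want to check for duplicates on this day, between 08:00 and 17:00 and at least 5 minutes apart (repeats/erroneous records). The check is for the currently playing song.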

I have this test in bash; how can I set it up in Tcl?

sort file.txt | grep ' 02.01.2017 08:\| 09:\| 10:\| 11:\| 12:\| 13:\| 14:\| 15:\| 16:' | cut -d" " -f4 | uniq -cd

But it doesn't work that way. I need a new idea :)

proc check { nick uhost handle channel text } {
    set artist TWENTY_ONE_PILOTS
    set song RIDE
    set file [exec sort file.txt | grep '02.01.2017 08:\| 09:\| 10:\| 11:\| 12:\| 13:\| 14:\| 15:\| 16:' | cut -d " " -f4 | uniq -cd]
    putnow "PRIVMSG $channel :duplicates $artist $song";        
}

2 Answers:

Answer 0 (score: 1)

First, Tcl quotes with { ... } where the shell uses ' ... '. Switching the quoting may be enough to fix the problem.

set file [exec sort file.txt | grep {02.01.2017 08:\| 09:\| 10:\| 11:\| 12:\| 13:\| 14:\| 15:\| 16:} | cut -d " " -f4 | uniq -cd]
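To see why the single quotes fail, it can help to count the words Tcl actually passes to a command. This is only a minimal sketch; countArgs is a hypothetical helper introduced for illustration and is not part of the question's code.

```tcl
# countArgs is a hypothetical helper: it returns how many words Tcl passed to it.
proc countArgs {args} {
    return [llength $args]
}

# Single quotes are ordinary characters in Tcl, so the space still splits words.
set withQuotes [countArgs '02.01.2017 08:']

# Braces group everything into one word, with no substitution inside.
set withBraces [countArgs {02.01.2017 08:}]

puts "single quotes: $withQuotes word(s), braces: $withBraces word(s)"
```

With single quotes, grep would receive the pattern as two separate arguments. Note also that exec raises a Tcl error when a command in the pipeline exits with a nonzero status, which grep does when nothing matches, so wrapping the call in catch may be worthwhile.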

However, if I wanted to find the non-unique values, I would do the processing directly in Tcl:

set f [open file.txt]
set lines [split [read $f] "\n"]
close $f

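foreach line $lines {
    # Skip blank lines, such as the empty element left after the final newline
    if {[string trim $line] eq ""} continue
    lassign [split $line] id day time artist song
    # Group the lines by artist and song
    lappend info($artist,$song) $line
}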

foreach {key matches} [array get info] {
    if {[llength $matches] > 1} {
        # Now have a list of duplicates; the oldest might be first if file.txt is so sorted

        # Write some reporting code here
    }
}
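One possible shape for the reporting step is sketched below. It is an assumption about what the report might look like, not the answerer's code: findDuplicates is a hypothetical helper, it uses a dict instead of the info array, and the sample lines are copied from file.txt in the question.

```tcl
# findDuplicates is a hypothetical helper: it groups lines by artist and song
# and returns a dict mapping "artist song" to the lines that occur more than once.
proc findDuplicates {lines} {
    set groups [dict create]
    foreach line $lines {
        # Ignore blank lines
        if {[string trim $line] eq ""} continue
        lassign [split $line] id day time artist song
        dict lappend groups "$artist $song" $line
    }
    # Keep only the keys that have more than one matching line
    return [dict filter $groups script {key matches} {
        expr {[llength $matches] > 1}
    }]
}

# Sample lines copied from file.txt in the question.
set lines {
    "1483380485 02.01.2017 19:08:05 MR_PROBZ WAVES"
    "1483380625 02.01.2017 19:10:25 ZARA_LARSSON LUSH_LIFE"
    "1483380695 02.01.2017 19:11:35 MR_PROBZ WAVES"
    "1483380765 02.01.2017 19:12:45 ARIANA_GRANDE SIDE_TO_SIDE"
}

dict for {key matches} [findDuplicates $lines] {
    puts "$key played [llength $matches] times"
}
```

This sketch only counts repeats; it does not apply the time-of-day window or the five-minute spacing from the question.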

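Answer 1 (score: 0)

Either I missed that the interval should be ≥5 minutes (I thought it was <5 minutes), or the question has changed. This code looks for cases where the same song is played at intervals of ≥5 minutes and prints them. None of the lines in the given data meet these criteria, but if I add some songs it seems to work.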

proc enumerate times {
    set times [dict values $times]
    if {[llength $times] == 2} {
        return "[lindex $times 0] and [lindex $times 1]"
    } else {
        return "[join [lrange $times 0 end-1] ", "], and [lindex $times end]"
    }
}

proc input name {
    set f [open $name]
    set data [read $f]
    close $f
    string trim $data
}

proc checkHours {time early late} {
    if {[scan $time %d hour] != 1} {
        error "bad time value?"
    }
    if {$hour < $early || $hour >= $late} {
        return -code continue
    }
}

proc main {} {
    set items {}
    foreach line [split [input file.txt] \n] {
        lassign $line seconds - time - song

        checkHours $time 8 17

        if {[dict exists $items $song]} {
            dict for {secs -} [dict get $items $song] {
                if {$seconds - $secs >= 300} {
                    dict set items $song $seconds $time
                }
            }
        } else {
            dict set items $song $seconds $time
        }
    }
    dict for {song times} $items {
        if {[llength $times] > 2} {
            puts "$song played ≥five minutes apart at [enumerate $times]"
        }
    }
}

main

The enumerate command is only there to print the times nicely.
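The checkHours procedure relies on return -code continue, which makes the calling foreach loop skip to its next iteration. A minimal sketch of that technique on its own follows; skipOutside is a hypothetical name, and the time values are made up for illustration rather than taken from file.txt.

```tcl
# skipOutside is a hypothetical helper mirroring checkHours: when the hour is
# outside the range early <= hour < late, it makes the caller's loop continue.
proc skipOutside {time early late} {
    # %d reads a decimal integer, so "08" is scanned as 8 rather than as octal.
    if {[scan $time %d hour] != 1} {
        error "bad time value: $time"
    }
    if {$hour < $early || $hour >= $late} {
        return -code continue
    }
}

# Made-up times for illustration only.
set kept {}
foreach time {07:59:59 08:00:00 12:30:00 16:59:59 17:00:00} {
    skipOutside $time 8 17
    lappend kept $time
}
puts $kept
```

Only the times from 08:00:00 up to, but not including, 17:00:00 reach the lappend, which is the same window main uses with checkHours $time 8 17.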

Documentation: - (operator), == (operator), > (operator), >= (operator), close, dict, foreach, if, join, lassign, lindex, llength, lrange, open, proc, puts, read, return, scan, set, split, string