我很欣赏有关如何利用Clojure有效地解析和比较两个文件的建议/见解。有两个(日志)文件包含员工出勤率;从这些文件中,我需要确定两名员工在同一部门中工作同一时间的所有日子。以下是日志文件的示例。
注意:每个文件都有不同数量的条目。
第一档:
Employee Id Name Time In Time Out Dept.
mce0518 Jon 2011-01-01 06:00 2011-01-01 14:00 ER
mce0518 Jon 2011-01-02 06:00 2011-01-01 14:00 ER
mce0518 Jon 2011-01-04 06:00 2011-01-01 13:00 ICU
mce0518 Jon 2011-01-05 06:00 2011-01-01 13:00 ICU
mce0518 Jon 2011-01-05 17:00 2011-01-01 23:00 ER
第二档:
Employee Id Name Time In Time Out Dept.
pdm1705 Jane 2011-01-01 06:00 2011-01-01 14:00 ER
pdm1705 Jane 2011-01-02 06:00 2011-01-01 14:00 ER
pdm1705 Jane 2011-01-05 06:00 2011-01-01 13:00 ER
pdm1705 Jane 2011-01-05 17:00 2011-01-01 23:00 ER
答案 0 :(得分:3)
如果你不打算定期这样做,
(defn data-seq [f]
(with-open [rdr (java.io.BufferedReader.
(java.io.FileReader. f))]
(let [s (rest (line-seq rdr))]
(doall (map seq (map #(.split % "\\s+") s))))))
(defn same-time? [a b]
(let [a (drop 2 a)
b (drop 2 b)]
(= a b)))
(let [f1 (data-seq "f1.txt")
f2 (data-seq "f2.txt")]
(reduce (fn[h v]
(let [f2 (filter #(same-time? v %) f2)]
(if (empty? f2)
h
(conj h [(first v) (map first f2)])))) [] f1)
)
会找到你,
[["mce0518" ("pdm1705")] ["mce0518" ("pdm1705")] ["mce0518" ("pdm1705")]]
答案 1 :(得分:1)
我来的时间更短,而且(恕我直言)更易读的版本
(use ; moar toolz - moar fun
'[clojure.contrib.duck-streams :only (reader)]
'[clojure.string :only (split)]
'[clojure.contrib.str-utils :only (str-join)]
'[clojure.set :only (intersection)])
(defn read-presence [filename]
(with-open [rdr (reader filename)] ; file will be securely (always) closed after use
(apply hash-set ; make employee's hash-set
(map #(str-join "--" (drop 2 (split % #" [ ]+"))) ; right-to-left: split row by spaces then forget two first columns then join using "--"
(drop 1 ; ommit first line
(line-seq rdr)))))) ; read file content line-by-line
(intersection (read-presence "a.in") (read-presence "b.in")) ; now it's simple!
;result: #{"2011-01-01 06:00--2011-01-01 14:00--ER" "2011-01-02 06:00--2011-01-01 14:00--ER" "2011-01-05 17:00--2011-01-01 23:00--ER"}
假设a.in
和b.in
是您的文件。我还假设你每个员工都有一个哈希集 - (天真)对N名员工的概括需要接下来的六行:
(def employees ["greg.txt" "allison.txt" "robert.txt" "eric.txt" "james.txt" "lisa.txt"])
(for [a employees b employees :when (and
(= a (first (sort [a b]))) ; thou shall compare greg with james ONCE
(not (= a b)))] ; thou shall not compare greg with greg
(str-join " -- " ; well, it's not pretty... nor pink at least
[a b (intersection (read-presence a) (read-presence b))]))
;result: ("a.in -- b.in -- #{\"2011-01-01 06:00--2011-01-01 14:00--ER\" \"2011-01-02 06:00--2011-01-01 14:00--ER\" \"2011-01-05 17:00--2011-01-01 23:00--ER\"}")
实际上这个循环太丑了,它不会记住中间结果......需要改进。
- 编辑 -
我知道核心或贡献必须有优雅的东西!
(use '[clojure.contrib.combinatorics :only (combinations)])
(def employees ["greg.txt" "allison.txt" "robert.txt" "eric.txt" "james.txt" "lisa.txt"])
(def employee-map (apply conj (for [e employees] {e (read-presence e)})))
(map (fn [[a b]] [a b (intersection (employee-map a) (employee-map b))])
(combinations employees 2))
;result: (["a.in" "b.in" #{"2011-01-01 06:00--2011-01-01 14:00--ER" "2011-01-02 06:00--2011-01-01 14:00--ER" "2011-01-05 17:00--2011-01-01 23:00--ER"}])
现在它被记忆了(在employee-map中解析了数据),一般而且......懒惰:D