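I'm looking to create a single CSV from the many CSVs in a directory. I know this has been covered many times, but I have a slight twist. What I want to do:
With that said, I'm using the following:
I found this link about adding a column from one CSV to another: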
https://askubuntu.com/questions/553219/add-column-from-one-csv-to-another-csv-file
I could use something like this to add the columns from one file to the other:
paste -d, file2 <(cut -d, -f3- file1)
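paste pairs lines by position, not by timestamp. In the sample data further down, CSV2 has no 12:03:00 row, so every vpool02 value after that point would land on the wrong time. A minimal sketch of a key-based alternative, assuming plain comma-separated files with exactly one header row each, uses POSIX join(1) to match rows on the date column and fill gaps with null (merge_two is a hypothetical helper name, not an existing tool):

```shell
#!/bin/sh
# Sketch: merge two date-keyed CSVs on column 1 with join(1) instead of paste(1).
# Assumptions: comma-separated, no quoted fields, exactly one header row per file.
# merge_two is a hypothetical helper, not part of any existing tool.
export LC_ALL=C   # sort and join must use the same collation order

merge_two() {
    tmpdir=$(mktemp -d) || return 1
    # join needs both inputs sorted on the join field; drop the header rows first
    tail -n +2 "$1" | sort -t, -k1,1 > "$tmpdir/left"
    tail -n +2 "$2" | sort -t, -k1,1 > "$tmpdir/right"
    # Header: the first file's header plus the value column name of the second file
    printf '%s,%s\n' "$(head -n 1 "$1")" "$(head -n 1 "$2" | cut -d, -f2-)"
    # -a 1 -a 2 keeps unmatched rows from both sides; -e fills the missing
    # fields of the explicit -o list (0 = join field, 1.2 and 2.2 = value columns)
    join -t, -a 1 -a 2 -e null -o 0,1.2,2.2 "$tmpdir/left" "$tmpdir/right"
    rm -rf "$tmpdir"
}
```

For example, merge_two csv1 csv2 > merged.csv. It handles exactly two files: the hard-coded -o list keeps one value column from each side, so a third file would need a longer list.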
The following PHP gets the list of files in the directory. I'm now trying to use PHP to combine/merge the CSVs.
$dir= $Folder.'/Stats/Latency/'; // directory name
$ar=scandir($dir);
$box=$_POST['box']; // Receive the file list from form
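// Loop through the selected files and copy each one into the report directory
foreach ($box as $val) {          // each() is deprecated and was removed in PHP 8
    $val  = basename($val);       // drop any path components from the posted name
    $path = $dir.$val;            // $dir already ends with '/'
    $dest = $Folder."/Report/Latency/".$val;
    if (copy($path, $dest)) {
        echo "$val,";             // report only files that were actually copied
    }
}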
echo "<hr>";
This is where I need the CSV merge below. I've been considering using shell_exec, but that seems very labor-intensive.
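$reportFiles = $Folder."/Report/Latency/";
foreach (glob($reportFiles."*.csv") as $file)
{
    // placeholder: the merge still has to happen here; for now this only creates the output file
    shell_exec("touch ".escapeshellarg($reportFiles."latencyReport.csv"));
}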
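For the merge step the stub above leaves open, one option is a single awk pass over every CSV in the report directory instead of one shell call per file. This is a minimal sketch, assuming each file starts with a date,<name> header, uses plain commas (no quoting, no CRLF line endings), and that latencyReport.csv is the output name to skip; merge_dir is a hypothetical helper. Missing values are left empty, as in the merged CSV shown below:

```shell
#!/bin/sh
# Sketch: merge every *.csv in a directory into one CSV keyed on the date column.
# Assumptions: comma-separated, no quoted fields, no CRLF line endings, each file
# starts with a "date,<name>" header. The skipped output name latencyReport.csv and
# the helper name merge_dir are illustrative only.
export LC_ALL=C   # byte-order sort keeps ISO timestamps in chronological order

merge_dir() {
    dir=$1
    # Collect the input files, skipping the report itself if it already exists
    set --
    for f in "$dir"/*.csv; do
        [ -e "$f" ] || continue                        # glob matched nothing
        [ "$(basename "$f")" = latencyReport.csv ] && continue
        set -- "$@" "$f"
    done
    [ "$#" -gt 0 ] || return 1

    # Header row: "date" followed by the value-column name from each file
    printf 'date'
    for f in "$@"; do printf ',%s' "$(head -n 1 "$f" | cut -d, -f2)"; done
    printf '\n'

    # Body: one row per timestamp seen in any file; a missing value stays empty
    awk -F, -v OFS=, '
        FNR == 1 { n++; next }                  # header row: start a new column
        { val[$1, n] = $2; seen[$1] }           # value for (timestamp, column)
        END {
            for (t in seen) {
                line = t
                for (i = 1; i <= n; i++)
                    line = line OFS (((t, i) in val) ? val[t, i] : "")
                print line
            }
        }' "$@" | sort -t, -k1,1
}
```

For example, merge_dir /path/to/Report/Latency > /path/to/Report/Latency/latencyReport.csv. The shell creates the output file before the glob is expanded, which is why the function skips that name. Saved in a script that ends with merge_dir "$1", it could be called from PHP through shell_exec(), with the directory passed through escapeshellarg().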
As for the data in the CSV files:
CSV1:
date,vpool06
2016-03-28 12:00:00,0.000
2016-03-28 12:01:00,0.000
2016-03-28 12:02:00,0.000
2016-03-28 12:03:00,0.000
2016-03-28 12:04:00,0.000
2016-03-28 12:05:00,0.000
2016-03-28 12:06:00,0.000
2016-03-28 12:07:00,0.000
2016-03-28 12:08:00,0.000
2016-03-28 12:09:00,0.000
2016-03-28 12:10:00,0.000
2016-03-28 12:11:00,0.000
2016-03-28 12:12:00,0.000
2016-03-28 12:13:00,0.000
2016-03-28 12:14:00,0.000
2016-03-28 12:15:00,0.000
2016-03-28 12:16:00,0.000
2016-03-28 12:17:00,0.000
2016-03-28 12:18:00,0.000
2016-03-28 12:19:00,0.000
CSV2:
date,vpool02
2016-03-28 12:00:00,0.000
2016-03-28 12:01:00,0.000
2016-03-28 12:02:00,0.000
2016-03-28 12:04:00,0.000
2016-03-28 12:05:00,0.000
2016-03-28 12:06:00,0.000
2016-03-28 12:07:00,0.000
2016-03-28 12:08:00,0.000
2016-03-28 12:09:00,0.000
2016-03-28 12:10:00,0.000
2016-03-28 12:11:00,0.000
2016-03-28 12:12:00,0.000
2016-03-28 12:13:00,0.000
2016-03-28 12:14:00,0.000
CSV3:
date,vpool03
2016-03-28 12:00:00,0.000
2016-03-28 12:01:00,0.000
2016-03-28 12:02:00,0.000
2016-03-28 12:04:00,0.000
2016-03-28 12:05:00,0.000
Merged CSV:
date,vpool06,vpool02,vpool03
2016-03-28 12:00:00,0.000,0.000,0.000
2016-03-28 12:01:00,0.000,0.000,0.000
2016-03-28 12:02:00,0.000,0.000,0.000
2016-03-28 12:03:00,0.000,,
2016-03-28 12:04:00,0.000,0.000,0.000
2016-03-28 12:05:00,0.000,0.000,0.000
2016-03-28 12:06:00,0.000,0.000,
2016-03-28 12:07:00,0.000,0.000,
2016-03-28 12:08:00,0.000,0.000,
2016-03-28 12:09:00,0.000,0.000,
2016-03-28 12:10:00,0.000,0.000,
2016-03-28 12:11:00,0.000,0.000,
2016-03-28 12:12:00,0.000,0.000,
2016-03-28 12:13:00,0.000,0.000,
2016-03-28 12:14:00,0.000,0.000,
2016-03-28 12:15:00,0.000,,
2016-03-28 12:16:00,0.000,,
2016-03-28 12:17:00,0.000,,
2016-03-28 12:18:00,0.000,,
2016-03-28 12:19:00,0.000,,
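Ideally I don't care whether there is a "null" value at that point, since it won't be displayed in the graph; it just means the server was down at that time.
Update: I need null in the empty fields where there is no data. Example: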
date,vpool06,7NA_01,7NA_02,bd01,bd02,vpool01,vpool02,vpool03,vpool04,vpool07
2016-03-28 12:00:00,1.000,null,10.00,02.00,20.00,0.00,0.00,0.00,0.00,0.000
2016-03-28 12:01:00,0.000,11.00,110.00,null,11.00,0.00,0.00,0.00,0.00,0.000
2016-03-28 12:02:00,0.000,null,0.00,2.00,100,0.00,0.00,0.00,0.00,0.000
2016-03-28 12:03:00,0.000,0.00,0.00,02.00,10.00,0.00,0.000,0.00,0.00,0.000
Answer 0 (score: 1)
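awk to the rescue! The header row (date,...) sorts after the timestamps, so the final step moves that last line back to the top:
$ awk -F, -v OFS=, 'FNR==1{c++} {a[$1,c]=$2; keys[$1]}
    END{for(k in keys)
          {printf "%s", k;
           for(i=1;i<=c;i++)
             printf "%s", OFS (((k,i) in a)?a[k,i]:"");
           print ""}}' file{1,2,3} |
  sort -t, -k1,1 |
  awk '{line[NR]=$0} END{print line[NR]; for(i=1;i<NR;i++) print line[i]}' > merged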
$ cat merged
date,vpool06,vpool02,vpool03
2016-03-28 12:00:00,0.000,0.000,0.000
2016-03-28 12:01:00,0.000,0.000,0.000
2016-03-28 12:02:00,0.000,0.000,0.000
2016-03-28 12:03:00,0.000,,
2016-03-28 12:04:00,0.000,0.000,0.000
2016-03-28 12:05:00,0.000,0.000,0.000
2016-03-28 12:06:00,0.000,0.000,
2016-03-28 12:07:00,0.000,0.000,
2016-03-28 12:08:00,0.000,0.000,
2016-03-28 12:09:00,0.000,0.000,
2016-03-28 12:10:00,0.000,0.000,
2016-03-28 12:11:00,0.000,0.000,
2016-03-28 12:12:00,0.000,0.000,
2016-03-28 12:13:00,0.000,0.000,
2016-03-28 12:14:00,0.000,0.000,
2016-03-28 12:15:00,0.000,,
2016-03-28 12:16:00,0.000,,
2016-03-28 12:17:00,0.000,,
2016-03-28 12:18:00,0.000,,
2016-03-28 12:19:00,0.000,,
Answer 1 (score: 1)
I don't know how you'd do it in PHP, but with GNU awk (for true 2D arrays and "sorted_in") it would be:
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 { hdr[ARGIND][1]=$1; hdr[ARGIND][2]=$2; next }
{ arr[ARGIND][$1] = $2 }
END {
    for (idx in arr) {
        numRows = length(arr[idx])
        if (numRows > maxRows) {
            maxRows = numRows
            maxIdx = idx
        }
    }
    printf "%s%s%s", hdr[maxIdx][1], OFS, hdr[maxIdx][2]
    for (idx=1; idx<=ARGIND; idx++) {
        if (idx != maxIdx) {
            printf "%s%s", OFS, hdr[idx][2]
        }
    }
    print ""
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (tstamp in arr[maxIdx]) {
        printf "%s%s%s", tstamp, OFS, arr[maxIdx][tstamp]
        for (idx=1; idx<=ARGIND; idx++) {
            if (idx != maxIdx) {
                printf "%s%s", OFS, (tstamp in arr[idx] ? arr[idx][tstamp] : "null")
            }
        }
        print ""
    }
}
$ awk -f tst.awk csv3 csv2 csv1
date,vpool06,vpool03,vpool02
2016-03-28 12:00:00,0.000,0.000,0.000
2016-03-28 12:01:00,0.000,0.000,0.000
2016-03-28 12:02:00,0.000,0.000,0.000
2016-03-28 12:03:00,0.000,null,null
2016-03-28 12:04:00,0.000,0.000,0.000
2016-03-28 12:05:00,0.000,0.000,0.000
2016-03-28 12:06:00,0.000,null,0.000
2016-03-28 12:07:00,0.000,null,0.000
2016-03-28 12:08:00,0.000,null,0.000
2016-03-28 12:09:00,0.000,null,0.000
2016-03-28 12:10:00,0.000,null,0.000
2016-03-28 12:11:00,0.000,null,0.000
2016-03-28 12:12:00,0.000,null,0.000
2016-03-28 12:13:00,0.000,null,0.000
2016-03-28 12:14:00,0.000,null,0.000
2016-03-28 12:15:00,0.000,null,null
2016-03-28 12:16:00,0.000,null,null
2016-03-28 12:17:00,0.000,null,null
2016-03-28 12:18:00,0.000,null,null
2016-03-28 12:19:00,0.000,null,null