Cleaning up phone numbers in different CSV formats

Time: 2015-05-19 21:53:37

Tags: csv awk sed ksh

Suppose I have phone numbers in a few different CSV layouts, as follows:

Here is the first CSV file, which contains the following lines:

"Name","Address","Areacode","Phone"
"Mike Wise","101 Abc Drive","406","123-4567" // Need to remove the dash in the seven-digit phone number

Here is another CSV file, which contains the following lines:

"Name","Address","NPA","TELNO"
"Mike Wise","101 Abc Drive","406","1234567"

Is there some kind of sed one-liner that would convert these into one common format?

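I'd prefer a sed one-liner, but if it has to be more than one line, so be it. Also, sed isn't required. It just feels like sed might be easier, but I haven't managed to come up with a sed solution yet.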

2 answers:

Answer 0 (score: 3)

$ cat tst.awk
BEGIN { FS=OFS="\",\"" }
{
    if (NR==1) {
        $3 = "NPA"
        $4 = "TELNO\""
    }
    else {
        gsub(/-/,"",$NF)
        if (NF==3) {
            sub(/.{3}/,"&"OFS,$NF)
        }
    }
    print
}

$ cat file1
"Name","Address","FullPhone"
"Mike Wise","101 Abc Drive","4061234567"

$ awk -f tst.awk file1
"Name","Address","NPA","TELNO"
"Mike Wise","101 Abc Drive","406","1234567"

$ cat file2
"Name","Address","Areacode","Phone"
"Mike Wise","101 Abc Drive","406","123-4567"

$ awk -f tst.awk file2
"Name","Address","NPA","TELNO"
"Mike Wise","101 Abc Drive","406","1234567"

And here is some input you didn't ask about but that might occur. If it does, it is also handled correctly:

$ cat file3
"Name","Address","FullPhone"
"Mike Wise","101 Abc Drive","406-1234-567"

$ awk -f tst.awk file3
"Name","Address","NPA","TELNO"
"Mike Wise","101 Abc Drive","406","1234567"
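tst.awk tests NR==1, which is true only for the very first line of all input. That is why each file above is passed to awk separately. If you would rather clean several files in one awk call, you could test FNR instead, because FNR restarts at 1 for every file. This is a minimal sketch that assumes you want one combined output with the header printed once. The function name clean_all is hypothetical, and /.../ stands in for /.{3}/ so that awks without interval-expression support also accept it:

```shell
#!/bin/sh
# Sketch: tst.awk's logic run over several files in one awk call.
# clean_all is a hypothetical function name; pass the CSV files as arguments.
clean_all() {
    awk '
    BEGIN { FS = OFS = "\",\"" }     # split on the 3-char string ","
    FNR == 1 {                       # header line of each input file
        if (NR == 1) {               # ...but only print the first one
            $3 = "NPA"
            $4 = "TELNO\""
            print
        }
        next
    }
    {
        gsub(/-/, "", $NF)           # drop dashes from the phone field
        if (NF == 3)                 # single 10-digit field: split 3 + 7
            sub(/.../, "&" OFS, $NF)
        print
    }' "$@"
}
```

Usage: clean_all file1 file2 > combined.csv. With NR==1 instead, the second file's header would be printed unchanged, as if it were a data line.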

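If the input phone numbers also contain spaces that need to be removed, not just -s, change gsub(/-/,"",$NF) to gsub(/[-[:space:]]/,"",$NF). To delete everything except digits, use gsub(/[^0-9"]/,"",$NF) or similar instead. Keep the " in that bracket expression: FS is the three-character string "," so the closing double quote of each line is part of $NF.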
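Because the closing double quote stays attached to $NF, a "keep only digits" cleanup has to keep " as well. Otherwise every output line loses its final quote. Here is a minimal sketch of that variant, wrapped in a hypothetical shell function clean_phones that reads standard input. It uses /.../ rather than /.{3}/ so it also runs in awks that lack interval expressions. The sample number with a space in it is made up for illustration:

```shell
#!/bin/sh
# Variant of tst.awk that deletes every character of the phone field
# except digits and the closing double quote, which stays in $NF because
# FS is the 3-char string ",". clean_phones is a hypothetical name.
clean_phones() {
    awk '
    BEGIN { FS = OFS = "\",\"" }
    NR == 1 {                        # rename the header columns
        $3 = "NPA"
        $4 = "TELNO\""
        print
        next
    }
    {
        gsub(/[^0-9"]/, "", $NF)     # keep only digits and the trailing quote
        if (NF == 3)                 # single 10-digit field: split 3 + 7
            sub(/.../, "&" OFS, $NF)
        print
    }'
}

# Hypothetical input with both a space and a dash inside the number
printf '%s\n' '"Name","Address","FullPhone"' \
    '"Mike Wise","101 Abc Drive","406 123-4567"' | clean_phones
```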

Answer 1 (score: 1)

sed '1 c\
"Name","Address","Areacode","Phone"
     s/"\([0-9]\{3\}\)\([0-9]\{7\}\)"[[:space:]]*$/"\1","\2"/
     s/-\([0-9]\{1,6\}\)"[[:space:]]*$/\1"/
     ' YourFile

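This works for both of your CSV file formats (and, following @EdMorton's comment, it now handles the header as well).

  • 1 c\: replaces the first line with the line that follows it (forcing this header instead of the original one)
  • The first s/// changes any line ending in a double-quoted field of 3 digits followed by 7 digits (so 10 digits wrapped in double quotes) into two double-quoted fields of 3 and 7 digits, using s///'s grouping (\( \)) to capture each part
  • The second s/// changes a trailing - followed by 1 to 6 digits and the closing double quote into the same digits and quote without the -, again using a group (referenced as \1)

The first s/// does not touch lines of the second sample (its pattern doesn't match them). The second s/// does not touch lines of the first sample (same reason), and it also leaves alone a line that the first s/// has already rewritten (again, same reason).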
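To check this against the two layouts used as inputs in answer 0 (a 10-digit FullPhone field, and a separate area code plus a dashed seven-digit number), here is the same sed script, unchanged, wrapped in a hypothetical shell function normalize_phones that reads standard input. This is a sketch assuming a POSIX-style sed such as GNU or BSD sed. Note that this script writes the header as "Areacode","Phone", not "NPA","TELNO":

```shell
#!/bin/sh
# The sed script from this answer, wrapped in a function.
# normalize_phones is a hypothetical name; it reads CSV on stdin.
normalize_phones() {
    sed '1 c\
"Name","Address","Areacode","Phone"
     s/"\([0-9]\{3\}\)\([0-9]\{7\}\)"[[:space:]]*$/"\1","\2"/
     s/-\([0-9]\{1,6\}\)"[[:space:]]*$/\1"/
     '
}

# 10-digit layout: the first s/// splits the number into "406","1234567"
printf '%s\n' '"Name","Address","FullPhone"' \
    '"Mike Wise","101 Abc Drive","4061234567"' | normalize_phones

# area code + dashed number layout: the second s/// removes the dash
printf '%s\n' '"Name","Address","Areacode","Phone"' \
    '"Mike Wise","101 Abc Drive","406","123-4567"' | normalize_phones
```

Both calls print the same two lines, so the two input layouts end up in one format.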