The first two rows of my data:
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","123427","456060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
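I only want the last 3 characters of columns 2 and 3, and I don't want the header row to be affected. A solution that handles column 2 first and then column 3 would be fine.
I've been fiddling with sed and awk, but no joy yet.
This is what I want: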
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
edit1 This gives me the last 3 digits (plus the closing "); now I just need to write it back to the original file?
$ awk -F"," 'NR>1{ print $2}' head_test_real.csv | sed 's/.*\(....\)/\1/'
427"
592"
007"
592"
409"
742"
387"
731"
556"
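edit2 This works, but I lose the double quotes: "123427" becomes 427, and I want to keep the double quotes.
* NR>1 applies the action only to lines after line 1.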
$ awk -F, 'NR>1{$2=substr($2,length($2)-3,3)}1' OFS=, head_test_real.csv
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06",427,"456060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
edit3 Thanks to @Mark for the correct answer; the following is just here for my own reference on the quoting options.
$ ####csv.QUOTE_ALL
$ cat out.csv
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
$ ####csv.QUOTE_MINIMAL
$ cat out.csv
Rec_Open_Date,MSISDN,IMEI,Data_Volume_Bytes,Device_Manufacturer,Device_Model,Product_Description
2015-10-06,427,060,137765,Samsung Korea,Samsung SM-G900I,$39 Plan
$ ###csv.QUOTE_NONNUMERIC
$ cat out.csv
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
$ ###csv.QUOTE_NONE
$ cat out.csv
Rec_Open_Date,MSISDN,IMEI,Data_Volume_Bytes,Device_Manufacturer,Device_Model,Product_Description
2015-10-06,427,060,137765,Samsung Korea,Samsung SM-G900I,$39 Plan
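Answer 0 (score: 2)
While awk seems well suited to comma-separated data, it doesn't handle the quoted-field variant well. I suggest using a dedicated CSV library such as the one that ships with Python (both 2 and 3):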
import csv

with open('in.csv', 'r') as infile:
    reader = csv.reader(infile)
    with open('out.csv', 'w') as outfile:
        writer = csv.writer(outfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
        writer.writerow(next(reader))  # copy the header row unchanged
        for row in reader:
            row[1] = row[1][-3:]  # second column: keep the last 3 characters
            row[2] = row[2][-3:]  # third column: keep the last 3 characters
            writer.writerow(row)
Put the above code in a file named e.g. fixcsv.py, change the filenames to match the file you have and the file you want, then run it with python fixcsv.py (or python3 fixcsv.py).
I've set it to quote everything in the output (QUOTE_ALL); if you don't want that, you can set it to QUOTE_MINIMAL, QUOTE_NONNUMERIC or QUOTE_NONE.
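To see what each quoting constant does, here is a small sketch that writes one sample row with each mode. The sample row and the render helper are made up for illustration. The row holds only strings, as rows coming from csv.reader do, which is why QUOTE_NONNUMERIC ends up quoting every field here.

```python
import csv
import io

# Sample row as csv.reader would return it: every field is a string.
row = ["2015-10-06", "427", "060", "Samsung Korea", "$39 Plan"]

def render(quoting):
    """Write the sample row with the given quoting mode and return the line."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerow(row)
    return buf.getvalue().rstrip("\n")

for name in ("QUOTE_ALL", "QUOTE_MINIMAL", "QUOTE_NONNUMERIC", "QUOTE_NONE"):
    print(name, render(getattr(csv, name)))
# QUOTE_ALL "2015-10-06","427","060","Samsung Korea","$39 Plan"
# QUOTE_MINIMAL 2015-10-06,427,060,Samsung Korea,$39 Plan
# QUOTE_NONNUMERIC "2015-10-06","427","060","Samsung Korea","$39 Plan"
# QUOTE_NONE 2015-10-06,427,060,Samsung Korea,$39 Plan
```

QUOTE_NONE raises csv.Error if a field contains the delimiter or the quote character and no escapechar is set, so it is only safe for data like this sample.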
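The row assignments replace the second and third fields (row[1] and row[2], since the first field is row[0]) with their last three characters ([-3:]). You could also use arithmetic instead, e.g. row[1] = int(row[1]) % 1000, though that yields an integer and drops leading zeros (456060 becomes 60, not 060).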
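A quick comparison of the two approaches on the IMEI value from the sample data. The variable names are only for illustration, and the zfill step is an optional extra in case the arithmetic route is preferred.

```python
value = "456060"

sliced = value[-3:]            # string slice: keeps leading zeros
modded = int(value) % 1000     # integer arithmetic: leading zeros are lost
padded = str(modded).zfill(3)  # zero-pad back to three digits if needed

print(sliced, modded, padded)  # 060 60 060
```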
Answer 1 (score: 1)
Perl to the rescue!
perl -pe 's/",".*?(...",")/","$1/ if $. > 1' < input > output
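* -p reads the input line by line and prints the result.
* s/regex/replacement/ is a substitution.
* .*? matches anything (like .*), but the question mark makes it "frugal", i.e. it matches the shortest possible string.
* (...",") creates a capture group that starts three characters before ","; it can be referred to as $1.
* $. is the line number, so no substitution happens on line 1.
Make sure the first two columns are always quoted and that the second column is never shorter than 3 characters.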
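The effect of the frugal quantifier can be checked with Python's re module, which supports the same .*? syntax. This is only an illustrative translation of the substitution above, run on a shortened sample line; count=1 mimics Perl's s/// without the g flag.

```python
import re

line = '"2015-10-06","123427","456060","137765"'

# Frugal .*? stops at the first place where three characters precede ",".
lazy = re.sub(r'",".*?(...",")', r'","\1', line, count=1)

# Greedy .* runs to the last such place and swallows the columns in between.
greedy = re.sub(r'",".*(...",")', r'","\1', line, count=1)

print(lazy)    # "2015-10-06","427","456060","137765"
print(greedy)  # "2015-10-06","060","137765"
```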
To modify the third column, you can change the regex to
perl -pe 's/^("(?:.*?","){2}).*?(...",")/$1$2/ if $. > 1'
#                         ~
Change the marked number (the 2 in {2}) to handle whichever column you like.
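The same idea with the column number as a parameter, sketched in Python. The function name keep_last_three is hypothetical. The sketch assumes every field is quoted, that the target is not the last column (the pattern needs a following ","), and that the field has at least three characters.

```python
import re

def keep_last_three(line, column):
    """Trim the given 1-based column of a fully quoted CSV line to 3 characters."""
    # Skip the opening quote plus (column - 1) fields, then frugally drop
    # everything except the three characters before the next ",".
    pattern = r'^("(?:.*?","){%d}).*?(...",")' % (column - 1)
    return re.sub(pattern, r'\1\2', line, count=1)

line = '"2015-10-06","123427","456060","137765","Samsung Korea"'
print(keep_last_three(keep_last_three(line, 2), 3))
# "2015-10-06","427","060","137765","Samsung Korea"
```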
Answer 2 (score: 1)
$ awk 'BEGIN{FS=OFS="\",\""} NR>1{for (i=2;i<=3;i++) $i=substr($i,length($i)-2)} 1' file
"Rec_Open_Date","MSISDN","IMEI","Data_Volume_Bytes","Device_Manufacturer","Device_Model","Product_Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
As with any command, writing back to the original file is just:
command file > tmp && mv tmp file
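The same write-to-a-temporary-file-then-rename pattern can be sketched in Python 3 on top of the csv module. The function name trim_in_place and its columns parameter are hypothetical, and the sketch leaves the temporary file behind if an error occurs partway through.

```python
import csv
import os
import tempfile

def trim_in_place(path, columns=(1, 2)):
    """Rewrite path so the given 0-based columns keep only their last 3 characters."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, newline="") as infile, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as tmp:
        reader = csv.reader(infile)
        writer = csv.writer(tmp, quoting=csv.QUOTE_ALL)
        writer.writerow(next(reader))  # header row is copied unchanged
        for row in reader:
            for i in columns:
                row[i] = row[i][-3:]
            writer.writerow(row)
    # Reached only if everything above succeeded, like "&& mv tmp file".
    os.replace(tmp.name, path)
```

Calling trim_in_place("head_test_real.csv") would then update the file from the question in place.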