I have the following CSV file. I would like to transform it so that the IDs no longer contain the URL.
tID,type,usageID,Usage,status,tStatus,proParte,sName,snID,canName,scAuth,pnuID,tRank,trSort,King,class,subclass,family,created,modified,datasetName,tcID,Ref,refID,tRemarks,tDist,hClass,fhpName,fhpnID,shpn,shpnID,nomCode,Lic,ccaID
https://some-url.com/tree/90000607/90000610,scientific,https://some-url.com/tree/90000607/90000610,Bacteria,,accepted,f,Bacteria,https://some-url.com/name/bbni/90000608,Bacteria,,,Regnum,10,Bacteria,,,,2018-12-06 14:48:14.395+11,2018-12-06 14:48:14.708+11,BBC,https://some-url.com/instance/bbni/90000609,TWD,https://some-url.com/reference/bbni/90000596,,,Bacteria,,,,,ABC,-,/tree/90000607/90000610
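I would like to achieve one of the following two results. I have tried piping several sed commands together to handle the different cases, but I could not come up with a regex that does it in a single command.
Option 1: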
tID,type,usageID,Usage,status,tStatus,proParte,sName,snID,canName,scAuth,pnuID,tRank,trSort,King,class,subclass,family,created,modified,datasetName,tcID,Ref,refID,tRemarks,tDist,hClass,fhpName,fhpnID,shpn,shpnID,nomCode,Lic,ccaID
tree/90000607/90000610,scientific,tree/90000607/90000610,Bacteria,,accepted,f,Bacteria,name/bbni/90000608,Bacteria,,,Regnum,10,Bacteria,,,,2018-12-06 14:48:14.395+11,2018-12-06 14:48:14.708+11,BBC,instance/bbni/90000609,TWD,reference/bbni/90000596,,,Bacteria,,,,,ABC,-,/tree/90000607/90000610
Option 2:
tID,type,usageID,Usage,status,tStatus,proParte,sName,snID,canName,scAuth,pnuID,tRank,trSort,King,class,subclass,family,created,modified,datasetName,tcID,Ref,refID,tRemarks,tDist,hClass,fhpName,fhpnID,shpn,shpnID,nomCode,Lic,ccaID
90000610,scientific,90000610,Bacteria,,accepted,f,Bacteria,90000608,Bacteria,,,Regnum,10,Bacteria,,,,2018-12-06 14:48:14.395+11,2018-12-06 14:48:14.708+11,BBC,90000609,TWD,90000596,,,Bacteria,,,,,ABC,-,90000610
If anyone could help me with the above, I would appreciate it.
What I have tried:
#!/bin/bash
sed -e 's/[a-z]*:\/\/[a-z]*.[a-z]*.[a-z]*\/[a-z]*\/[a-z]*\/[a-z]*\/[a-z]*//g' BBC-taxon-2019-03-26-4546.csv > test.csv
sed -e 's/[0-9]\/[0-9]/[0-9]|[0-9]/g' test.csv
The code above needs a separate command for each kind of replacement and writes a new file every time, so I gave up on it.
#!/bin/bash
# Set Input File here...
input="BBC-taxon-2019-03-26-4546.csv"
# Check if file exists
[ ! -f $input ] && { echo "No file with name: $input. File not found"; exit 123; }
# Set file separator and read fields into variables
while IFS=',' read -ra fields;
do
    echo "Fields: ${fields[*]}"
    echo "Number of Elements: ${#fields[@]}"
    echo "Each Element has: ${#fields}"
    for i in "${fields[@]}"
    do
        echo $i
    done
    # fields[0] = ${fields[0]}
done < "$input"
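The code above builds an array I can iterate over, but I don't know how to apply sed to each cell value of a specific column. If anyone can help, that would be great.
Answer 0 (score: 1)
Input: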
tID,type,usageID,Usage,status,tStatus,proParte,sName,snID,canName,scAuth,pnuID,tRank,trSort,King,class,subclass,family,created,modified,datasetName,tcID,Ref,refID,tRemarks,tDist,hClass,fhpName,fhpnID,shpn,shpnID,nomCode,Lic,ccaID
https://some-url.com/tree/90000607/90000610,scientific,https://some-url.com/tree/90000607/90000610,Bacteria,,accepted,f,Bacteria,https://some-url.com/name/bbni/90000608,Bacteria,,,Regnum,10,Bacteria,,,,2018-12-06 14:48:14.395+11,2018-12-06 14:48:14.708+11,BBC,https://some-url.com/instance/bbni/90000609,TWD,https://some-url.com/reference/bbni/90000596,,,Bacteria,,,,,ABC,-,/tree/90000607/90000610
For option 1, use the following (note that the leading / of each path is kept):
sed -E 's@(https?://[^,/]+)?(/[^/]+/[^/]+/[0-9]+)@\2@g' input.csv
tID,type,usageID,Usage,status,tStatus,proParte,sName,snID,canName,scAuth,pnuID,tRank,trSort,King,class,subclass,family,created,modified,datasetName,tcID,Ref,refID,tRemarks,tDist,hClass,fhpName,fhpnID,shpn,shpnID,nomCode,Lic,ccaID
/tree/90000607/90000610,scientific,/tree/90000607/90000610,Bacteria,,accepted,f,Bacteria,/name/bbni/90000608,Bacteria,,,Regnum,10,Bacteria,,,,2018-12-06 14:48:14.395+11,2018-12-06 14:48:14.708+11,BBC,/instance/bbni/90000609,TWD,/reference/bbni/90000596,,,Bacteria,,,,,ABC,-,/tree/90000607/90000610
For option 2, use:
sed -E 's@(https?://[^,]+|(/[^,/]+)+)/([0-9]+)@\3@g' input.csv
tID,type,usageID,Usage,status,tStatus,proParte,sName,snID,canName,scAuth,pnuID,tRank,trSort,King,class,subclass,family,created,modified,datasetName,tcID,Ref,refID,tRemarks,tDist,hClass,fhpName,fhpnID,shpn,shpnID,nomCode,Lic,ccaID
90000610,scientific,90000610,Bacteria,,accepted,f,Bacteria,90000608,Bacteria,,,Regnum,10,Bacteria,,,,2018-12-06 14:48:14.395+11,2018-12-06 14:48:14.708+11,BBC,90000609,TWD,90000596,,,Bacteria,,,,,ABC,-,90000610
Add the option -i.bak to edit the input file in place; the original is saved as a backup file with the .bak suffix.
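As a minimal sketch of the in-place variant, the option 2 command can be run with -i.bak on a throwaway file. The file name sample.csv, the temporary directory, and the shortened three-column row are assumptions made up for this demo; they are not part of the original data set.

```shell
# Hypothetical demo file in a scratch directory (not the real input file).
tmpdir=$(mktemp -d)
printf '%s\n' 'tID,tcID,ccaID' \
  'https://some-url.com/tree/90000607/90000610,https://some-url.com/instance/bbni/90000609,/tree/90000607/90000610' \
  > "$tmpdir/sample.csv"

# -i.bak rewrites sample.csv in place and keeps the original as sample.csv.bak.
sed -E -i.bak 's@(https?://[^,]+|(/[^,/]+)+)/([0-9]+)@\3@g' "$tmpdir/sample.csv"

result=$(cat "$tmpdir/sample.csv")
backup_row=$(sed -n '2p' "$tmpdir/sample.csv.bak")
echo "$result"
```

Checking the .bak file before deleting it is a cheap way to confirm that the substitution touched only the intended fields.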
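Answer 1 (score: 0)
If you know that everything you need to parse is a URL and that it will not collide with other data fields, why not use a regular expression that matches exactly the URL prefix? Something like this: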
sed -E 's#https?://[^/,]*\.com##g' test.csv
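Because each row holds several URLs, the host part of the pattern has to stop at the first / or comma; an unbounded .* would run on to the last ".com" on the line. A small sketch of the difference, using a shortened row made up for illustration:

```shell
# Hypothetical shortened row with two URL fields.
row='https://some-url.com/tree/90000607/90000610,scientific,https://some-url.com/name/bbni/90000608'

# Greedy: .* extends to the LAST ".com" on the line and swallows the fields in between.
greedy=$(printf '%s\n' "$row" | sed -E 's#https?://.*\.com##g')

# Bounded: [^/,]+ cannot cross a slash or a comma, so each URL is handled on its own.
bounded=$(printf '%s\n' "$row" | sed -E 's#https?://[^/,]+##g')

echo "greedy:  $greedy"
echo "bounded: $bounded"
```

The bounded form leaves every path with its leading /, which corresponds to option 1 in the question apart from that slash.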
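Answer 2 (score: 0)
If your data is in a file named "d", try the following with GNU sed.
The first command prints neither /tree/ nor the number that follows it; the second keeps both, because its replacement contains \1. Both commands only match URLs whose path starts with /tree/.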
sed -Ez 's#\bhttps://[^/]+/tree/\w+/##g ' d
sed -Ez 's#\bhttps://[^/]+(/tree/\w+/)#\1#g ' d
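For the per-column approach the question asks about, awk is usually a better fit than calling sed on each array element. The following is only a sketch: strip_ids is a hypothetical helper name, the column list is passed in by the caller, and it assumes that no field contains a quoted comma. In the sample header the ID columns would be 1, 3, 9, 22, 24 and 34 (tID, usageID, snID, tcID, refID, ccaID).

```shell
# strip_ids COLS: read CSV on stdin and, in the listed columns of every data row,
# keep only the text after the last slash. The header row is passed through untouched.
strip_ids() {
  awk -F',' -v OFS=',' -v cols="$1" '
    BEGIN { n = split(cols, c, ",") }          # c[1..n] hold the column numbers
    NR > 1 {
      for (i = 1; i <= n; i++)
        sub(/^.*\//, "", $(c[i]))              # drop everything up to the last slash
    }
    { print }
  '
}

# Usage on a shortened, made-up three-column sample:
out=$(printf '%s\n' 'tID,type,usageID' \
  'https://some-url.com/tree/90000607/90000610,scientific,/tree/90000607/90000610' \
  | strip_ids '1,3')
echo "$out"
```

Naming the columns explicitly means a slash in an unrelated field (for example a remarks column) is never rewritten, at the cost of having to keep the column list in sync with the header.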