I have many lines like:
"390";"902";"from 4670000 to 4679999, from 4680000 to 4689999, from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";"something3";"something4";"09.09.04"
What I need is:
"390";"902";"from 4670000 to 4679999";"something1";"something2";"20.09.04"
"390";"902";"from 4680000 to 4689999";"something1";"something2";"20.09.04"
"390";"902";"from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999";"something3";"something4";"09.09.04"
"390";"903";"from 9170000 to 9179999";"something3";"something4";"09.09.04"
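As you can see, I need to split variable 3 on the from/to tags (note that there is sometimes a space between the "..." entries).
Ideally, I'd like to produce: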
"390";"902";"4670000";"4679999";"something1";"something2";"20.09.04"
"390";"902";"4680000";"4689999";"something1";"something2";"20.09.04"
"390";"902";"9960000";"9969999";"something1";"something2";"20.09.04"
"390";"903";"0770000";"0779999";"something3";"something4";"09.09.04"
"390";"903";"9170000";"9179999";"something3";"something4";"09.09.04"
I've found that I can do the splitting with awk, but I'm not sure how to replicate the rest of the line:
awk -F\, '{
    for (i = 0; ++i <= NF;)
        print i, $i
}' <<<'from 4670000 to 4679999, from 4680000 to 4689999, from 9960000 to 9969999'
1 from 4670000 to 4679999
2 from 4680000 to 4689999
3 from 9960000 to 9969999
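The missing piece is re-emitting the other fields around each piece of the split. Here is a minimal Python sketch of that idea. It assumes ';' never appears inside a field, as in the sample data, and `split_ranges` is a hypothetical helper name:

```python
import re

def split_ranges(line):
    """Return one output line per comma-separated range in field 3.

    Assumes ';' only ever appears as the field separator, as in the sample data.
    """
    fields = line.split(';')
    # drop field 3's surrounding quotes, then split on a comma plus optional spaces
    ranges = re.split(r',\s*', fields[2].strip('"'))
    # re-emit fields 1-2 and 4+ unchanged around each single range
    return [';'.join(fields[:2] + ['"' + r + '"'] + fields[3:]) for r in ranges]

sample = '"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";"something3";"something4";"09.09.04"'
for out in split_ranges(sample):
    print(out)
```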
Sorry, this is my first question here. Please feel free to point out how I should correct it so it can be fully answered.
Thanks!
Answer 0 (score: 4)
Input:
"390";"902";"from 4670000 to 4679999, from 4680000 to 4689999, from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";"something3";"something4";"09.09.04"
This code:
#!/usr/bin/awk -f
BEGIN {
    FS = ";"
}
{
    t = $3
    gsub(/"/, "", t)
    n = split(t, a, /, /)
    for (i = 1; i <= n; ++i) {
        print $1 ";" $2 ";\"" a[i] "\";" $4 ";" $5 ";" $6
    }
}
gives:
"390";"902";"from 4670000 to 4679999";"something1";"something2";"20.09.04"
"390";"902";"from 4680000 to 4689999";"something1";"something2";"20.09.04"
"390";"902";"from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999";"something3";"something4";"09.09.04"
"390";"903";"from 9170000 to 9179999";"something3";"something4";"09.09.04"
Condensed form (I don't think it can really be called a true "one-liner"):
awk -F ";" -- '{ t = $3; gsub(/"/, "", t); n = split(t, a, /, /); for (i = 1; i <= n; ++i) print $1 ";" $2 ";\"" a[i] "\";" $4 ";" $5 ";" $6 }'
This code:
#!/usr/bin/awk -f
BEGIN {
    FS = ";"
}
{
    t = $3
    gsub(/"|from /, "", t)
    n = split(t, a, /, | to /)
    for (i = 1; i <= n; i += 2) {
        print $1 ";" $2 ";\"" a[i] "\";\"" a[i + 1] "\";" $4 ";" $5 ";" $6
    }
}
gives:
"390";"902";"4670000";"4679999";"something1";"something2";"20.09.04"
"390";"902";"4680000";"4689999";"something1";"something2";"20.09.04"
"390";"902";"9960000";"9969999";"something1";"something2";"20.09.04"
"390";"903";"0770000";"0779999";"something3";"something4";"09.09.04"
"390";"903";"9170000";"9179999";"something3";"something4";"09.09.04"
Condensed form:
awk -F ";" -- '{ t = $3; gsub(/"|from /, "", t); n = split(t, a, /, | to /); for (i = 1; i <= n; i += 2) print $1 ";" $2 ";\"" a[i] "\";\"" a[i + 1] "\";"$4 ";" $5 ";" $6; }'
I tested the scripts with gawk, nawk and mawk.
Answer 1 (score: 3)
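The second script depends on the split producing a flat list in which each "from" value is immediately followed by its "to" value, hence the `i += 2` step. A Python sketch of the same pairing (`from_to_pairs` is a hypothetical name; `re.sub`/`re.split` stand in for awk's `gsub`/`split`):

```python
import re

def from_to_pairs(field):
    """Turn '"from A to B, from C to D"' into [('A', 'B'), ('C', 'D')]."""
    t = re.sub(r'"|from ', '', field)   # like gsub(/"|from /, "", t)
    a = re.split(r', | to ', t)          # like split(t, a, /, | to /)
    # 0-based even positions hold the "from" values and odd positions the
    # matching "to" values, which is why the awk loop steps by 2
    return list(zip(a[0::2], a[1::2]))

print(from_to_pairs('"from 4670000 to 4679999, from 4680000 to 4689999"'))
# [('4670000', '4679999'), ('4680000', '4689999')]
```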
awk -F'";"' -v OFS='";"' '{n=split($3,a,/,\s*/);for(i=1;i<=n;i++){$3=a[i];print}}' file
Output:
kent$ cat f
"390";"902";"from 4670000 to 4679999, from 4680000 to 4689999, from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";"something3";"something4";"09.09.04"
kent$ awk -F'";"' -v OFS='";"' '{n=split($3,a,/,\s*/);for(i=1;i<=n;i++){$3=a[i];print}}' f
"390";"902";"from 4670000 to 4679999";"something1";"something2";"20.09.04"
"390";"902";"from 4680000 to 4689999";"something1";"something2";"20.09.04"
"390";"902";"from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999";"something3";"something4";"09.09.04"
"390";"903";"from 9170000 to 9179999";"something3";"something4";"09.09.04"
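This works because of the three-character separator `";"`. Only the outermost quotes survive, attached to the first and last fields, so `$3` can be replaced and the line rebuilt with the same string as OFS. Here is a small Python illustration of that idea. `explode` is a hypothetical name, and `re.split` stands in for awk's `split`:

```python
import re

SEP = '";"'  # used as both FS and OFS in the one-liner

def explode(line):
    fields = line.split(SEP)
    # fields[0] is '"390' and fields[-1] is '20.09.04"': only the outermost
    # quotes survive, attached to the first and last fields
    out = []
    for r in re.split(r',\s*', fields[2]):
        fields[2] = r                 # like $3 = a[i]
        out.append(SEP.join(fields))  # like print, which rebuilds $0 with OFS
    return out

line = '"390";"902";"from 4670000 to 4679999, from 4680000 to 4689999";"something1";"something2";"20.09.04"'
print('\n'.join(explode(line)))
```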
EDIT
If you also want to parse the from...to part, it's still an awk one-liner:
awk -F'";"' -v OFS='";"' '{n=split($3,a,/,\s*/);for(i=1;i<=n;i++)
{$3=a[i];sub(/\s*to\s*/,"\";\"",$3);sub(/\s*from\s*/,"",$3);print}}' file
Tested with the same input file:
kent$ awk -F'";"' -v OFS='";"' '{n=split($3,a,/,\s*/);for(i=1;i<=n;i++){$3=a[i];sub(/\s*to\s*/,"\";\"",$3);sub(/\s*from\s*/,"",$3);print}}' f
"390";"902";"4670000";"4679999";"something1";"something2";"20.09.04"
"390";"902";"4680000";"4689999";"something1";"something2";"20.09.04"
"390";"902";"9960000";"9969999";"something1";"something2";"20.09.04"
"390";"903";"0770000";"0779999";"something3";"something4";"09.09.04"
"390";"903";"9170000";"9179999";"something3";"something4";"09.09.04"
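In the edited version, `sub()` splices the separator itself into `$3` in place of " to ". After the line is rejoined, that one field reads as two. A Python sketch of just that step (`split_from_to` is a hypothetical name):

```python
import re

SEP = '";"'

def split_from_to(rng):
    # like sub(/\s*to\s*/, "\";\"", $3): the separator itself is spliced in,
    # so after rejoining, one field has become two
    rng = re.sub(r'\s*to\s*', SEP, rng, count=1)
    # like sub(/\s*from\s*/, "", $3)
    return re.sub(r'\s*from\s*', '', rng, count=1)

fields = ['"390', '902', 'from 4670000 to 4679999', 'something1', 'something2', '20.09.04"']
fields[2] = split_from_to(fields[2])
print(SEP.join(fields))
# "390";"902";"4670000";"4679999";"something1";"something2";"20.09.04"
```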
Answer 2 (score: 2)
$ cat tst.awk
BEGIN{ FS=OFS="\";\"" }
{
    gsub(/from /,"",$3)
    split($3,a,/ *, */)
    for (i=1;i in a;i++) {
        $3 = a[i]
        sub(/ to /,OFS,$3)
        print
    }
}
$
$ awk -f tst.awk file
"390";"902";"4670000";"4679999";"something1";"something2";"20.09.04"
"390";"902";"4680000";"4689999";"something1";"something2";"20.09.04"
"390";"902";"9960000";"9969999";"something1";"something2";"20.09.04"
"390";"903";"0770000";"0779999";"something3";"something4";"09.09.04"
"390";"903";"9170000";"9179999";"something3";"something4";"09.09.04"
Answer 3 (score: 2)
This might work for you (GNU sed):
sed -r 's/, /","/g;s/^(([^;]*;){2})([^,]*),([^;]*)(.*)/\1\3\5\n\1\4\5/;P;D' file
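The `P;D` pair turns the substitution into a loop. Each pass prints the line with only the first range, then reruns the script on the line with that range removed, until field 3 has no comma left. Here is the same peeling loop in Python, reusing the sed regex as a sketch (`peel` and `PEEL` are hypothetical names):

```python
import re

# The sed regex: \1 = first two fields, \3 = first range, \4 = the
# remaining ranges, \5 = everything after field 3.
PEEL = re.compile(r'^(([^;]*;){2})([^,]*),([^;]*)(.*)$')

def peel(line):
    line = line.replace(', ', '","')  # like s/, /","/g
    out = []
    while True:
        m = PEEL.match(line)
        if m is None:
            # substitution fails: P prints the whole line, D starts a new cycle
            out.append(line)
            return out
        # P prints up to the inserted newline: prefix + first range + suffix
        out.append(m.group(1) + m.group(3) + m.group(5))
        # D keeps the part after the newline and reruns the script on it
        line = m.group(1) + m.group(4) + m.group(5)

sample = '"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";"something3";"something4";"09.09.04"'
print('\n'.join(peel(sample)))
```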
Answer 4 (score: 1)
#!/bin/bash
filename='file.txt'
temp=$(mktemp)
sed 's/, */";"/g' "$filename" > "$temp"  # turn each ", " separator into ";" so every range is its own field
echo -n > "$filename"  # clear our file
while IFS= read -r line; do
    IFS=';' read -a fields <<< "$line"  # make an array out of the string
    # the ranges sit between the first two fields and the last three
    for ((i=2; i<${#fields[@]}-3; i++)); do
        from=$(echo "${fields[$i]}" | cut -d ' ' -f2)
        to=$(echo "${fields[$i]}" | cut -d ' ' -f4)  # still carries the field's closing quote
        echo "${fields[0]};${fields[1]};\"$from\";\"$to;${fields[-3]};${fields[-2]};${fields[-1]}" >> "$filename"
    done
done < "$temp"
rm "$temp"
exit 0
It also copes with any number of spaces after each comma.
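This script relies on one fact: once every ", " has become `";"`, the ranges are exactly the fields between the first two and the last three. Each range then splits on spaces into `from`, start, `to`, end. Here is a Python sketch of that indexing. It assumes exactly three trailing fields, as in the sample, and `middle_fields` is a hypothetical name:

```python
import re

def middle_fields(line):
    # like sed 's/, */";"/g': every range becomes a field of its own
    fields = re.sub(r', *', '";"', line).split(';')
    out = []
    # the ranges are whatever sits between the first two and the last three fields
    for f in fields[2:len(fields) - 3]:
        words = f.strip('"').split(' ')  # ['from', start, 'to', end]
        start, end = words[1], words[3]  # like cut -d ' ' -f2 and -f4
        out.append(';'.join(fields[:2] + ['"%s"' % start, '"%s"' % end] + fields[-3:]))
    return out

sample = '"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";"something3";"something4";"09.09.04"'
print('\n'.join(middle_fields(sample)))
```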
Answer 5 (score: 1)
Here's another way to do it in Bash:
#!/bin/bash
shopt -s extglob
IFS=';'
while read -a FIELDS; do
    TEMP=${FIELDS[2]//\"}
    read -a RANGES <<< "${TEMP//,?( )/;}"
    for A in "${RANGES[@]}"; do
        echo "${FIELDS[0]};${FIELDS[1]};\"$A\";${FIELDS[*]:3}"
    done
done
Run it with:
bash script.sh < file
This gives the first expected output.
Or:
#!/bin/bash
shopt -s extglob
IFS=';'
while read -a FIELDS; do
    TEMP=${FIELDS[2]//@(\"|from )}
    read -a RANGES <<< "${TEMP//@(,?( )| to )/;}"
    for (( I = 0; I < ${#RANGES[@]}; I += 2 )); do
        echo "${FIELDS[0]};${FIELDS[1]};\"${RANGES[I]}\";\"${RANGES[I + 1]}\";${FIELDS[*]:3}"
    done
done
which gives the second expected output.
Answer 6 (score: 0)
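Here is one way to do it with Python. I know you didn't tag it, but handling a csv file with a proper parser seemed easier to me. The script splits the third field (row[2]) on commas, then splits each string of that field on whitespace and keeps the elements at odd indices (v.split()[1::2]), i.e. the start and end numbers.
Contents of script.py: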
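#!/usr/bin/env python3
import csv
import sys

with open(sys.argv[1], 'r', newline='') as f:
    csvfile = csv.reader(f, delimiter=';')
    # lineterminator='\n' avoids the csv module's default '\r\n' line endings
    csvout = csv.writer(sys.stdout, delimiter=';', quoting=csv.QUOTE_ALL,
                        lineterminator='\n')
    for row in csvfile:
        # the third field holds comma-separated "from X to Y" ranges
        for v in row[2].split(', '):
            # "from X to Y".split() -> ['from', 'X', 'to', 'Y']; [1::2] -> ['X', 'Y']
            fields = v.split()[1::2]
            # copy the row, replacing the third field with the two numbers
            newrow = row[:2] + fields + row[3:]
            csvout.writerow(newrow)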
Run it like this:
python3 script.py infile
Output:
"390";"902";"4670000";"4679999";"something1";"something2";"20.09.04"
"390";"902";"4680000";"4689999";"something1";"something2";"20.09.04"
"390";"902";"9960000";"9969999";"something1";"something2";"20.09.04"
"390";"903";"0770000";"0779999";"something3";"something4";"09.09.04"
"390";"903";"9170000";"9179999";"something3";"something4";"09.09.04"
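The question notes that spacing inside the third field is not always consistent. In that case a regular expression can pull the start and end numbers out directly, rather than relying on `split(', ')` and word positions. This is a sketch of that swapped-in technique, not the answer's own method. `expand` and `RANGE` are hypothetical names, and the function works on an in-memory string so it can be tried without a file:

```python
import csv
import io
import re

# Captures the start and end numbers of each range; \s* tolerates
# missing or repeated spaces around the numbers.
RANGE = re.compile(r'from\s*(\d+)\s*to\s*(\d+)')

def expand(text):
    reader = csv.reader(io.StringIO(text), delimiter=';')
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=';', quoting=csv.QUOTE_ALL,
                        lineterminator='\n')
    for row in reader:
        # one output row per (start, end) pair found in the third field
        for start, end in RANGE.findall(row[2]):
            writer.writerow(row[:2] + [start, end] + row[3:])
    return buf.getvalue()

data = '"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";"something3";"something4";"09.09.04"\n'
print(expand(data), end='')
```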