Convert part of a row into columns

Time: 2015-02-02 20:39:44

Tags: python awk rows unpivot

I have a file with the following input:

rownum,identifier,items_in_list
1,"ABC",{(123),(345),(69),(95),(90),(83),(3A)}

The expected output is:

rownum,identifier,items_in_list
1,"ABC",123
1,"ABC",345
1,"ABC",69
1,"ABC",95
1,"ABC",90
1,"ABC",83
1,"ABC",3A

I tried using awk, but that turns every item in the line into rows, whereas I only need some of the columns converted to rows.

My code:

echo "1,"ABC",{(123),(345),(69),(95),(90),(83),(3A)}" | awk -vRS="{" 'NF'

But this produces:

1,ABC,
(123),(345),(69),(95),(90),(83),(3A)}

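Update

All of your commands worked well except for one small glitch. Sorry, as a newbie I can only accept one of them as the answer.

Thanks! However, I run into trouble when a row has only one number instead of several, for example with input in the following format:

Input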

1,33262,"ABC",{(64)} 
1,33263,"ABC",{(66),(57)}

Actual output:

1,33262,SOME_FIELD_NAME 
1,33262,64 
1,33263,SOME_FIELD_NAME 
1,33262,65,66 

Required output:

1,33262,SOME_FIELD_NAME,64 
1,33263,SOME_FIELD_NAME,65
1,33263,SOME_FIELD_NAME,66

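Update:

The "actual output" above comes from the code Jotne suggested: awk -F, '{a=$1","$2;gsub(/[{()}]/,"");for (i=3;i<=NF;i++) print a","$i}' file

Sorry, my input sometimes has 2 leading fields and sometimes 3 to 10 leading fields, but the list we want to convert into rows always starts with '{', each individual number is enclosed in '()', and the end of the line is marked by '}'. Jotne's code works for 2 leading fields but fails for 3 leading fields, because it hard-codes the prefix as the first two fields. Can anyone suggest a generic way to parse the fields?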

4 answers:

Answer 0: (score: 0)

Here is one way with awk:

awk -F, '{a=$1","$2;gsub(/[{()}]/,"");for (i=3;i<=NF;i++) print a","$i}' file
1,"ABC",123
1,"ABC",345
1,"ABC",69
1,"ABC",95
1,"ABC",90
1,"ABC",83
1,"ABC",3A

Using RS:

awk -vRS=, '{gsub(/[{()}]/,"")} NR==1 {a=$1;next} NR==2 {a=a","$1;next} {print a","$1}' file
1,"ABC",123
1,"ABC",345
1,"ABC",69
1,"ABC",95
1,"ABC",90
1,"ABC",83
1,"ABC",3A

Answer 1: (score: 0)

If you are still looking for a Python solution:

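line = '1,"ABC",{(123),(345),(69),(95),(90),(83),(3A)}'
# Strip the list delimiters; the quotes around the identifier are kept
# so the result matches the expected output.
for extra_char in '{}()':
    line = line.replace(extra_char, '')
fields = line.split(',')
rownum, identifier = fields[0:2]
# Emit one comma-separated output row per list item.
for item in fields[2:]:
    print(','.join([rownum, identifier, item]))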
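
The snippet above assumes exactly two leading fields. For the case raised in the question's update (anywhere from 2 to 10 leading fields, and lists that may hold a single item), one possible approach is to split each line at the first '{' instead of counting fields. This is only a minimal sketch: the function name unpivot_line is made up for illustration, and it assumes '{' never appears inside a leading field.

```python
import re

def unpivot_line(line):
    """Return one output row per parenthesised item in the trailing {...} list."""
    line = line.rstrip()
    # Everything before the first '{' is the leading part, however many fields it has.
    prefix, brace, items = line.partition('{')
    if not brace:
        # No list on this line (for example the header): pass it through unchanged.
        return [line]
    # Each item sits between '(' and ')'; a single-item list needs no special case.
    return [prefix + item for item in re.findall(r'\(([^()]*)\)', items)]

for row in unpivot_line('1,33263,"ABC",{(66),(57)}'):
    print(row)
```

Because the prefix is kept as raw text (including its trailing comma), quoted fields such as "ABC" come out exactly as they went in.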

Answer 2: (score: 0)

A Python-based solution:

import csv
import re

data = ['rownum,identifier,items_in_list',
        '1,"ABC",{(123),(345),(69),(95),(90),(83),(3A)}']

reader = csv.reader(data)  # for a real file, pass open(filename, newline='') (Python 3)
pat = r'{*\(([0-9a-fA-F]+)\)}*'
next(reader)
for row in reader:
    for elem in row[2:]:
        mat = re.search(pat, elem).group(1)
        print(','.join([row[0], '"{}"'.format(row[1]), mat]))

Output:

1,"ABC",123
1,"ABC",345
1,"ABC",69
1,"ABC",95
1,"ABC",90
1,"ABC",83
1,"ABC",3A
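
The csv-based code above hard-codes row[0], row[1] and row[2:], so it shares the two-leading-fields limitation mentioned in the question's update. A possible generalisation, sketched here under assumptions, is to look for the first field that starts with '{' and treat everything before it as the leading part. The function name unpivot and the use of io.StringIO as a stand-in for a real file are illustrative only. Note that csv.reader strips the quotes around "ABC" and the default csv.writer does not put them back.

```python
import csv
import io
import re

def unpivot(infile, outfile):
    """Copy rows from infile to outfile, expanding a trailing {...} list into rows."""
    writer = csv.writer(outfile, lineterminator='\n')
    for row in csv.reader(infile):
        # Index of the first field that opens the list, or None (header, blank line).
        start = next((i for i, field in enumerate(row) if field.startswith('{')), None)
        if start is None:
            if row:
                writer.writerow(row)
            continue
        for field in row[start:]:
            match = re.search(r'\(([^()]*)\)', field)
            if match:
                # Leading fields are repeated once per list item.
                writer.writerow(row[:start] + [match.group(1)])

# The input from the question's update, used as an in-memory stand-in for a file.
sample = io.StringIO('1,33262,"ABC",{(64)}\n'
                     '1,33263,"ABC",{(66),(57)}\n')
out = io.StringIO()
unpivot(sample, out)
print(out.getvalue(), end='')
```

With a real file, the two StringIO objects would be replaced by open(filename, newline='') and an output stream such as sys.stdout.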

Answer 3: (score: 0)

awk -F, -v OFS=, '{gsub(/[)]./,ORS); gsub(/(^[^(]+)?[(]/,$1 OFS $2 OFS); printf "%s",$0}' file
1,"ABC",123
1,"ABC",345
1,"ABC",69
1,"ABC",95
1,"ABC",90
1,"ABC",83
1,"ABC",3A