Convert a column-oriented file to CSV output using shell

Time: 2015-09-09 00:31:17

Tags: python shell csv

I have a file produced by map-reduce output in the following format, and I need to convert it to CSV with a shell script:

25-MAY-15
04:20
Client
0000000010
127.0.0.1
PAY
ISO20022
PAIN000
100
1
CUST
API
ABF07
ABC03_LIFE.xml
AFF07/LIFE
100000
Standard Life 

================================================
==================================================

AFF07-B000001

 2000

ABC Corp
..

BE900000075000027


AFF07-B000002

 2000

XYZ corp
..

BE900000075000027



AFF07-B000003

 2000

3MM corp
..

BE900000075000027

I need the output in the CSV format shown below. I want to repeat some of the values from the file and add the TRANSACTION ID, like this:

25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API,ABF07,ABC03_LIFE.xml,AFF07/LIFE,100000,Standard Life, 25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API,AFF07-B000001, 2000,ABC Corp,..,BE900000075000027

25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API,ABF07,ABC03_LIFE.xml,AFF07/LIFE,100000,Standard Life, 25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API,AFF07-B000002,2000,XYZ Corp,..,BE900000075000027

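The TRANSACTION IDs are AFF07-B000001, AFF07-B000002 and AFF07-B000003, and each one has different values. I put a marker line where the transactions start. The values before that separator should be repeated, and the transaction ID columns need to be appended after the repeated values, as shown in the format above.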

I probably need a Bash shell script; the Linux flavor is CentOS.

When I run the code, I get the following error:

Traceback (most recent call last):
  File "abc.py", line 37, in <module>
    main()
  File "abc.py", line 36, in main
    createTxns(fh)
  File "abc.py", line 7, in createTxns
    first17.append( fh.readLine().rstrip() )
AttributeError: 'file' object has no attribute 'readLine'

Can someone help me?

2 answers:

Answer 0 (score: 0)

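Is this a correct description of the input file and the output format?

The input file contains:

  • 17 lines, followed by
  • groups of 10 lines each, where each group holds one transaction ID

Each output line consists of:

  • 29 common fields (the 17 header values, then the first 12 of them again), followed by
  • 5 fields taken from one of the 10-line groups above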

So let's just translate that into Python:

def createTxns(fh):
    """fh is the file handle of the input file"""

    # 1. Read 17 lines from fh. The method is readline(), all lowercase:
    #    file objects have no readLine(), which is what raised the AttributeError.
    first17 = []
    for i in range(17):
        first17.append(fh.readline().rstrip())

    # 2. Form the common fields: the 17 header values, then the first 12 again.
    commonFields = first17 + first17[0:12]

    # 3. Process the rest of the file in groups of ten lines.
    while True:
        # read 10 lines
        group = []
        for i in range(10):
            x = fh.readline()
            if x == '':
                break
            group.append(x.rstrip())

        if len(group) != 10:
            break                   # we've reached the end of the file

        # pick the five value lines of the group (transaction ID ... account)
        fields = commonFields + [group[2], group[4], group[6], group[7], group[9]]

        row = ",".join(fields)

        print(row)

def main():
    with open("input-file", "r") as fh:
        createTxns(fh)

if __name__ == "__main__":
    main()

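This code shows how to:

  • open a file handle
  • read lines from a file handle
  • strip the trailing newline
  • detect the end of input when reading from a file
  • concatenate lists
  • join strings together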
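The fixed 10-line grouping depends on the blank lines being perfectly regular. In the sample as posted here they are not (there are two separator lines, and three blank lines instead of two before AFF07-B000003), so the group boundaries can drift. A more forgiving variant skips blank and "=" separator lines and counts only non-blank values. This is a minimal sketch, assuming every transaction has exactly five non-blank lines; it uses the standard csv module for output, and the names build_rows and write_csv are hypothetical.

```python
import csv
import itertools
import sys

HEADER_LINES = 17  # metadata lines at the top of the file
REPEAT_COUNT = 12  # header fields repeated again before each transaction
RECORD_SIZE = 5    # non-blank lines per transaction (assumption)


def build_rows(lines):
    """Yield one row (a list of strings) per transaction record in lines."""
    it = iter(lines)
    header = [line.strip() for line in itertools.islice(it, HEADER_LINES)]
    if len(header) < HEADER_LINES:
        return  # input too short to contain the header
    common = header + header[:REPEAT_COUNT]  # 17 + 12 = 29 fields

    record = []
    for line in it:
        value = line.strip()
        # Skip blank lines and "=====" separator lines.
        if not value or set(value) == {"="}:
            continue
        record.append(value)
        if len(record) == RECORD_SIZE:
            yield common + record
            record = []
    # An incomplete trailing record (fewer than 5 values) is dropped.


def write_csv(path, out=sys.stdout):
    """Read the column-oriented file at path and write CSV rows to out."""
    with open(path) as fh:
        writer = csv.writer(out, lineterminator="\n")
        writer.writerows(build_rows(fh))
```

For example, write_csv("input-file") prints one CSV line per transaction ("input-file" being the placeholder file name from the answer's code). Because every value is stripped, " 2000" comes out as "2000", and csv.writer will quote any field that happens to contain a comma.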

Answer 1 (score: 0)

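If you are going the Python route, I suggest reading the Input and Output chapter of the Python tutorial.

You just need to break the problem down and try it. For the first 17 lines, use f.readline() and concatenate the lines into one string. Then use the replace method to turn that string into the beginning of each CSV line: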

str.replace("\n", ",")

Then use the split method to break the remaining lines of the file into a list:

str.split("\n")
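A minimal sketch of those steps, assuming the header is exactly 17 lines: read_header concatenates the lines returned by readline() and swaps the newlines for commas, and read_values splits what is left on "\n" and keeps only non-blank, non-separator lines. Since the desired output also repeats the first 12 header fields, repeat_prefix appends them using split(","). All three function names are hypothetical.

```python
def read_header(fh, n=17):
    """Concatenate n lines read with readline() and turn newlines into commas."""
    head = ""
    for _ in range(n):
        head += fh.readline()
    # "a\nb\n" becomes "a,b," ; drop the trailing comma(s) left by the final newline(s)
    return head.replace("\n", ",").rstrip(",")


def repeat_prefix(header, count=12):
    """Append the first count header fields again, as the desired output does."""
    return header + "," + ",".join(header.split(",")[:count])


def read_values(fh):
    """Split the rest of the file on newlines; keep non-blank, non-'=' lines."""
    values = []
    for line in fh.read().split("\n"):
        line = line.strip()
        if line and set(line) != {"="}:
            values.append(line)
    return values
```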

Then write out the file in a loop. Using a counter will make your life easier. First write the header string:

25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API,ABF07,ABC03_LIFE.xml,AFF07/LIFE,100000,Standard Life, 25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API

Then write the items in the list, each preceded by ",":

,AFF07-B000001, 2000,ABC Corp,..,BE900000075000027

When the counter reaches 5, write "\n" followed by the header string again, and don't forget to reset the counter so it can start over:
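Here is that counter loop as a short sketch. One small deviation from the description: the header is written at the start of each line (when the counter is 0) rather than immediately after each "\n", so no stray header is left dangling after the last record. write_rows is a hypothetical name; header is the comma-joined prefix and values is the list produced by split.

```python
import sys


def write_rows(values, header, out=sys.stdout, per_row=5):
    """Write header + per_row values per line, tracking position with a counter."""
    count = 0
    for value in values:
        if count == 0:
            out.write(header)   # start a new line with the repeated prefix
        out.write("," + value)
        count += 1
        if count == per_row:
            out.write("\n")     # line complete
            count = 0           # reset the counter for the next transaction
    # If len(values) is not a multiple of per_row, the last line stays unterminated.
```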

\n25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API,ABF07,ABC03_LIFE.xml,AFF07/LIFE,100000,Standard Life, 25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API

Give it a try, and let us know if you need more help. I'm assuming you have some scripting background :) Good luck!!