Reformatting Tab-Separated Data in Python

Date: 2018-01-18 21:18:13

Tags: python arrays pandas numpy tab-delimited

I have some tab-delimited data in Excel that I need to reformat so it works in Tableau. Here is what it looks like:

State     2001   2002   2003   2004   2005   2006   2007
Alabama   5.6    5.71   5.88   6.08   6.46   7.07   7.57
Alaska    10.54  10.46  10.5   10.99  11.72  12.84  13.28
Arizona   7.27   7.21   7.34   7.45   7.79   8.24   8.54
Arkansas  6.05   5.61   5.57   5.67   6.3    6.99   6.96

Here is what I need it to look like:

State     Cost   Date
Alabama   5.6    12/31/2001
Alabama   5.71   12/31/2002
Alabama   5.88   12/31/2003
Alabama   6.08   12/31/2004
Alabama   6.46   12/31/2005
Alabama   7.07   12/31/2006
Alabama   7.57   12/31/2007
Alaska    10.54  12/31/2001
Alaska    10.46  12/31/2002
Alaska    10.5   12/31/2003
Alaska    10.99  12/31/2004
Alaska    11.72  12/31/2005
Alaska    12.84  12/31/2006
Alaska    13.28  12/31/2007
Arizona   7.27   12/31/2001
Arizona   7.21   12/31/2002
Arizona   7.34   12/31/2003
Arizona   7.45   12/31/2004
Arizona   7.79   12/31/2005
Arizona   8.24   12/31/2006
Arizona   8.54   12/31/2007
Arkansas  6.05   12/31/2001
Arkansas  5.61   12/31/2002
Arkansas  5.57   12/31/2003
Arkansas  5.67   12/31/2004
Arkansas  6.3    12/31/2005
Arkansas  6.99   12/31/2006
Arkansas  6.96   12/31/2007

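What is the best way to do this in Python? I'm familiar with NumPy and pandas, so those are options, but all I really want is for Python to spit out the reformatted data so I can easily paste it into Excel.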

2 answers:

Answer 0 (score: 2)

In pandas, I would do it like this:

Suppose you have the following DataFrame (read from Excel):

In [99]: df
Out[99]:
      State   2001   2002   2003   2004   2005   2006   2007
0   Alabama   5.60   5.71   5.88   6.08   6.46   7.07   7.57
1    Alaska  10.54  10.46  10.50  10.99  11.72  12.84  13.28
2   Arizona   7.27   7.21   7.34   7.45   7.79   8.24   8.54
3  Arkansas   6.05   5.61   5.57   5.67   6.30   6.99   6.96
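One way to get that DataFrame, assuming the worksheet has been saved (or copied) as tab-separated text, is pd.read_csv with sep='\t'. The inline string below is a stand-in for that file's contents; with a real file you would pass its path instead, and an .xlsx workbook would go through pd.read_excel, which needs an engine such as openpyxl installed. Note that read_csv keeps the year headers as strings ('2001', not 2001).

```python
import io

import pandas as pd

# Assumption: the sheet was exported as tab-separated text.
# This string stands in for the contents of that file.
raw = (
    "State\t2001\t2002\t2003\t2004\t2005\t2006\t2007\n"
    "Alabama\t5.6\t5.71\t5.88\t6.08\t6.46\t7.07\t7.57\n"
    "Alaska\t10.54\t10.46\t10.5\t10.99\t11.72\t12.84\t13.28\n"
    "Arizona\t7.27\t7.21\t7.34\t7.45\t7.79\t8.24\t8.54\n"
    "Arkansas\t6.05\t5.61\t5.57\t5.67\t6.3\t6.99\t6.96\n"
)

# sep='\t' splits on tabs only, so multi-word names like "New York" stay intact
df = pd.read_csv(io.StringIO(raw), sep='\t')
print(df)
```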

Solution:

In [102]: d = pd.melt(df, 'State', var_name='Date', value_name='Cost')

In [103]: d.assign(Date=pd.to_datetime(d['Date'])+pd.offsets.YearEnd())
Out[103]:
       State       Date   Cost
0    Alabama 2001-12-31   5.60
1     Alaska 2001-12-31  10.54
2    Arizona 2001-12-31   7.27
3   Arkansas 2001-12-31   6.05
4    Alabama 2002-12-31   5.71
5     Alaska 2002-12-31  10.46
6    Arizona 2002-12-31   7.21
7   Arkansas 2002-12-31   5.61
8    Alabama 2003-12-31   5.88
9     Alaska 2003-12-31  10.50
..       ...        ...    ...
18   Arizona 2005-12-31   7.79
19  Arkansas 2005-12-31   6.30
20   Alabama 2006-12-31   7.07
21    Alaska 2006-12-31  12.84
22   Arizona 2006-12-31   8.24
23  Arkansas 2006-12-31   6.99
24   Alabama 2007-12-31   7.57
25    Alaska 2007-12-31  13.28
26   Arizona 2007-12-31   8.54
27  Arkansas 2007-12-31   6.96

[28 rows x 3 columns]
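The result above is ordered by year and uses ISO dates. A self-contained sketch of the same melt approach, adjusted to the layout the question asks for (rows grouped by state, columns in State, Cost, Date order, dates as 12/31/YYYY), could look like this. The sample DataFrame is typed in from the question. Converting the year labels with astype(str) and format='%Y' is an added precaution: if the headers were parsed as integers (as can happen with read_excel), pd.to_datetime could treat them as epoch offsets rather than years. Printing with to_csv(sep='\t') produces text that can be pasted straight into Excel.

```python
import pandas as pd

# Sample wide table typed in from the question (year headers as strings)
df = pd.DataFrame({
    'State': ['Alabama', 'Alaska', 'Arizona', 'Arkansas'],
    '2001': [5.60, 10.54, 7.27, 6.05],
    '2002': [5.71, 10.46, 7.21, 5.61],
    '2003': [5.88, 10.50, 7.34, 5.57],
    '2004': [6.08, 10.99, 7.45, 5.67],
    '2005': [6.46, 11.72, 7.79, 6.30],
    '2006': [7.07, 12.84, 8.24, 6.99],
    '2007': [7.57, 13.28, 8.54, 6.96],
})

# Wide -> long: one row per (State, year) pair
long_df = pd.melt(df, id_vars='State', var_name='Date', value_name='Cost')

# Parse the year labels explicitly, then roll forward to December 31
long_df['Date'] = (pd.to_datetime(long_df['Date'].astype(str), format='%Y')
                   + pd.offsets.YearEnd())

# Group rows by state, oldest year first, as in the desired output
long_df = long_df.sort_values(['State', 'Date']).reset_index(drop=True)

# Format dates as MM/DD/YYYY text and put the columns in State, Cost, Date order
long_df = long_df.assign(Date=long_df['Date'].dt.strftime('%m/%d/%Y'))[
    ['State', 'Cost', 'Date']]

# Tab-separated text on stdout, ready to paste into Excel
print(long_df.to_csv(sep='\t', index=False), end='')
```

Sorting on the parsed datetimes before formatting them as text keeps the year order correct regardless of the string format chosen afterwards.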

You can also easily save it as an Excel file:

d.assign(Date=pd.to_datetime(d['Date'])+pd.offsets.YearEnd()) \
 .to_excel(r'/path/to/output.xlsx', index=False)

Answer 1 (score: 0)

You asked how to do this directly in Python,
so it would look something like this:

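#!/usr/bin/env python
# Reshape a wide, tab-separated State x Year table into the long
# State / Cost / Date layout that Tableau expects.
inputname = 'Excel.txt'
outputname = 'Tableau.txt'

container = ['State\tCost\tDate']  # header row of the output
with open(inputname, 'r') as text:
    for line in text:
        # Split on tabs only, so names such as "New York" stay in one field
        l = [field.strip() for field in line.rstrip('\n').split('\t')]
        name = l[0]
        if not name:
            continue  # skip blank lines
        if name == 'State':
            dates = l  # header row: ['State', '2001', '2002', ...]
        else:
            # Visit every year column; index 0 holds the state name
            for e in range(1, len(l)):
                date = dates[e]
                cost = l[e]
                container.append('\t'.join([name, cost, '12/31/' + date]))

with open(outputname, 'w') as out:
    out.write('\n'.join(container) + '\n')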
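A variant of the same plain-Python idea that uses the standard library's csv module instead of manual splitting might look like the sketch below. wide_to_long and reshape_file are hypothetical helper names, and the layout (state name in the first column, a year in every other header cell) is assumed from the question's sample. Taking file objects rather than paths means you can pass sys.stdout as the destination and paste the printed result into Excel.

```python
import csv


def wide_to_long(rows):
    """Yield (state, cost, date) tuples from wide rows whose first row is the header.

    Hypothetical helper: assumes column 0 holds the state name and every other
    header cell holds a year.
    """
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return  # empty input: nothing to reshape
    years = [cell.strip() for cell in header[1:]]
    for row in rows:
        if not row or not row[0].strip():
            continue  # csv.reader yields [] for blank lines
        state = row[0].strip()
        for year, cost in zip(years, row[1:]):
            yield state, cost.strip(), '12/31/' + year


def reshape_file(src, dst):
    """Read a wide tab-separated table from src and write the long form to dst."""
    reader = csv.reader(src, delimiter='\t')
    writer = csv.writer(dst, delimiter='\t', lineterminator='\n')
    writer.writerow(['State', 'Cost', 'Date'])
    writer.writerows(wide_to_long(reader))
```

For example, opening Excel.txt with newline='' and calling reshape_file(src, sys.stdout) prints the long table to the terminal.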