I have some tab-delimited data in Excel that needs to be reformatted before it will work in Tableau. This is what it looks like:
State 2001 2002 2003 2004 2005 2006 2007
Alabama 5.6 5.71 5.88 6.08 6.46 7.07 7.57
Alaska 10.54 10.46 10.5 10.99 11.72 12.84 13.28
Arizona 7.27 7.21 7.34 7.45 7.79 8.24 8.54
Arkansas 6.05 5.61 5.57 5.67 6.3 6.99 6.96
This is what I need it to look like:
State Cost Date
Alabama 5.6 12/31/2001
Alabama 5.71 12/31/2002
Alabama 5.88 12/31/2003
Alabama 6.08 12/31/2004
Alabama 6.46 12/31/2005
Alabama 7.07 12/31/2006
Alabama 7.57 12/31/2007
Alaska 10.54 12/31/2001
Alaska 10.46 12/31/2002
Alaska 10.5 12/31/2003
Alaska 10.99 12/31/2004
Alaska 11.72 12/31/2005
Alaska 12.84 12/31/2006
Alaska 13.28 12/31/2007
Arizona 7.27 12/31/2001
Arizona 7.21 12/31/2002
Arizona 7.34 12/31/2003
Arizona 7.45 12/31/2004
Arizona 7.79 12/31/2005
Arizona 8.24 12/31/2006
Arizona 8.54 12/31/2007
Arkansas 6.05 12/31/2001
Arkansas 5.61 12/31/2002
Arkansas 5.57 12/31/2003
Arkansas 5.67 12/31/2004
Arkansas 6.3 12/31/2005
Arkansas 6.99 12/31/2006
Arkansas 6.96 12/31/2007
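What is the best way to do this in Python? I am comfortable with NumPy and pandas, so both are options, but all I really want is for Python to print the reformatted data so that I can easily paste it into Excel.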
Answer 0 (score: 2)
In pandas, I would do it like this:
Assume you have the following DataFrame (read from Excel):
In [99]: df
Out[99]:
State 2001 2002 2003 2004 2005 2006 2007
0 Alabama 5.60 5.71 5.88 6.08 6.46 7.07 7.57
1 Alaska 10.54 10.46 10.50 10.99 11.72 12.84 13.28
2 Arizona 7.27 7.21 7.34 7.45 7.79 8.24 8.54
3 Arkansas 6.05 5.61 5.57 5.67 6.30 6.99 6.96
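If the data is still tab-delimited text rather than a workbook, a DataFrame like the one above could be built roughly as follows. This is a minimal sketch: the `raw` string is a shortened copy of the question's sample, and it stands in for a real file path. Note that `read_csv` returns the year headers as strings; if the headers come back as integers (which can happen with `pd.read_excel`), they would need converting to strings before being parsed as years.

```python
import io

import pandas as pd

# Shortened sample from the question; tabs separate the columns.
raw = (
    "State\t2001\t2002\t2003\n"
    "Alabama\t5.6\t5.71\t5.88\n"
    "Alaska\t10.54\t10.46\t10.5\n"
)

# sep="\t" reads tab-delimited text; a file path could replace the StringIO.
df = pd.read_csv(io.StringIO(raw), sep="\t")
print(df.columns.tolist())
print(df)
```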
Solution:
In [102]: d = pd.melt(df, 'State', var_name='Date', value_name='Cost')
In [103]: d.assign(Date=pd.to_datetime(d['Date'])+pd.offsets.YearEnd())
Out[103]:
State Date Cost
0 Alabama 2001-12-31 5.60
1 Alaska 2001-12-31 10.54
2 Arizona 2001-12-31 7.27
3 Arkansas 2001-12-31 6.05
4 Alabama 2002-12-31 5.71
5 Alaska 2002-12-31 10.46
6 Arizona 2002-12-31 7.21
7 Arkansas 2002-12-31 5.61
8 Alabama 2003-12-31 5.88
9 Alaska 2003-12-31 10.50
.. ... ... ...
18 Arizona 2005-12-31 7.79
19 Arkansas 2005-12-31 6.30
20 Alabama 2006-12-31 7.07
21 Alaska 2006-12-31 12.84
22 Arizona 2006-12-31 8.24
23 Arkansas 2006-12-31 6.99
24 Alabama 2007-12-31 7.57
25 Alaska 2007-12-31 13.28
26 Arizona 2007-12-31 8.54
27 Arkansas 2007-12-31 6.96
[28 rows x 3 columns]
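The result above is ordered by year and shows ISO dates, whereas the question asks for rows grouped by state, columns in State/Cost/Date order, and dates written as 12/31/YYYY. One way to get there is sketched below, assuming the year headers are strings; the two-state `df` is just a cut-down stand-in for the real data.

```python
import pandas as pd

# Cut-down stand-in for the DataFrame read from Excel.
df = pd.DataFrame({
    "State": ["Alabama", "Alaska"],
    "2001": [5.60, 10.54],
    "2002": [5.71, 10.46],
})

d = pd.melt(df, "State", var_name="Date", value_name="Cost")
# Parse the year labels explicitly, then roll each date to December 31.
d["Date"] = pd.to_datetime(d["Date"], format="%Y") + pd.offsets.YearEnd()

result = (
    d.sort_values(["State", "Date"])                             # group rows by state
     .assign(Date=lambda x: x["Date"].dt.strftime("%m/%d/%Y"))   # 12/31/YYYY text
     [["State", "Cost", "Date"]]                                 # requested column order
     .reset_index(drop=True)
)
print(result)
```

Sorting happens before the dates are turned into text, so the order is chronological rather than alphabetical.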
You can also easily save it as an Excel file:
d.assign(Date=pd.to_datetime(d['Date'])+pd.offsets.YearEnd()) \
.to_excel(r'/path/to/output.xlsx', index=False)
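Since the question mentions pasting the result back into Excel, tab-delimited text may be all that is needed. Here is a small sketch using `to_csv` with `sep="\t"`; the `long_df` frame is a hypothetical two-row sample of the reshaped data.

```python
import pandas as pd

# Hypothetical two-row sample of the already reshaped data.
long_df = pd.DataFrame({
    "State": ["Alabama", "Alabama"],
    "Cost": [5.6, 5.71],
    "Date": ["12/31/2001", "12/31/2002"],
})

# With no path argument, to_csv returns the text instead of writing a file.
text = long_df.to_csv(sep="\t", index=False)
print(text)
```

The printed text can be copied straight into a worksheet. `DataFrame.to_clipboard` is another option, although it depends on a clipboard backend being available on the machine.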
Answer 1 (score: 0)
You asked how to do this directly in Python, so it would be something like this:
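#!/usr/bin/env python
inputname = 'Excel.txt'
outputname = 'Tableau.txt'

container = ['State\tCost\tDate']  # header row of the reshaped output
with open(inputname, 'r') as text:
    for line in text:
        # Split on tabs only, so multi-word state names stay in one field
        l = line.rstrip('\n').split('\t')
        name = l[0]
        if not name:
            continue  # skip blank lines
        if name == 'State':
            dates = l  # header row: remember the year labels
        else:
            # Every column after the first holds the cost for one year
            for e in range(1, len(l)):
                date = dates[e]
                cost = l[e]
                container.append('\t'.join([name, cost, '12/31/' + date]))

with open(outputname, 'w') as out:
    stringified = '\n'.join(container)
    out.write(stringified + '\n')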
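As a variation on the same idea, the standard-library `csv` module can do the tab splitting and joining. This is only a sketch: `wide_to_long` is a made-up helper name, and the in-memory `sample` stands in for the real input file.

```python
import csv
import io


def wide_to_long(src, dst):
    """Copy tab-delimited wide rows from src to dst as State/Cost/Date rows."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    years = next(reader)[1:]  # header row without the "State" label
    writer.writerow(["State", "Cost", "Date"])
    for row in reader:
        if not row:
            continue  # ignore blank lines
        state, costs = row[0], row[1:]
        for year, cost in zip(years, costs):
            writer.writerow([state, cost, "12/31/" + year])


# In-memory stand-in for the input file from the question.
sample = io.StringIO("State\t2001\t2002\nAlabama\t5.6\t5.71\nAlaska\t10.54\t10.46\n")
out = io.StringIO()
wide_to_long(sample, out)
print(out.getvalue())
```

Because the function only needs file-like objects, the same call works with files opened via `open(...)` (passing `newline=''`, as the `csv` documentation recommends).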