Python Pandas: convert row values to column values

Date: 2017-05-25 01:50:42

Tags: python pandas stack reshape lreshape

I read a DataFrame with Python pandas that looks like this:

<style type="text/css">
	table.tableizer-table {
		font-size: 12px;
		border: 1px solid #CCC; 
		font-family: Arial, Helvetica, sans-serif;
	} 
	.tableizer-table td {
		padding: 4px;
		margin: 3px;
		border: 1px solid #CCC;
	}
	.tableizer-table th {
		background-color: #104E8B; 
		color: #FFF;
		font-weight: bold;
	}
</style>
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>Time</th><th>Angle</th><th>Angle</th><th>Angle</th><th>Angle</th><th>FUEL_1</th><th>FUEL_2</th><th>Speed</th></tr></thead><tbody>
 <tr><td>3:06:38</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>1150</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>3:06:39</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>3:06:40</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>&nbsp;</td><td>1150</td><td>1344</td></tr>
 <tr><td>3:06:41</td><td>5.3</td><td>5.6</td><td>5.6</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>1392</td></tr>
 <tr><td>3:06:42</td><td>5.6</td><td>5.6</td><td>5.6</td><td>5.6</td><td>1160</td><td>&nbsp;</td><td>1456</td></tr>
 <tr><td>3:06:43</td><td>5.6</td><td>5.6</td><td>6</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>1520</td></tr>
 <tr><td>3:06:44</td><td>6</td><td>6</td><td>6</td><td>6</td><td>&nbsp;</td><td>1160</td><td>1600</td></tr>
 <tr><td>3:06:45</td><td>6</td><td>6</td><td>6</td><td>6.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1696</td></tr>
</tbody></table>

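I want to create the following DataFrame, in which the four Angle values of each row are stacked into four consecutive rows, and Time, FUEL_1, FUEL_2 and Speed appear only on the first row of each group: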

<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>Time</th><th>Angle</th><th>FUEL_1</th><th>FUEL_2</th><th>Speed</th></tr></thead><tbody>
 <tr><td>3:06:38</td><td>5.3</td><td>1150</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:39</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:40</td><td>5.3</td><td>&nbsp;</td><td>1150</td><td>1344</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:41</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1392</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:42</td><td>5.6</td><td>1160</td><td>&nbsp;</td><td>1456</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:43</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>1520</td></tr>
 <tr><td>&nbsp;</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:44</td><td>6</td><td>&nbsp;</td><td>1160</td><td>1600</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>3:06:45</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>1696</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
 <tr><td>&nbsp;</td><td>6.3</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
</tbody></table>

My idea is to insert several empty columns next to 'Time', 'FUEL_1', 'FUEL_2' and 'Speed', then stack these columns one by one and merge them. Is there a simpler way?
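One possibly simpler route than inserting empty columns, sketched under a few assumptions: the blank cells are NaN, all four angle columns carry the label `Angle`, and the stacking is done with NumPy's row-major `ravel()` rather than `DataFrame.stack`. Each original row is moved to position `i * 4` with `reindex`, and the flattened angles fill a new `Angle` column. The variable names (`angles`, `others`, `n_angles`, `result`) are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame by hand; blank cells are assumed to be NaN.
columns = ["Time", "Angle", "Angle", "Angle", "Angle", "FUEL_1", "FUEL_2", "Speed"]
rows = [
    ["3:06:38", 5.3, 5.3, 5.3, 5.3, 1150, np.nan, 1328],
    ["3:06:39", 5.3, 5.3, 5.3, 5.3, np.nan, np.nan, 1328],
    ["3:06:40", 5.3, 5.3, 5.3, 5.3, np.nan, 1150, 1344],
    ["3:06:41", 5.3, 5.6, 5.6, 5.6, np.nan, np.nan, 1392],
    ["3:06:42", 5.6, 5.6, 5.6, 5.6, 1160, np.nan, 1456],
    ["3:06:43", 5.6, 5.6, 6, 6, np.nan, np.nan, 1520],
    ["3:06:44", 6, 6, 6, 6, np.nan, 1160, 1600],
    ["3:06:45", 6, 6, 6, 6.3, np.nan, np.nan, 1696],
]
df = pd.DataFrame(rows, columns=columns)

# Select every column labelled "Angle" (a boolean mask copes with duplicate labels).
angles = df.loc[:, df.columns == "Angle"]
n_angles = angles.shape[1]  # 4 in the question

# Keep the remaining columns and move original row i to position i * n_angles.
others = df.loc[:, df.columns != "Angle"].copy()
others.index = others.index * n_angles

# Expand to one row per angle value; the new in-between rows are all NaN.
result = others.reindex(range(len(df) * n_angles))

# Row-major ravel yields row 0's four angles, then row 1's, and so on.
result.insert(1, "Angle", angles.to_numpy().ravel())

print(result.to_string())
```

The filler rows hold NaN rather than empty cells; `result.fillna("")` would make them print blank, at the cost of mixing strings into the numeric columns.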

1 answer:

Answer 0 (score: 0)

So I'm pretty sure this could be done easily with pandas.read_html, but I'm not as familiar with it as I am with BeautifulSoup.
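For the `pandas.read_html` route mentioned above, a minimal sketch: `read_html` parses every `<table>` it finds and returns a list of DataFrames. It assumes a parser backend is installed (lxml, or BeautifulSoup4 together with html5lib); the markup is a trimmed copy of the question's table (header plus the first two data rows).

```python
from io import StringIO

import pandas as pd

# Trimmed copy of the question's table: header and first two data rows.
html = """<table>
<thead><tr><th>Time</th><th>Angle</th><th>Angle</th><th>Angle</th><th>Angle</th><th>FUEL_1</th><th>FUEL_2</th><th>Speed</th></tr></thead><tbody>
 <tr><td>3:06:38</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>1150</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>3:06:39</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1328</td></tr>
</tbody></table>"""

# read_html returns a list with one DataFrame per <table>; wrapping the literal
# string in StringIO avoids the deprecation warning newer pandas emits for raw HTML.
tables = pd.read_html(StringIO(html))
df = tables[0]

# pandas may rename the repeated "Angle" headers (e.g. "Angle.1"); inspect df.columns.
print(df.columns.tolist())
print(df)
```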

```python
import pandas as pd
from bs4 import BeautifulSoup

html = """<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>Time</th><th>Angle</th><th>Angle</th><th>Angle</th><th>Angle</th><th>FUEL_1</th><th>FUEL_2</th><th>Speed</th></tr></thead><tbody>
 <tr><td>3:06:38</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>1150</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>3:06:39</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1328</td></tr>
 <tr><td>3:06:40</td><td>5.3</td><td>5.3</td><td>5.3</td><td>5.3</td><td>&nbsp;</td><td>1150</td><td>1344</td></tr>
 <tr><td>3:06:41</td><td>5.3</td><td>5.6</td><td>5.6</td><td>5.6</td><td>&nbsp;</td><td>&nbsp;</td><td>1392</td></tr>
 <tr><td>3:06:42</td><td>5.6</td><td>5.6</td><td>5.6</td><td>5.6</td><td>1160</td><td>&nbsp;</td><td>1456</td></tr>
 <tr><td>3:06:43</td><td>5.6</td><td>5.6</td><td>6</td><td>6</td><td>&nbsp;</td><td>&nbsp;</td><td>1520</td></tr>
 <tr><td>3:06:44</td><td>6</td><td>6</td><td>6</td><td>6</td><td>&nbsp;</td><td>1160</td><td>1600</td></tr>
 <tr><td>3:06:45</td><td>6</td><td>6</td><td>6</td><td>6.3</td><td>&nbsp;</td><td>&nbsp;</td><td>1696</td></tr>
</tbody></table>"""


def read_table(html):
    """Parse an HTML table: <th> cells become the header, each <td> row a data row."""
    header, matrix = [], []
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.find_all("tr"):
        if row.find("th"):
            # Header row; duplicate names such as "Angle" are kept as-is.
            header = [cell.get_text().strip() for cell in row.find_all("th")]
        else:
            # Data row; &nbsp; cells strip down to empty strings.
            matrix.append([cell.get_text().strip() for cell in row.find_all("td")])
    return pd.DataFrame(matrix, columns=header)
```

Passing the HTML you provided to this function returns a pandas DataFrame, from which you can then select the columns you need:

```python
df = read_table(html)
df[["Time", "FUEL_1", "FUEL_2", "Speed"]]
```

```
      Time FUEL_1 FUEL_2 Speed
0  3:06:38   1150         1328
1  3:06:39                1328
2  3:06:40          1150  1344
3  3:06:41                1392
4  3:06:42   1160         1456
5  3:06:43                1520
6  3:06:44          1160  1600
7  3:06:45                1696
```
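Because `read_table` builds the frame from `get_text()` results, every cell is a string and the blank cells are empty strings. A minimal sketch of converting the numeric columns afterwards, assuming `pd.to_numeric` with `errors="coerce"` is acceptable (it turns the empty strings into NaN); the small frame below is a hand-made stand-in for `read_table`'s output.

```python
import pandas as pd

# Stand-in for read_table(html): all values are strings, blank cells are "".
df = pd.DataFrame(
    [["3:06:38", "1150", "", "1328"],
     ["3:06:39", "", "", "1328"],
     ["3:06:40", "", "1150", "1344"]],
    columns=["Time", "FUEL_1", "FUEL_2", "Speed"],
)

numeric_cols = ["FUEL_1", "FUEL_2", "Speed"]
# errors="coerce" turns anything unparseable (here, the empty strings) into NaN.
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
print(df)
```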