Split a long row into multiple rows

Time: 2019-07-16 18:44:35

Tags: python-3.x pandas numpy

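First post, so any feedback is appreciated. I'm new to Python and trying to build a scraper that gets national gas prices from AAA. The question is how to convert the NumPy array to a pandas DataFrame while preserving the rows. As of now, it converts to a DataFrame but puts all the data in a single row.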

import requests
import numpy as np
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://gasprices.aaa.com'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}  # This is chrome, you can set whatever browser you like
result = requests.get(url,headers = headers)

data = []
src = result.content
soup = BeautifulSoup(src,'lxml') #passing source variable into bs class to create an object
rows = soup.findAll("table", {"class": "table-mob"}) #parse box info
headers = []

for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])

for row in rows:
    headers = row.find_all('th')
    headers = [ele.text.strip() for ele in headers]
    headers.append([ele for ele in headers if ele])


data = np.array(data)
df = pd.DataFrame(data)
print(df)

Current output: (image not available; all values end up in a single row)

The goal is to have it look like this:

Current Avg.    $2.79   $3.11   $3.36   $3.01   $2.45                  
Yesterday Avg.  $2.79   $3.11   $3.36   $3.01   $2.45 
Week Ago Avg.   $2.75   $3.07   $3.32   $3.00   $2.42 
Month Ago Avg.  $2.69   $3.03   $3.28   $3.02   $2.33 
Year Ago Avg.   $2.87   $3.17   $3.42   $3.17   $2.43

1 Answer:

Answer 0 (score: 0)

np.reshape

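  • np.reshape converts the values in df into a 6 x 5 array, filling down the columns first (order='F').
  • idx, *dat unpacks the first row of the resulting array into the name idx and the remaining rows into a list named dat (see: Variable Unpacking).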

idx, *dat = np.reshape(df.to_numpy(), (6, -1), order='F')
pd.DataFrame(dat, columns=idx).T

                     0       1       2       3       4
Current Avg.    $2.793  $3.107  $3.362  $3.010  $2.449
Yesterday Avg.  $2.792  $3.105  $3.360  $3.010  $2.449
Week Ago Avg.   $2.749  $3.068  $3.321  $3.004  $2.417
Month Ago Avg.  $2.689  $3.027  $3.280  $3.018  $2.332
Year Ago Avg.   $2.871  $3.167  $3.416  $3.168  $2.432
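
To try the reshape without hitting the website, here is a minimal sketch. It assumes the scrape produced a one-row DataFrame holding the 30 cell values in reading order (a label followed by five prices, five times). The `flat` list is hand-built from the output above and only stands in for the real scrape result.

```python
import numpy as np
import pandas as pd

# Assumption: the scraper yields one row of 30 strings, label first, then 5 prices.
flat = [
    "Current Avg.", "$2.793", "$3.107", "$3.362", "$3.010", "$2.449",
    "Yesterday Avg.", "$2.792", "$3.105", "$3.360", "$3.010", "$2.449",
    "Week Ago Avg.", "$2.749", "$3.068", "$3.321", "$3.004", "$2.417",
    "Month Ago Avg.", "$2.689", "$3.027", "$3.280", "$3.018", "$2.332",
    "Year Ago Avg.", "$2.871", "$3.167", "$3.416", "$3.168", "$2.432",
]
df = pd.DataFrame([flat])  # shape (1, 30), like the df in the question

# order="F" fills column by column, so each column holds one label and its prices.
arr = np.reshape(df.to_numpy(), (6, -1), order="F")

# First row -> the five labels; remaining five rows -> the price values.
idx, *dat = arr
result = pd.DataFrame(dat, columns=idx).T
print(result)
```

The hard-coded `6` is the number of cells per table row (one label plus five prices); it would need to change if the page added or dropped a fuel grade.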

Similarly:

idx, *dat = np.reshape(df.to_numpy(), (6, -1), order='F')
pd.DataFrame(dict(zip(idx, dat))).T

If df.to_numpy does not work (it was added in pandas 0.24), try:

idx, *dat = np.reshape(df.values, (6, -1), order='F')
pd.DataFrame(dict(zip(idx, dat))).T
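
As an alternative sketch that is not part of the answer above: if every table row contributes exactly six cells (an assumption about the page layout), a plain row-major reshape followed by set_index may give the same frame without the transpose. The `flat` values are a shortened, hand-built stand-in for the scraped cells.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the scraped cells: a label, then 5 prices, per row.
flat = [
    "Current Avg.", "$2.793", "$3.107", "$3.362", "$3.010", "$2.449",
    "Yesterday Avg.", "$2.792", "$3.105", "$3.360", "$3.010", "$2.449",
]

# The default (C) order fills row by row, six cells per row.
table = np.reshape(np.array(flat, dtype=object), (-1, 6))

# Use the first column (the labels) as the index, then renumber the price columns.
result = pd.DataFrame(table).set_index(0)
result.index.name = None
result.columns = range(result.shape[1])
print(result)
```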