Saving extracted HTML data into a Python list

Date: 2018-01-30 17:42:06

Tags: python beautifulsoup

I am extracting some data from an HTML page. My current output is:

 0
0
0
0

131997
https://www.google.com.ar/
google.com.ar
 0
0
0
0

134930
https://www.a-a.com/
a-a.com

The output I am looking for is:

[['0','0','0','0','131997','https://www.google.com.ar/','google.com.ar'],['0','0','0','0','134930','https://www.a-a.com/','a-a.com']]

Here is my Python code:

import requests
from requests.auth import HTTPBasicAuth
from bs4 import BeautifulSoup

sitios = requests.get(url_sitios, auth=HTTPBasicAuth(user, passwd))
html = sitios.text
soup = BeautifulSoup(html, 'lxml')  # use the 'lxml' parser
for item in soup.find_all(['nombre', 'url', 'sitio_id', 'ultimas24hrs']):
    a = item.text + ','
    print(a)
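For illustration, here is a self-contained sketch of the end-to-end goal using only the standard library (`xml.etree.ElementTree` stands in for BeautifulSoup so the snippet runs without third-party packages). The inline sample is an assumption: the question does not show the feed's actual structure, only the tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mimicking the feed; the real layout is not shown in
# the question, so this structure is an assumption.
xml = """
<sitios>
  <sitio>
    <ultimas24hrs>0</ultimas24hrs><ultimas24hrs>0</ultimas24hrs>
    <ultimas24hrs>0</ultimas24hrs><ultimas24hrs>0</ultimas24hrs>
    <sitio_id>131997</sitio_id>
    <url>https://www.google.com.ar/</url>
    <nombre>google.com.ar</nombre>
  </sitio>
</sitios>
"""

root = ET.fromstring(xml)
# One inner list per <sitio>, holding the text of each child tag in order.
rows = [[el.text for el in sitio] for sitio in root]
print(rows)
```

Building one inner list per site element, rather than printing tag texts one at a time, is what produces the nested-list shape asked for above.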

3 Answers:

Answer 0 (score: 1)

You can try something like this:

data = ['0', '0', '0', '0', '131997', 'https://www.google.com.ar/', 'google.com.ar',
        '0', '0', '0', '0', '134930', 'https://www.a-a.com/', 'a-a.com']
a = []
b = []
for item in data:
    b.append(item)
    if len(b) == 7:  # a complete row of 7 fields
        a.append(b)
        b = []
print(a)

Answer 1 (score: 1)

This can be done in two lines with a list comprehension.

Right now you have one long string of values rather than the nested list you want.


Even if it looks confusing at first, this shows that your problem can also be solved in two lines.

Please read this for more details on the above.
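A sketch of what such a two-line solution could look like, assuming the extracted tag texts have already been collected into a flat list (the `data` literal below is stand-in sample data, not output from the live feed):

```python
# Hypothetical flat list of extracted tag texts; 7 is the number of fields per site.
data = ['0', '0', '0', '0', '131997', 'https://www.google.com.ar/', 'google.com.ar',
        '0', '0', '0', '0', '134930', 'https://www.a-a.com/', 'a-a.com']
# Slice the flat list into consecutive chunks of 7.
rows = [data[i:i + 7] for i in range(0, len(data), 7)]
print(rows)
```

Slicing with a step of 7 keeps the field order intact and needs no counter variable.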

Answer 2 (score: 0)

import requests
from requests.auth import HTTPBasicAuth
from bs4 import BeautifulSoup

a = []
sitios = requests.get(url_sitios, auth=HTTPBasicAuth(user, passwd))
html = sitios.text
soup = BeautifulSoup(html, 'lxml')  # use the 'lxml' parser
for item in soup.find_all(['nombre', 'url', 'sitio_id', 'ultimas24hrs']):
    a.append(item.text.split('\n'))
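Note that this appends one sublist per tag, not one list per site. As a hedged follow-up sketch, assuming the tag texts arrive in site order, the zip-of-iterators idiom groups them into rows of 7 (`flat` below is stand-in sample data, not output from the live feed):

```python
# Hypothetical flat list of tag texts as collected from find_all.
flat = ['0', '0', '0', '0', '131997', 'https://www.google.com.ar/', 'google.com.ar',
        '0', '0', '0', '0', '134930', 'https://www.a-a.com/', 'a-a.com']
it = iter(flat)
# zip over 7 references to the same iterator consumes 7 items per row.
rows = [list(row) for row in zip(*[it] * 7)]
print(rows)
```

Unlike slicing, this silently drops a trailing partial row, which may or may not be what you want.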