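Python Scraping: How do I separate multiple attributes in a single cell (td)?

Time: 2018-01-10 23:33:09

Tags: python web-scraping beautifulsoup python-requests

When scraping an HTML table, if a cell (td) contains multiple attributes (see the HTML snippet for an example), how do you separate the two and/or select only one of them?

HTML snippet: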

package com.sagproductions.gamertaggenerator;

import android.support.v7.app.AppCompatActivity;
import android.os.Bundle;
import android.view.View;
import android.widget.Button;
import android.widget.EditText;
import android.widget.TextView;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.lang.String;
import java.util.ArrayList;
import java.util.Random;

public class MainActivity extends AppCompatActivity {

    WordGenerator WG;
    private Button mSingleWordGenerator;
    private Button mTwoWordGenerator;
    private Button mThreeWordGenerator;
    private EditText editText;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        mSingleWordGenerator = (Button)findViewById(R.id.button);
        mSingleWordGenerator.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                generateWord();
                editText=(EditText)findViewById(R.id.editText);
               editText.setText(WG.getRandomWord(),TextView.BufferType.EDITABLE);
            }
        });

        mTwoWordGenerator = (Button)findViewById(R.id.button2);
        mTwoWordGenerator.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {

            }
        });

        mThreeWordGenerator = (Button)findViewById(R.id.button3);
        mThreeWordGenerator.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {

            }
        });
    }

    public void generateWord(){
        final File file = new File("words.txt");
        ArrayList<String> words= new ArrayList<String>();
        Random rand=new Random();

        try(final Scanner scanner=new Scanner(file)){
            while(scanner.hasNextLine()){
                String line = scanner.nextLine();
                words.add(line);
            }
        }catch(FileNotFoundException e) {
            e.printStackTrace();
        }
        WG.setRandomWord(words.get(rand.nextInt(words.size())));
    }
}

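The code I'm trying:

Any suggestions on how to a) select just one of the names, or b) split the cell into two cells would be greatly appreciated.

Thanks.

2 answers:

Answer 0: (score: 2)

If you want both the full name and the short name, you can try this: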

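# `row` is a single <tr> element from the table
for td in row.find_all('td'):
    full = td.find('a', {'class': 'full-name'})
    short = td.find('a', {'class': 'short-name'})
    if full and short:  # skip cells that lack either link
        full_name, short_name = full.text, short.text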
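
A self-contained sketch of the same idea, for anyone who wants to run it end to end. The HTML below is a hypothetical stand-in, not the markup from the question: only the `full-name` and `short-name` class names come from the answer, while the player name and the helper `split_name_cell` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one cell holds two links that differ only by class.
html = """
<table>
  <tr>
    <td><a class="full-name" href="#">Jane Example</a><a class="short-name" href="#">J. Example</a></td>
    <td>Forward</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def split_name_cell(td):
    """Return (full_name, short_name), or None if the cell lacks either link."""
    full = td.find("a", class_="full-name")
    short = td.find("a", class_="short-name")
    if full is None or short is None:
        return None
    return full.get_text(strip=True), short.get_text(strip=True)


# Collect a pair for every cell that actually contains both links.
names = []
for td in soup.find_all("td"):
    pair = split_name_cell(td)
    if pair is not None:
        names.append(pair)

print(names)
```

Returning `None` for cells without the links keeps the loop from raising `AttributeError` on ordinary cells such as the position column. To select only one of the names, keep just the first or second element of each pair.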

Answer 1: (score: 1)

Try matching the tr rows with a regular expression on their class:

import re

# Match every <tr> whose class contains "player-overview"
players = the_soup.find_all('tr', {'class': re.compile("player-overview")})
for p in players:
    name = p.find('a', {'class': 'full-name'}).get_text()
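
For part b) of the question, splitting the cell into two, one possible approach is to build each output row cell by cell and expand the name cell into two columns. This is a rough sketch under assumptions: the table markup, the `odd`/`even` classes, and the player data are hypothetical, and only the `player-overview`, `full-name`, and `short-name` class names come from the answers.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical table; the real page's markup may differ.
html = """
<table>
  <tr class="header"><th>Player</th><th>Position</th></tr>
  <tr class="player-overview odd">
    <td><a class="full-name">Jane Example</a><a class="short-name">J. Example</a></td>
    <td>Forward</td>
  </tr>
  <tr class="player-overview even">
    <td><a class="full-name">John Sample</a><a class="short-name">J. Sample</a></td>
    <td>Goalkeeper</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
name_link = re.compile(r"^(full|short)-name$")

rows = []
for tr in soup.find_all("tr", class_=re.compile("player-overview")):
    record = []
    for td in tr.find_all("td"):
        links = td.find_all("a", class_=name_link)
        if links:
            # Name cell: emit one output column per link.
            record.extend(a.get_text(strip=True) for a in links)
        else:
            # Ordinary cell: keep its text as a single column.
            record.append(td.get_text(strip=True))
    rows.append(record)

for record in rows:
    print(record)
```

Each record then has three columns (full name, short name, position) instead of two, so it can be passed directly to `csv.writer` or a DataFrame constructor.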