How do I download all binary files (images) using request() and open()?

Date: 2019-05-10 04:43:48

Tags: python-3.x image-processing web-scraping download python-requests

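When I try to download an image from one URL, the code works, but when I try a different URL it does not: the file is created under the right name, but it does not contain a valid image.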

 @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_addedit_entry);

        usernameEditText = findViewById(R.id.username_field);
        passwordEditText = findViewById(R.id.password_field);
        hintEditText = findViewById(R.id.hint_field);

        passwordABCD = findViewById(R.id.upp_checkbox);
        passwordabcd = findViewById(R.id.low_checkbox);
        password0123 = findViewById(R.id.num_checkbox);
        passwordSymbols = findViewById(R.id.sym_checkbox);

        radio4 = findViewById(R.id.four);
        radio8 = findViewById(R.id.eight);
        radio12 = findViewById(R.id.twelve);
        radio16 = findViewById(R.id.sixteen);

        Button generatePassword = findViewById(R.id.btn_password_generate);
        Button saveEntry = findViewById(R.id.btn_save);

        Intent intent = getIntent();

        if(intent.hasExtra(EXTRA_ID)){
        setTitle("Edit Entry");
        usernameEditText.setText(Objects.requireNonNull(getIntent().getExtras()).getString(EXTRA_USERNAME));
        passwordEditText.setText(Objects.requireNonNull(getIntent().getExtras()).getString(EXTRA_PASSWORD));
        hintEditText.setText(Objects.requireNonNull(getIntent().getExtras()).getString(EXTRA_HINT));

        Toast.makeText(this, "Info Received!!!", Toast.LENGTH_SHORT).show();
        Toast.makeText(this, Objects.requireNonNull(getIntent().getExtras()).getString(EXTRA_USERNAME), Toast.LENGTH_SHORT).show();
        Toast.makeText(this, Objects.requireNonNull(getIntent().getExtras()).getString(EXTRA_PASSWORD), Toast.LENGTH_SHORT).show();
        Toast.makeText(this, Objects.requireNonNull(getIntent().getExtras()).getString(EXTRA_HINT), Toast.LENGTH_SHORT).show();
    }
        else{setTitle("Add Entry");}

        generatePassword.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {passwordEditText.setText(generatedPassword());}});

        saveEntry.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                Intent data = new Intent();
                data.putExtra(EXTRA_USERNAME, usernameEditText.getText().toString());
                data.putExtra(EXTRA_HINT, hintEditText.getText().toString());
                data.putExtra(EXTRA_PASSWORD, passwordEditText.getText().toString());

                int id = getIntent().getIntExtra(EXTRA_ID, -1);

                if(id != -1){data.putExtra(EXTRA_ID, id);}

                setResult(RESULT_OK, data);
                finish();
            }
        });
    }

1 Answer:

Answer 0: (score: 0)

Different portals may use different security systems to block scripts and bots.

If you open image3.jpg in a text editor, you will see that it contains an HTML error page rather than image data:

<head>
<title>Not Acceptable!</title>
</head>
<body>
<h1>Not Acceptable!</h1>
<p>An appropriate representation of the requested resource could not be found on  this server. 
This error was generated by Mod_Security.</p>
</body>
</html>
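
One way to catch this situation without opening the file by hand is to look at its first bytes: common image formats start with well-known signatures, while an error page starts with markup. The following is only a minimal sketch; the helper name `looks_like_image` is hypothetical, and only JPEG, PNG and GIF are checked.

```python
# First bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"GIF87a",              # GIF (1987 version)
    b"GIF89a",              # GIF (1989 version)
)


def looks_like_image(data: bytes) -> bool:
    """Return True if data starts with one of the known image signatures."""
    return data.startswith(IMAGE_SIGNATURES)


# Possible usage on the saved file:
# with open('image3.jpg', 'rb') as f:
#     print(looks_like_image(f.read(16)))
```

For the file shown above this check would return False, because the content begins with HTML tags.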

Some servers may require correct headers, cookies, a session ID, and so on before they will serve the data.
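
When a site needs several of these at once, a `requests.Session` can carry them across requests. This is a rough sketch rather than a recipe for any particular site: the function name `build_session` and the cookie name in the usage comment are assumptions made for illustration.

```python
import requests


def build_session(user_agent, cookies=None):
    """Create a Session that sends the same headers and cookies on every request."""
    session = requests.Session()
    # Headers set on the session are merged into each request it sends.
    session.headers.update({"User-Agent": user_agent})
    # Cookies stored on the session are sent back to matching hosts.
    for name, value in (cookies or {}).items():
        session.cookies.set(name, value)
    return session


# Possible usage ("sessionid" is a made-up cookie name):
# session = build_session("Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0",
#                         {"sessionid": "abc123"})
# r = session.get(url, stream=True)
```

A session also keeps any cookies the server sets in its responses, which is often enough for sites that hand out a session ID on the first page load.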

This portal requires a valid User-Agent header:

import requests

url = 'https://ryanspressurewashing.com/wp-content/uploads/2017/06/metal-roof-after-pressure-washing.jpg'

headers = {
  'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'
}

r = requests.get(url, stream=True, headers=headers)

with open('image3.jpg', 'wb') as my_file:
    # Read in 4 KB chunks
    for byte_chunk in r.iter_content(chunk_size=4096):
        my_file.write(byte_chunk)
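
The code above writes whatever the server sends, which is how an error page can end up inside a .jpg file. A more defensive variant could check the response before writing anything. This is a sketch under assumptions: `save_image` is a hypothetical helper, and it assumes the server labels images with a Content-Type that starts with `image/`.

```python
def save_image(response, path, chunk_size=4096):
    """Write the response body to path only if the server returned an image."""
    # Raises requests.HTTPError for 4xx/5xx answers (for example 403 or 406).
    response.raise_for_status()
    content_type = response.headers.get("Content-Type", "")
    if not content_type.startswith("image/"):
        raise ValueError(f"expected an image, got {content_type!r}")
    with open(path, "wb") as my_file:
        for byte_chunk in response.iter_content(chunk_size=chunk_size):
            my_file.write(byte_chunk)


# Possible usage with the request from the example above:
# r = requests.get(url, stream=True, headers=headers)
# save_image(r, 'image3.jpg')
```

With this check a blocked request fails with an exception instead of silently producing a file that no image viewer can open.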

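By default, requests sends a User-Agent of the form python-requests/2.21.0 (the number is the installed library version), so a portal can easily recognize and block the script.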
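
The default value can also be read locally, without sending any request. A small sketch using `requests.utils.default_user_agent()`; the version printed depends on the installed release:

```python
import requests

# The User-Agent string requests uses when the caller supplies none.
default_agent = requests.utils.default_user_agent()
print(default_agent)

# A new Session starts out with the same default in its headers.
session = requests.Session()
print(session.headers["User-Agent"] == default_agent)
```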

You can see this header by sending a request to https://httpbin.org/get, which echoes the request back:

import requests

r = requests.get('https://httpbin.org/get')
print(r.text)

Result:

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.21.0"
  }, 
  "origin": "83.23.39.165, 83.23.39.165", 
  "url": "https://httpbin.org/get"
}
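
Because the echoed request comes back as JSON, the header can be extracted in code rather than read by eye. A minimal sketch: `echoed_user_agent` is a hypothetical helper, and the sample string is a shortened copy of the output above so the example runs without network access.

```python
import json


def echoed_user_agent(body_text):
    """Return the User-Agent that httpbin echoed back in its JSON body."""
    return json.loads(body_text)["headers"]["User-Agent"]


# Shortened copy of the response shown above.
sample = '{"headers": {"User-Agent": "python-requests/2.21.0"}}'
print(echoed_user_agent(sample))  # python-requests/2.21.0
```

Against the live service, the same check could be written as `echoed_user_agent(r.text)`, which makes it easy to confirm that a custom `headers` dictionary is actually being sent.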

See httpbin.org for more features.