Question

Following the guide at https://medium.com/mlreview/a-guide-to-receptive-field-arithmetic-for-convolutional-neural-networks-e0f514068807, I am trying to compute the number of output features with the code below.

The output of:

%reset -f

import torch
import torch.nn as nn

my_tensor = torch.randn((1, 16, 12, 12), requires_grad=False)
print(my_tensor.shape)

update_1 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
print(update_1(my_tensor).shape)

is:

torch.Size([1, 16, 12, 12])
torch.Size([1, 16, 6, 6])

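This is how torch.Size([1, 16, 6, 6]) should follow from applying the formula:

n_out = floor((n_in + 2 * padding - kernel_size) / stride) + 1

(taken from https://medium.com/mlreview/a-guide-to-receptive-field-arithmetic-for-convolutional-neural-networks-e0f514068807)

I tried to compute the number of output features by hand by applying the formula: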

stride = 2
padding = 1
kernel_size = 3

# 2304 as n_in = 1 * 16 * 12 * 12

n_out = ((2304 + (2 * padding) - kernel_size) / stride) + 1

print(n_out)

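This prints: 1152.5

However, the number of output features actually produced is print(1 * 16 * 6 * 6) = 576. I took the product of 1, 16, 6 and 6 because that is the shape printed by print(update_1(my_tensor).shape).

Update:

Based on the answer below, I have updated the code to: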

%reset -f

import torch
import torch.nn as nn
from math import floor

stride_value = 2
padding_value = 1
kernel_size_value = 3

number_channels = 3
width = 10
height = 12

my_tensor = torch.randn((1, number_channels, width, height), requires_grad=False)
print(my_tensor.shape)

update_1 = nn.Conv2d(in_channels=number_channels, 
                     out_channels=16, 
                     kernel_size=kernel_size_value, 
                     stride=stride_value, 
                     padding=padding_value)

print(update_1(my_tensor).shape)

n_out = floor((number_channels + (2 * padding_value) - kernel_size_value) / stride_value) + 1
print(n_out)

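print(my_tensor.shape) produces: torch.Size([1, 3, 10, 12])

print(update_1(my_tensor).shape) produces: torch.Size([1, 16, 5, 6])

print(n_out) produces: 2

The value 2 does not match the number of output features in either dimension. Did I perform the calculation correctly?

Does this formula not apply when the number of features differs between dimensions, as here, where 5 features are produced horizontally and 6 vertically? And for images, does it make no sense for the x and y axes to have different lengths?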

Answer 1

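I see where your confusion comes from. The formula computes the linear size of the output, that is, its size along a single dimension, whereas you assumed it operates on the whole tensor.

So the correct code is: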

from math import floor

stride = 2
padding = 1
kernel_size = 3

n_out = floor((12 + (2 * padding) - kernel_size) / stride) + 1

print(n_out)

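So it outputs 6 "horizontal" features. Since the input tensor has the same "vertical" size (12), the formula also yields 6 "vertical" features. Finally, 16 is the number of output channels you specified in Conv2d.

Putting it all together, the output has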

1 image in a batch,
16 channels,
6 horizontal features, and
6 vertical features,

for a total of 576 features.
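To make the difference concrete, here is a minimal sketch that contrasts the calculation from the question (the formula fed with the total element count) with the per-dimension calculation described above. The helper name conv_output_size is hypothetical and not part of PyTorch; the numbers are the ones used in the question.

```python
from math import floor


def conv_output_size(n_in, kernel_size, stride=1, padding=0):
    """Output size along ONE spatial dimension (hypothetical helper)."""
    return floor((n_in + 2 * padding - kernel_size) / stride) + 1


# Mistake from the question: n_in is the total element count 1 * 16 * 12 * 12.
wrong = ((2304 + 2 * 1 - 3) / 2) + 1

# Intended use: n_in is the size of a single spatial dimension (12).
side = conv_output_size(12, kernel_size=3, stride=2, padding=1)

# Batch size and out_channels come from the tensor and the layer,
# not from the formula.
total_features = 1 * 16 * side * side

print(wrong, side, total_features)
```

The first value reproduces the 1152.5 from the question, while the per-dimension result and the product over the output shape agree with what Conv2d reports.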

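Update

By convention, the number of output channels is not computed by the formula; it is supplied manually as the second argument to nn.Conv2d.

So, to correct the second code snippet above: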

import torch
import torch.nn as nn
from math import floor

stride_value = 2
padding_value = 1
kernel_size_value = 3

number_channels = 3
width = 10
height = 12

my_tensor = torch.randn((1, number_channels, width, height), requires_grad=False)
print(my_tensor.shape)

update_1 = nn.Conv2d(in_channels=number_channels, 
                     out_channels=16, 
                     kernel_size=kernel_size_value, 
                     stride=stride_value, 
                     padding=padding_value)

print(update_1(my_tensor).shape)

n_out1 = floor((width + (2 * padding_value) - kernel_size_value) / stride_value) + 1
n_out2 = floor((height + (2 * padding_value) - kernel_size_value) / stride_value) + 1
print("(Expected: 5, 6)", n_out1, n_out2)
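The same per-dimension calculation can be wrapped in a small pure-Python helper that handles both spatial sizes at once. This is only a sketch: conv2d_output_hw is a hypothetical name, and the dilation term follows the general output-shape formula given in the nn.Conv2d documentation (with dilation=1 it reduces to the formula used above). The sizes are taken in the order they appear in the tensor shape.

```python
def conv2d_output_hw(in_hw, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output sizes of a 2D convolution (hypothetical helper).

    Each argument may be an int (same value for both dimensions)
    or a pair (one value per dimension).
    """
    def pair(v):
        return (v, v) if isinstance(v, int) else tuple(v)

    k, s, p, d = pair(kernel_size), pair(stride), pair(padding), pair(dilation)
    # Integer floor division replaces floor(... / stride).
    return tuple(
        (n + 2 * p[i] - d[i] * (k[i] - 1) - 1) // s[i] + 1
        for i, n in enumerate(in_hw)
    )


# Sizes from the corrected snippet: 10 and 12, kernel 3, stride 2, padding 1.
out_hw = conv2d_output_hw((10, 12), kernel_size=3, stride=2, padding=1)

# The full output shape adds the batch size and the chosen out_channels.
out_shape = (1, 16) + out_hw
print(out_shape)
```

Because each dimension is handled independently, nothing in the formula requires the two spatial sizes to be equal.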

Answer 2

The article uses the term "features" in a very odd, non-standard way. What it actually means is "pixels", or more generally the size of the feature map in each dimension.
Restricting ourselves to images, the formula simply computes the number of pixels along each image dimension. So in this case we have n_out = 6 and n_in = 12 in each dimension (a 12-pixel-wide input gives a 6-pixel-wide output), and the formula matches.

If we wanted the actual number of features in the output, we would get 16 * 6 * 6.
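As a small illustration of that distinction, the sketch below counts features from an output shape rather than from the formula. The function num_features is a hypothetical helper, and the shapes are the two output shapes reported in the question.

```python
from math import prod


def num_features(shape, include_batch=False):
    """Number of values in a feature tensor (hypothetical helper).

    shape is (batch, channels, size_1, size_2); by default the batch
    dimension is skipped so the count is per image.
    """
    dims = shape if include_batch else shape[1:]
    return prod(dims)


out_shape = (1, 16, 6, 6)          # output for the 12 x 12 input
per_image = num_features(out_shape)

non_square = num_features((1, 16, 5, 6))  # output for the 10 x 12 input

print(per_image, non_square)
```

The formula yields only the last two entries of the shape; the feature count is the product of the channel count and those sizes.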
