RNN Principles, Part 2
Diagram legend
- A yellow rectangle denotes a neural network layer
- A pink circle denotes a pointwise operation
- An arrow denotes a vector being passed from one node to another
- Merging lines denote the concatenation of two vectors
- Forking lines denote a vector being copied to multiple destinations
Vanilla RNN network structure
The vanilla RNN has only a single neural network layer (a single yellow tanh rectangle).
{:.info}
Vanilla RNN computation
Input
A tensor containing the features of the input sequence.
If `batch_first=True`, the input $X$ has shape (batch, seq_len, input_size):
$$X=[N, L, H_{in}]$$
Otherwise it has shape (seq_len, batch, input_size):
$$X=[L, N, H_{in}]$$
- $L$: sequence length
- $N$: batch size
- $H_{in}$: input size

Both layouts are shown in the sketch below.
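A minimal sketch of the two layouts, assuming illustrative sizes $N=2$, $L=5$, $H_{in}=100$ (these values are examples only):

```python
import torch

N, L, H_in = 2, 5, 100                    # batch, seq_len, input_size (illustrative)
x_seq_first = torch.randn(L, N, H_in)     # default layout: (seq_len, batch, input_size)
x_batch_first = torch.randn(N, L, H_in)   # layout expected when batch_first=True
```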
$$h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})$$

- $x_t$ has shape (batch_size, input_size)
- $W_{ih}$ has shape (hidden_size, input_size), matching PyTorch's `weight_ih` parameter
- $h_{t-1}$ has shape (batch_size, hidden_size)
- $W_{hh}$ has shape (hidden_size, hidden_size), matching PyTorch's `weight_hh` parameter
- $b_{ih}$ and $b_{hh}$ each have shape (hidden_size,)
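A minimal sketch of a single recurrence step with hypothetical sizes (batch_size=2, input_size=100, hidden_size=20), just to confirm the shapes above:

```python
import torch

batch_size, input_size, hidden_size = 2, 100, 20   # illustrative sizes

x_t = torch.randn(batch_size, input_size)
h_prev = torch.zeros(batch_size, hidden_size)

W_ih = torch.randn(hidden_size, input_size)    # same layout as weight_ih_l0 below: (20, 100)
W_hh = torch.randn(hidden_size, hidden_size)   # same layout as weight_hh_l0: (20, 20)
b_ih = torch.zeros(hidden_size)
b_hh = torch.zeros(hidden_size)

# One step of the recurrence h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
h_t = torch.tanh(x_t @ W_ih.T + b_ih + h_prev @ W_hh.T + b_hh)
print(h_t.shape)   # torch.Size([2, 20]) -> (batch_size, hidden_size)
```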
Initial hidden state ($h_0$)
A tensor containing the initial hidden state for each element in the batch.
$$h_0=(S, N, H_{out})$$
$h_0$ has shape (num_layers * num_directions, batch, hidden_size).
If the RNN is bidirectional, num_directions should be 2, else it should be 1.
- $H_{out}=\text{hidden\_size}$
- $N=\text{batch\_size}$
- $S=\text{num\_layers} \times \text{num\_directions}$
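A minimal sketch of constructing $h_0$, assuming num_layers=3, a unidirectional RNN, batch=2, and hidden_size=20 (the same illustrative values used in the example later in this post):

```python
import torch

num_layers, num_directions = 3, 1     # num_directions would be 2 for bidirectional=True
batch, hidden_size = 2, 20
h0 = torch.zeros(num_layers * num_directions, batch, hidden_size)   # (S, N, H_out)
print(h0.shape)   # torch.Size([3, 2, 20])
```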
Output
output1
The hidden state at every time step of the last layer. `output` has shape (seq_len, batch, num_directions * hidden_size): a tensor containing the output features ($h_t$) from the last layer of the RNN, for each $t$.
Output1 shape: $(L, N, H_{all})$ where $H_{all}=\text{num\_directions} \times \text{hidden\_size}$. If you need the output of each direction separately, the directions can be separated (for the unpacked case) using

```python
output.view(seq_len, batch, num_directions, hidden_size)
```
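A hedged sketch using a bidirectional RNN (hypothetical sizes, for illustration only) to show the direction split:

```python
import torch

rnn_bi = torch.nn.RNN(input_size=100, hidden_size=20, num_layers=3, bidirectional=True)
x = torch.randn(5, 2, 100)                # (seq_len, batch, input_size)
output, h_n = rnn_bi(x)
print(output.shape)                       # torch.Size([5, 2, 40]) = (L, N, 2 * hidden_size)
per_direction = output.view(5, 2, 2, 20)  # (seq_len, batch, num_directions, hidden_size)
```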
output2: $h_n$
The hidden state of every layer at the last time step ($t = \text{seq\_len}$). `h_n` has shape (num_layers * num_directions, batch, hidden_size): a tensor containing the hidden state for $t = \text{seq\_len}$, i.e. the next hidden state for each element in the batch.
Output2 shape: $(S, N, H_{out})$.
Like the output, the layers can be separated using

```python
h_n.view(num_layers, num_directions, batch, hidden_size)
```
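A matching sketch (same hypothetical bidirectional setup as above) for splitting $h_n$ by layer and direction:

```python
import torch

rnn_bi = torch.nn.RNN(input_size=100, hidden_size=20, num_layers=3, bidirectional=True)
x = torch.randn(5, 2, 100)               # (seq_len, batch, input_size)
output, h_n = rnn_bi(x)
print(h_n.shape)                         # torch.Size([6, 2, 20]) = (num_layers * num_directions, batch, hidden_size)
per_layer = h_n.view(3, 2, 2, 20)        # (num_layers, num_directions, batch, hidden_size)
```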
Vanilla RNN example (PyTorch)
RNN
input_size = 100
hidden_size = 20
num_layers = 3

```python
import torch

rnn = torch.nn.RNN(100, 20, 3)   # input_size=100, hidden_size=20, num_layers=3
```
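The shape inspections further down use `output` and `hn` from a forward pass. A minimal sketch of that pass, assuming the `rnn` defined above and an input with seq_len=5 and batch=2 (these two sizes are inferred from the printed shapes below):

```python
x = torch.randn(5, 2, 100)    # (seq_len, batch, input_size)
h0 = torch.zeros(3, 2, 20)    # (num_layers * num_directions, batch, hidden_size)
output, hn = rnn(x, h0)       # output: all time steps of the last layer; hn: last time step of all layers
```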
RNN parameter list

```python
In [1]: dict(rnn._parameters).keys()
```
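A sketch of an alternative way to list every parameter together with its shape, assuming the `rnn` defined above:

```python
# Print each parameter name and shape of the 3-layer RNN defined above
for name, p in rnn.named_parameters():
    print(name, tuple(p.shape))
# Expected pattern: weight_ih_l0 (20, 100), weight_hh_l0 (20, 20),
# bias_ih_l0 (20,), bias_hh_l0 (20,), and the same four names for l1 and l2
# (with weight_ih_l1 and weight_ih_l2 of shape (20, 20))
```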
RNN parameter shapes
Shape of $W_{ih}$ in layer 1:

```python
In [1]: dict(rnn._parameters).get('weight_ih_l0').size()
Out[1]: torch.Size([20, 100])
```

Shape of $W_{ih}$ in layer 2:

```python
In [1]: dict(rnn._parameters).get('weight_ih_l1').size()
Out[1]: torch.Size([20, 20])
```

Shape of $W_{ih}$ in layer 3:

```python
In [1]: dict(rnn._parameters).get('weight_ih_l2').size()
Out[1]: torch.Size([20, 20])
```

Shape of $b_{ih}$ in layers 1-3:

```python
In [1]: dict(rnn._parameters).get('bias_ih_l0').size()
Out[1]: torch.Size([20])

In [1]: dict(rnn._parameters).get('bias_ih_l1').size()
Out[1]: torch.Size([20])

In [1]: dict(rnn._parameters).get('bias_ih_l2').size()
Out[1]: torch.Size([20])
```

Shape of output1
The hidden states of all time steps of the last layer.

```python
In [1]: output.shape
Out[1]: torch.Size([5, 2, 20])
```

- 5: sequence length
- 2: batch size
- 20: hidden size

Shape of output2 ($h_n$)
The hidden state of every layer at the last time step.

```python
In [1]: hn.shape
Out[1]: torch.Size([3, 2, 20])
```

- 3: num_layers
- 2: batch size
- 20: hidden_size
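As a final sanity check (a sketch, assuming the unidirectional 3-layer forward pass above): for a unidirectional RNN, the last time step of `output` should match the last layer's slice of `hn`.

```python
# output[-1]: last time step of the last layer; hn[-1]: last layer at the last time step
print(torch.allclose(output[-1], hn[-1]))   # expected: True
```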