Legend

  • Yellow rectangles represent neural network layers
  • Pink circles represent pointwise operations
  • Arrow lines represent a vector being passed from one node to another
  • Merging lines represent the concatenation of two vectors
  • Forking lines represent a vector being copied and sent to different places

Traditional RNN network structure

A traditional RNN has only one neural network layer (a single yellow tanh rectangle).
{:.info}

Traditional RNN computation

Input

tensor containing the features of the input sequence

If batch_first is set, then $X$ is:
input of shape (batch, seq_len, input_size)
$$X=[N,L,H_{in}]$$
Otherwise:
input of shape (seq_len, batch, input_size)
$$X=[L,N,H_{in}]$$

  • $L$: sequence length
  • $N$: batch size
  • $H_{in}$: input size

The hidden state at each time step is computed as (a single step is worked out in the sketch after this list):

$$h_t = \text{tanh}(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh})$$

  • $x_t$ has shape (batch_size, input_size)
  • $W_{ih}$ has shape (hidden_size, input_size), as stored by PyTorch
  • $h_{t-1}$ has shape (batch_size, hidden_size)
  • $W_{hh}$ has shape (hidden_size, hidden_size)
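Below is a minimal sketch of one recurrence step using these shapes (random tensors only, not the parameters of an actual nn.RNN); because $x_t$ is stored batch-first as rows, the products are written as x_t @ W_ih.T:

```python
import torch

batch_size, input_size, hidden_size = 2, 100, 20

x_t    = torch.randn(batch_size, input_size)     # (batch_size, input_size)
h_prev = torch.zeros(batch_size, hidden_size)    # h_{t-1}: (batch_size, hidden_size)
W_ih   = torch.randn(hidden_size, input_size)    # (hidden_size, input_size)
W_hh   = torch.randn(hidden_size, hidden_size)   # (hidden_size, hidden_size)
b_ih   = torch.zeros(hidden_size)
b_hh   = torch.zeros(hidden_size)

# h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
h_t = torch.tanh(x_t @ W_ih.T + b_ih + h_prev @ W_hh.T + b_hh)
print(h_t.shape)  # torch.Size([2, 20]) -> (batch_size, hidden_size)
```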

Initial hidden state ($h_0$)

tensor containing the initial hidden state for each element in the batch.
$$h_0=(S, N, H_{out})$$
$h_0$ of shape (num_layers * num_directions, batch, hidden_size)
If the RNN is bidirectional, num_directions should be 2, else it should be 1.

  • $H_{out}=\text{hidden_size}$
  • $N=\text{batch_size}$
  • $S=\text{num_layers} * \text{num_directions}$
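As a small sketch (the sizes below are arbitrary, chosen just for illustration), the first dimension $S$ of $h_0$ doubles when the RNN is bidirectional:

```python
import torch

num_layers, batch, hidden_size, input_size, seq_len = 2, 4, 16, 8, 5
x = torch.randn(seq_len, batch, input_size)

# Unidirectional: num_directions = 1, so S = num_layers * 1 = 2
rnn = torch.nn.RNN(input_size, hidden_size, num_layers)
h0 = torch.zeros(num_layers * 1, batch, hidden_size)
_, hn = rnn(x, h0)

# Bidirectional: num_directions = 2, so S = num_layers * 2 = 4
rnn_bi = torch.nn.RNN(input_size, hidden_size, num_layers, bidirectional=True)
h0_bi = torch.zeros(num_layers * 2, batch, hidden_size)
_, hn_bi = rnn_bi(x, h0_bi)

print(hn.shape, hn_bi.shape)  # torch.Size([2, 4, 16]) torch.Size([4, 4, 16])
```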

Output

  • output1
    The hidden state at every time step of the last layer.

    output of shape (seq_len, batch, num_directions * hidden_size): tensor
    containing the output features (h_t) from the last layer of the RNN, for each t.
    Output1 shape: $(L, N, H_{all})$ where $H_{all}=\text{num_directions} * \text{hidden_size}$

    If the output of each direction is needed separately (for the unpacked case),
    the directions can be separated using

    output.view(seq_len, batch, num_directions, hidden_size)

  • output2: $h_n$
    The hidden state of every layer at the last time step (t = seq_len).

    h_n of shape (num_layers * num_directions, batch, hidden_size): tensor
    containing the hidden state for t = seq_len.
    Output2 shape: $(S, N, H_{out})$, a tensor containing the next hidden state
    for each element in the batch.

Like output, the **layers** (and directions) can be separated using

h_n.view(num_layers, num_directions, batch, hidden_size)
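Here is a minimal sketch of both view calls, assuming a 2-layer bidirectional RNN with arbitrary sizes; it also checks that the forward direction of the last layer agrees between output and h_n:

```python
import torch

seq_len, batch, input_size, hidden_size, num_layers = 5, 2, 10, 20, 2
num_directions = 2
rnn = torch.nn.RNN(input_size, hidden_size, num_layers, bidirectional=True)
output, h_n = rnn(torch.randn(seq_len, batch, input_size))

# Split output into (seq_len, batch, num_directions, hidden_size)
out_dirs = output.view(seq_len, batch, num_directions, hidden_size)
print(out_dirs.shape)  # torch.Size([5, 2, 2, 20])

# Split h_n into (num_layers, num_directions, batch, hidden_size)
h_n_split = h_n.view(num_layers, num_directions, batch, hidden_size)
print(h_n_split.shape)  # torch.Size([2, 2, 2, 20])

# Forward direction of the last layer: last time step of output matches h_n
print(torch.allclose(out_dirs[-1, :, 0], h_n_split[-1, 0]))  # True
```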

Traditional RNN example (PyTorch)

RNN

input_size = 100
hidden_size = 20
num_layers = 3

import torch

rnn = torch.nn.RNN(100, 20, 3)   # input_size=100, hidden_size=20, num_layers=3
input = torch.randn(5, 2, 100)   # (seq_len, batch, input_size)
h0 = torch.randn(3, 2, 20)       # (num_layers * num_directions, batch, hidden_size)
output, hn = rnn(input, h0)
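Continuing the example above (this check is an addition, not part of the original snippet), the last time step of output coincides with the last layer's entry in hn for this unidirectional RNN:

```python
# output[-1] is the last layer's hidden state at t = seq_len, i.e. hn[-1]
print(torch.allclose(output[-1], hn[-1]))  # True
print(output.shape, hn.shape)              # torch.Size([5, 2, 20]) torch.Size([3, 2, 20])
```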

RNN parameter list

In  [1]: dict(rnn._parameters).keys()
Out [1]: dict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0', 'weight_ih_l1', 'weight_hh_l1', 'bias_ih_l1', 'bias_hh_l1', 'weight_ih_l2', 'weight_hh_l2', 'bias_ih_l2', 'bias_hh_l2'])

RNN parameter shapes

  • Shape of $W_{ih}$ for layer 1

    In [1]: dict(rnn._parameters).get('weight_ih_l0').size()
    Out[1]: torch.Size([20, 100])
  • Shape of $W_{ih}$ for layer 2

    In [1]: dict(rnn._parameters).get('weight_ih_l1').size()
    Out[1]: torch.Size([20, 20])
  • Shape of $W_{ih}$ for layer 3

    In [1]: dict(rnn._parameters).get('weight_ih_l2').size()
    Out[1]: torch.Size([20, 20])
  • Shape of $b_{ih}$ for layers 1-3

    In [1]: dict(rnn._parameters).get('bias_ih_l0').size()
    Out[1]: torch.Size([20])
    In [1]: dict(rnn._parameters).get('bias_ih_l1').size()
    Out[1]: torch.Size([20])
    In [1]: dict(rnn._parameters).get('bias_ih_l2').size()
    Out[1]: torch.Size([20])
  • Shape of output1
    The hidden state of the last layer at every time step.

    In [1]: output.shape
    Out[1]: torch.Size([5, 2, 20])

    5: sequence length
    2: batch size
    20: hidden size

  • Shape of output2 ($hn$)
    The hidden state of every layer at the last time step.

    In [1]: hn.shape
    Out[1]: torch.Size([3, 2, 20])

    3: num_layers
    2: batch size
    20: hidden_size
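To tie the parameter shapes back to the recurrence formula, here is a sketch (variable names such as layer_in and h_last are ad-hoc) that re-computes the forward pass by hand from rnn's own parameters and compares it with output and hn:

```python
import torch

rnn = torch.nn.RNN(100, 20, 3)
x = torch.randn(5, 2, 100)            # (seq_len, batch, input_size)
h0 = torch.zeros(3, 2, 20)            # (num_layers, batch, hidden_size)
output, hn = rnn(x, h0)

params = dict(rnn.named_parameters())
layer_in = x                          # input to the current layer
h_last = []                           # final hidden state of each layer
for l in range(3):
    W_ih, W_hh = params[f'weight_ih_l{l}'], params[f'weight_hh_l{l}']
    b_ih, b_hh = params[f'bias_ih_l{l}'], params[f'bias_hh_l{l}']
    h = h0[l]
    outs = []
    for t in range(layer_in.size(0)):
        # h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
        h = torch.tanh(layer_in[t] @ W_ih.T + b_ih + h @ W_hh.T + b_hh)
        outs.append(h)
    layer_in = torch.stack(outs)      # becomes the input of the next layer
    h_last.append(h)

print(torch.allclose(layer_in, output, atol=1e-6))         # True
print(torch.allclose(torch.stack(h_last), hn, atol=1e-6))  # True
```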