Legend

  • Yellow rectangles represent neural network layers
  • Pink circles represent pointwise operations
  • Arrow lines represent a vector being passed from one node to another
  • Merging lines represent the concatenation of two vectors
  • Forking lines represent a vector being copied and sent to different places

Traditional RNN network structure

A traditional RNN has only one neural network layer (a single yellow tanh rectangle).
{:.info}

Traditional RNN computation

Input

tensor containing the features of the input sequence

If batch_first is set, then $X$ is:
input of shape (batch, seq_len, input_size)
$$X=[N,L,H_{in}]$$
Otherwise:
input of shape (seq_len, batch, input_size)
$$X=[L,N,H_{in}]$$

  • $L$: sequence length
  • $N$: batch size
  • $H_{in}$: input size

The hidden state at each time step is computed as (a single step is worked out in the sketch after this list):

$$h_t = \text{tanh}(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh})$$

  • $x_t$ has shape (batch_size, input_size)
  • $W_{ih}$ has shape (hidden_size, input_size), as stored by PyTorch
  • $h_{t-1}$ has shape (batch_size, hidden_size)
  • $W_{hh}$ has shape (hidden_size, hidden_size)
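Below is a minimal sketch of one recurrence step using these shapes (random tensors only, not the parameters of an actual nn.RNN); because $x_t$ is stored batch-first as rows, the products are written as x_t @ W_ih.T:

```python
import torch

batch_size, input_size, hidden_size = 2, 100, 20

x_t    = torch.randn(batch_size, input_size)     # (batch_size, input_size)
h_prev = torch.zeros(batch_size, hidden_size)    # h_{t-1}: (batch_size, hidden_size)
W_ih   = torch.randn(hidden_size, input_size)    # (hidden_size, input_size)
W_hh   = torch.randn(hidden_size, hidden_size)   # (hidden_size, hidden_size)
b_ih   = torch.zeros(hidden_size)
b_hh   = torch.zeros(hidden_size)

# h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
h_t = torch.tanh(x_t @ W_ih.T + b_ih + h_prev @ W_hh.T + b_hh)
print(h_t.shape)  # torch.Size([2, 20]) -> (batch_size, hidden_size)
```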

Initial hidden state ($h_0$)

tensor containing the initial hidden state for each element in the batch.
$$h_0=(S, N, H_{out})$$
$h_0$ of shape (num_layers * num_directions, batch, hidden_size)
If the RNN is bidirectional, num_directions should be 2, else it should be 1.

  • $H_{out}=\text{hidden_size}$
  • $N=\text{batch_size}$
  • $S=\text{num_layers} * \text{num_directions}$
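As a small sketch (the sizes below are arbitrary, chosen just for illustration), the first dimension $S$ of $h_0$ doubles when the RNN is bidirectional:

```python
import torch

num_layers, batch, hidden_size, input_size, seq_len = 2, 4, 16, 8, 5
x = torch.randn(seq_len, batch, input_size)

# Unidirectional: num_directions = 1, so S = num_layers * 1 = 2
rnn = torch.nn.RNN(input_size, hidden_size, num_layers)
h0 = torch.zeros(num_layers * 1, batch, hidden_size)
_, hn = rnn(x, h0)

# Bidirectional: num_directions = 2, so S = num_layers * 2 = 4
rnn_bi = torch.nn.RNN(input_size, hidden_size, num_layers, bidirectional=True)
h0_bi = torch.zeros(num_layers * 2, batch, hidden_size)
_, hn_bi = rnn_bi(x, h0_bi)

print(hn.shape, hn_bi.shape)  # torch.Size([2, 4, 16]) torch.Size([4, 4, 16])
```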

Output

  • output1
    The hidden state at every time step of the last layer.

    output of shape (seq_len, batch, num_directions * hidden_size): tensor
    containing the output features (h_t) from the last layer of the RNN, for each t.
    Output1 shape: $(L, N, H_{all})$ where $H_{all}=\text{num_directions} * \text{hidden_size}$

    If the output of each direction is needed separately (for the unpacked case),
    the directions can be separated using

    output.view(seq_len, batch, num_directions, hidden_size)

  • output2: $h_n$
    The hidden state of every layer at the last time step (t = seq_len).

    h_n of shape (num_layers * num_directions, batch, hidden_size): tensor
    containing the hidden state for t = seq_len.
    Output2 shape: $(S, N, H_{out})$, a tensor containing the next hidden state
    for each element in the batch.

Like output, the **layers** (and directions) can be separated using

h_n.view(num_layers, num_directions, batch, hidden_size)
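Here is a minimal sketch of both view calls, assuming a 2-layer bidirectional RNN with arbitrary sizes; it also checks that the forward direction of the last layer agrees between output and h_n:

```python
import torch

seq_len, batch, input_size, hidden_size, num_layers = 5, 2, 10, 20, 2
num_directions = 2
rnn = torch.nn.RNN(input_size, hidden_size, num_layers, bidirectional=True)
output, h_n = rnn(torch.randn(seq_len, batch, input_size))

# Split output into (seq_len, batch, num_directions, hidden_size)
out_dirs = output.view(seq_len, batch, num_directions, hidden_size)
print(out_dirs.shape)  # torch.Size([5, 2, 2, 20])

# Split h_n into (num_layers, num_directions, batch, hidden_size)
h_n_split = h_n.view(num_layers, num_directions, batch, hidden_size)
print(h_n_split.shape)  # torch.Size([2, 2, 2, 20])

# Forward direction of the last layer: last time step of output matches h_n
print(torch.allclose(out_dirs[-1, :, 0], h_n_split[-1, 0]))  # True
```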

Traditional RNN example (PyTorch)

RNN

input_size = 100
hidden_size = 20
num_layers = 3

import torch

rnn = torch.nn.RNN(100, 20, 3)   # input_size=100, hidden_size=20, num_layers=3
input = torch.randn(5, 2, 100)   # (seq_len, batch, input_size)
h0 = torch.randn(3, 2, 20)       # (num_layers * num_directions, batch, hidden_size)
output, hn = rnn(input, h0)
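Continuing the example above (this check is an addition, not part of the original snippet), the last time step of output coincides with the last layer's entry in hn for this unidirectional RNN:

```python
# output[-1] is the last layer's hidden state at t = seq_len, i.e. hn[-1]
print(torch.allclose(output[-1], hn[-1]))  # True
print(output.shape, hn.shape)              # torch.Size([5, 2, 20]) torch.Size([3, 2, 20])
```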

RNN parameter list

In  [1]: dict(rnn._parameters).keys()
Out [1]: dict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0', 'weight_ih_l1', 'weight_hh_l1', 'bias_ih_l1', 'bias_hh_l1', 'weight_ih_l2', 'weight_hh_l2', 'bias_ih_l2', 'bias_hh_l2'])

RNN parameter shapes

  • Shape of $W_{ih}$ for layer 1

    In [1]: dict(rnn._parameters).get('weight_ih_l0').size()
    Out[1]: torch.Size([20, 100])
  • Shape of $W_{ih}$ for layer 2

    In [1]: dict(rnn._parameters).get('weight_ih_l1').size()
    Out[1]: torch.Size([20, 20])
  • Shape of $W_{ih}$ for layer 3

    In [1]: dict(rnn._parameters).get('weight_ih_l2').size()
    Out[1]: torch.Size([20, 20])
  • Shape of $b_{ih}$ for layers 1-3

    In [1]: dict(rnn._parameters).get('bias_ih_l0').size()
    Out[1]: torch.Size([20])
    In [1]: dict(rnn._parameters).get('bias_ih_l1').size()
    Out[1]: torch.Size([20])
    In [1]: dict(rnn._parameters).get('bias_ih_l2').size()
    Out[1]: torch.Size([20])
  • Shape of output1
    The hidden state of the last layer at every time step.

    In [1]: output.shape
    Out[1]: torch.Size([5, 2, 20])

    5: sequence length
    2: batch size
    20: hidden size

  • Shape of output2 ($hn$)
    The hidden state of every layer at the last time step.

    In [1]: hn.shape
    Out[1]: torch.Size([3, 2, 20])

    3: num_layers
    2: batch size
    20: hidden_size
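To tie the parameter shapes back to the recurrence formula, here is a sketch (variable names such as layer_in and h_last are ad-hoc) that re-computes the forward pass by hand from rnn's own parameters and compares it with output and hn:

```python
import torch

rnn = torch.nn.RNN(100, 20, 3)
x = torch.randn(5, 2, 100)            # (seq_len, batch, input_size)
h0 = torch.zeros(3, 2, 20)            # (num_layers, batch, hidden_size)
output, hn = rnn(x, h0)

params = dict(rnn.named_parameters())
layer_in = x                          # input to the current layer
h_last = []                           # final hidden state of each layer
for l in range(3):
    W_ih, W_hh = params[f'weight_ih_l{l}'], params[f'weight_hh_l{l}']
    b_ih, b_hh = params[f'bias_ih_l{l}'], params[f'bias_hh_l{l}']
    h = h0[l]
    outs = []
    for t in range(layer_in.size(0)):
        # h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
        h = torch.tanh(layer_in[t] @ W_ih.T + b_ih + h @ W_hh.T + b_hh)
        outs.append(h)
    layer_in = torch.stack(outs)      # becomes the input of the next layer
    h_last.append(h)

print(torch.allclose(layer_in, output, atol=1e-6))         # True
print(torch.allclose(torch.stack(h_last), hn, atol=1e-6))  # True
```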