The SoftmaxWithLoss layer in an RNN

Points to note

(figure: https://s3-us-west-2.amazonaws.com/secure.notion-static.com/8a36c85d-a7a9-4eea-b798-889469a75258/Untitled.png)
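As a rough sketch of what this layer does, assuming the usual setup where the scores for all T time steps are pushed through softmax and the per-word cross-entropy losses are averaged (the function and variable names below are illustrative, not the book's class):

```python
import numpy as np

def time_softmax_with_loss(scores, targets):
    """Average softmax cross-entropy over a batch of sequences.

    scores  : (N, T, V) unnormalized scores for N sequences, T time steps, V words
    targets : (N, T)    correct word IDs
    """
    N, T, V = scores.shape

    # softmax over the vocabulary axis, numerically stabilized
    shifted = scores - scores.max(axis=2, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=2, keepdims=True)

    # probability each position assigned to its correct word
    n_idx, t_idx = np.meshgrid(np.arange(N), np.arange(T), indexing='ij')
    correct = probs[n_idx, t_idx, targets]          # shape (N, T)

    # cross-entropy averaged over every predicted word
    return float(-np.log(correct + 1e-7).mean())
```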

Index of RNN papers

[1] Recurrent Neural Network Regularization (<https://arxiv.org/abs/1409.2329>)
    # Applies dropout only in the vertical (depth) direction, so each RNN layer can still carry its hidden state across time steps undisturbed (see the sketch after this list)
[2] Using the Output Embedding to Improve Language Models (<https://arxiv.org/abs/1608.05859>)
[3] Tying Word Vectors and Word Classifiers (<https://arxiv.org/pdf/1611.01462.pdf>)
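The note on [1] refers to applying dropout only on the connections between stacked layers, never on the recurrent (time-direction) connections. A minimal sketch of that idea, with hypothetical shapes and names; this is not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, ratio=0.5):
    # inverted dropout: scale at training time so no rescaling is needed at test time
    mask = (rng.random(x.shape) > ratio) / (1.0 - ratio)
    return x * mask

def stacked_rnn_step(x_t, h_prev_list, step_fns, ratio=0.5):
    """One time step of a stacked RNN with dropout only in the depth direction.

    x_t         : input at time t
    h_prev_list : previous hidden state of each layer (never touched by dropout)
    step_fns    : one function per layer computing h_new = f(input, h_prev)
    """
    h_new_list = []
    inp = x_t
    for f, h_prev in zip(step_fns, h_prev_list):
        h_new = f(inp, h_prev)          # recurrent (time) connection: no dropout
        h_new_list.append(h_new)
        inp = dropout(h_new, ratio)     # vertical (layer-to-layer) connection: dropout
    return h_new_list

# tiny usage example: two stacked layers with a plain tanh step
D, H = 4, 3
Wx1, Wh1 = rng.standard_normal((D, H)), rng.standard_normal((H, H))
Wx2, Wh2 = rng.standard_normal((H, H)), rng.standard_normal((H, H))
step1 = lambda x, h: np.tanh(x @ Wx1 + h @ Wh1)
step2 = lambda x, h: np.tanh(x @ Wx2 + h @ Wh2)
h = stacked_rnn_step(rng.standard_normal(D), [np.zeros(H), np.zeros(H)], [step1, step2])
```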

(figure: RNN structure)

How perplexity is computed (exponentiate the cross-entropy loss)

(figure: https://s3-us-west-2.amazonaws.com/secure.notion-static.com/b15ec0e8-871d-426e-9e5a-063c849d43fd/Untitled.png)

(figure: https://s3-us-west-2.amazonaws.com/secure.notion-static.com/c6bdeebd-daf0-4d60-b7ce-794fe34ee65d/Untitled.png)
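In code this is just the exponential of the (averaged) cross-entropy loss; a minimal sketch, with illustrative names:

```python
import numpy as np

def perplexity(avg_cross_entropy_loss):
    # Perplexity is exp(L), where L is the cross-entropy loss
    # averaged over all predicted words.
    return np.exp(avg_cross_entropy_loss)

print(perplexity(0.0))          # 1.0  -> the model is certain: effectively 1 candidate word
print(perplexity(np.log(100)))  # ~100 -> as uncertain as picking among 100 words
```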

Points to note

(figure: https://s3-us-west-2.amazonaws.com/secure.notion-static.com/f41df077-bac0-4c56-91b8-4b23cf0178ba/Untitled.png)

Because truncation is used, learning is still constrained by the length of text the backward pass can cover, so the model can only learn patterns over a limited span.
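A rough sketch of why that happens with Truncated BPTT: the hidden state is carried forward across blocks, but gradients only flow inside each block of length trunc_len, so dependencies longer than a block cannot be learned. The `model` object and its methods here are assumptions for illustration, not a specific library API:

```python
def split_into_bptt_blocks(xs, ts, trunc_len):
    """Split one long (input, target) sequence into Truncated-BPTT blocks.

    The forward pass carries the hidden state from block to block, but the
    backward pass runs only inside each block, so learning signals never
    reach further back than trunc_len steps.
    """
    for start in range(0, len(xs), trunc_len):
        yield xs[start:start + trunc_len], ts[start:start + trunc_len]

# hypothetical training loop: model.reset_state/forward/backward/update are
# assumed, with a hidden state that persists inside `model` between calls
def train_one_epoch(model, xs, ts, trunc_len=35):
    model.reset_state()                          # start of the corpus: clear hidden state
    for x_block, t_block in split_into_bptt_blocks(xs, ts, trunc_len):
        loss = model.forward(x_block, t_block)   # hidden state flows in from the previous block
        model.backward()                         # gradient flow stops at the block boundary
        model.update()
```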