(7) What's the Deal with Self-Attention?

nlp

Published: 2021-01-16

Self-Attention is not limited to seq2seq models: the idea lets a model compute in place, on its own sequence, what it should attend to (the context vector).

SimpleRNN computes $\bold h_1$ from $\bold h_0$ and $\bold x_1$, whereas Self-Attention computes $\bold h_1$ from $\bold x_1$ and $\bold c_0$:

$$
\bold h_1 = \tanh\left(\bold A \cdot \begin{bmatrix} \bold x_1 \\ \bold c_0 \end{bmatrix} + \bold b\right)
$$
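As a rough illustration of this single update, here is a minimal sketch in plain Python. The function name `rnn_attention_step`, the toy values of `A`, `b`, and `x1`, and the zero initial context `c0` are all assumptions made for the example, not something the post specifies.

```python
import math

def rnn_attention_step(x, c_prev, A, b):
    """Compute h = tanh(A · [x; c_prev] + b) for one time step."""
    z = x + c_prev  # list concatenation stacks x on top of c_prev
    return [math.tanh(sum(a * v for a, v in zip(row, z)) + bias)
            for row, bias in zip(A, b)]

# Toy sizes: 2-dimensional input, 2-dimensional state (illustrative values).
A = [[0.5, -0.3, 0.1, 0.2],
     [0.0, 0.4, -0.2, 0.3]]
b = [0.0, 0.1]
x1 = [1.0, 2.0]
c0 = [0.0, 0.0]  # assumption: the initial context vector is zero

h1 = rnn_attention_step(x1, c0, A, b)
```

The only difference from a SimpleRNN step in this sketch is that the context vector, rather than the previous hidden state, is stacked with the input.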

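Of course, $\bold h_0$ is the zero vector at this point, so the only state that contributes to the context is $\bold h_1$, and $\bold c_1 = \bold h_1$.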

Next, compute $\bold h_2$ from $\bold c_1$ and $\bold x_2$.

Then compute a weight $\alpha_i$ from $\bold h_2$ and each $\bold h_i$; together these weights form the vector $\bold \alpha$.

Then take the weighted sum of the $\bold h_i$ vectors, using the weights in $\bold \alpha$, to get $\bold c_2$.
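Written out, the last two steps might look like the following. The post does not say how the weights are scored, so the softmax over dot products used here is an assumption (one common choice), as is leaving the all-zero $\bold h_0$ out of the sum:

$$
\alpha_i = \frac{\exp(\bold h_i^\top \bold h_2)}{\sum_{j=1}^{2} \exp(\bold h_j^\top \bold h_2)}, \qquad
\bold c_2 = \alpha_1 \bold h_1 + \alpha_2 \bold h_2
$$

Because the weights sum to 1, $\bold c_2$ is a weighted average of the hidden states seen so far.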

Repeat this process until every input has been processed.
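The whole procedure can be sketched as a short loop. This is only an illustration under stated assumptions: the helper names (`softmax`, `dot`, `self_attention_rnn`), the toy parameters, the zero initial context, and the dot-product-plus-softmax scoring are not taken from the post.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def self_attention_rnn(xs, A, b):
    """Run the SimpleRNN + self-attention recurrence over the inputs xs."""
    d_h = len(b)
    c = [0.0] * d_h  # assumption: c_0 is the zero vector
    hs, cs = [], []
    for x in xs:
        z = x + c  # stack [x_t; c_{t-1}]
        h = [math.tanh(dot(row, z) + bias) for row, bias in zip(A, b)]
        hs.append(h)
        # Weights alpha_i from h_t and every h_i so far
        # (dot-product scores + softmax: an assumed scoring function).
        alpha = softmax([dot(h_i, h) for h_i in hs])
        # New context: weighted sum of all hidden states so far.
        c = [sum(a * h_i[k] for a, h_i in zip(alpha, hs)) for k in range(d_h)]
        cs.append(c)
    return hs, cs

# Toy parameters and a 3-step input sequence (illustrative values).
A = [[0.5, -0.3, 0.1, 0.2],
     [0.0, 0.4, -0.2, 0.3]]
b = [0.0, 0.1]
xs = [[1.0, 2.0], [0.5, -1.0], [-0.3, 0.8]]

hs, cs = self_attention_rnn(xs, A, b)
```

At the first step there is only one hidden state to attend to, so `cs[0]` equals `hs[0]`, matching $\bold c_1 = \bold h_1$ above.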

Summary:

With self-attention, an RNN is less prone to forgetting.
For each new input, it attends to the relevant context.

Reposting rules

"(7) What's the Deal with Self-Attention?" by Harbor Zeng is licensed under the Creative Commons Attribution 4.0 International License.
