A simple Attention Mechanism for LSTM-CNN Input model🎯

Attention

The Attention mechanism in Deep Learning is based on the idea of directing a model's focus, so that it pays greater attention to certain factors when processing the data.[1]



Table of Contents

  1. Introduction
  2. Mathematics
  3. Example
  4. Reference

Introduction

The Attention Mechanism enables a deep learning model to identify which parts of the input are more significant and which are less significant for predicting the output. To enable this, we define an extra set of functions that capture the importance of each region of the input vector/tensor, and then normalize those scores with a softmax function.
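
As a toy illustration of that normalization step (the scores below are made up, not from the gist), softmax turns arbitrary importance scores into non-negative weights that sum to 1:

```python
import tensorflow as tf

# Hypothetical raw importance scores for four regions of the input
scores = tf.constant([2.0, 0.5, 1.0, -1.0])

# Softmax normalizes them into attention weights that sum to 1
weights = tf.nn.softmax(scores)
print(weights.numpy())  # roughly [0.61, 0.14, 0.22, 0.03]
```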

Mathematics

The Attention Mechanism boils down to these three base equations; in most cases you just have to take care of the activation function and the dimensions.

X : Input Tensor
Y : Output Tensor
W : Weight Matrix
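
The gist shows the equations as an image; as a hedged sketch consistent with the AttentionModule code below, they amount to a learned score, a softmax normalization, and an element-wise re-weighting:

```latex
\begin{aligned}
e      &= X W                       && \text{raw attention scores from the learned weight matrix } W \\
\alpha &= \operatorname{softmax}(e) && \text{normalize the scores into attention weights} \\
Y      &= \alpha \odot X            && \text{re-weight the input element-wise}
\end{aligned}
```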

Example

Below is a naive implementation of the attention mechanism: an attention layer used as a precursor to the LSTM layer that determines which of the input features and which parts of the history are important. For the complete implementation, check out the GitHub repo here.
NOTE: Advanced Implementation coming soon...🚄


(Image source: https://arxiv.org/abs/1811.03760)

These matrix operations are applied in the AttentionModule layer; see the code below.

Reference

  1. https://arxiv.org/abs/1811.03760
  2. https://www.youtube.com/watch?v=yInilk6x-OY
import tensorflow as tf


class AttentionModule(tf.keras.layers.Layer):
    """
    Simple history-attention layer used in front of the LSTM.
    Note: as written, the matmul assumes the feature dimension of the
    input equals history_size.
    """

    def __init__(self, history_size):
        super(AttentionModule, self).__init__()
        self.history_size = history_size

    def build(self, input_shape):
        # One square weight matrix that scores the input regions
        self.kernel = self.add_weight(
            "kernel", shape=[self.history_size, self.history_size]
        )

    def call(self, input_tensor):
        # Score the input, normalize with softmax, then re-weight the input
        attended_weights = tf.nn.softmax(tf.matmul(input_tensor, self.kernel))
        attended_input = tf.multiply(attended_weights, input_tensor)
        return attended_input


class LSTMANN(tf.keras.Model):
    """
    Base attention LSTM model.
    """

    def __init__(self, input_shape):
        super(LSTMANN, self).__init__(name="")
        # 'input_shape' is a read-only Keras property, so store it under another name
        self.inp_shape = input_shape
        # Attention layer followed by the LSTM part of the model
        self.attention = AttentionModule(self.inp_shape[-2])
        self.lstm1 = tf.keras.layers.LSTM(
            self.inp_shape[-2] * self.inp_shape[-1], activation="relu"
        )
        self.dense1 = tf.keras.layers.Dense(1)

    def call(self, input_tensor):
        x = self.attention(input_tensor)
        x = self.lstm1(x)
        x = self.dense1(x)
        return x
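
A minimal usage sketch (the shapes are my assumption, not from the gist): because the AttentionModule matmul needs the feature dimension to equal history_size, both are set to 8 here.

```python
import numpy as np
import tensorflow as tf

# Dummy batch: 4 samples, a history of 8 steps, 8 features per step.
# Note: history_size and the feature count must match for this layer.
x = tf.constant(np.random.rand(4, 8, 8).astype("float32"))

model = LSTMANN(input_shape=(8, 8))
y = model(x)
print(y.shape)  # (4, 1)
```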