orionpax00/Description.md

## Description.md

      
    Raw
  

              Description.md
            
          
    Attention

The Attention mechanism in Deep Learning is based off the concept of directing model's focus,
and it pays greater attention to certain factors when processing the data.[1]


Table of Contents


Introduction
Mathematics
Example
Reference

Introduction

Attention Mechanism enables the deep learning model to identify which part of the input has more or which part has less significance for predicting the output. To enable this we define a extra set of functions that captures the importance of the region in Input vector/tensor, then we normalize that state using softmax function.
Mathematics

Attention Mechanism has these three base equation in most of the case you just have to take care of the activation funtion and dimentions.

   X : Input Tensor

   Y : Output Tensor

   W : Weight Matrix
   

Example

Below is the Naive implementation of Attention mechanism, I created a attention layer as a precursor to the LSTM layer that defines which of the input features and history is important. for complete implementation checkout github repo here

NOTE: Advanced Implementation coming soon...🚄

  
  source) https://arxiv.org/abs/1811.03760

These matrix operation are applied in the AttentionModule layer see code below

  
Reference


https://arxiv.org/abs/1811.03760
https://www.youtube.com/watch?v=yInilk6x-OY


## LSTM_with_Attention.py
import tensorflow as tf

class AttentionModule(tf.keras.layers.Layer):
    """
    This layer is the implementation of Simple History attention LSTM
    """

    def __init__(self, history_size):
        super(AttentionModule, self).__init__()
        self.history_size = history_size

    def build(self, input_shape):
        self.kernel = self.add_weight("kernel", shape=[self.history_size, self.history_size])

    def call(self, input_tensor):
        attended_weights = tf.nn.softmax(tf.matmul(input_tensor, self.kernel))
        attended_input = tf.multipy(attended_weights, input_tensor)

        return attended_input

class LSTMANN(tf.keras.Model):
    """
        base attention lstm model
    """

    def __init__(self, input_shape):
        super(lstmCNN, self).__init__(name="")
        self.input_shape = input_shape

        ##lstm model
        self.lstm1 = tf.keras.layers.LSTM(self.input_shape[-2]*self.input_shape[-1],activation='relu')
        self.dense1 = tf.keras.layers.Dense(1)

    def call(self,input_tensor):

        x = AttentionModule(self.input_shape[-2])(input_tensor)
        x = self.lstm1(x)
        x = self.dense1(x)

        return x
	import tensorflow as tf

	class AttentionModule(tf.keras.layers.Layer):
	"""
	This layer is the implementation of Simple History attention LSTM
	"""

	def __init__(self, history_size):
	super(AttentionModule, self).__init__()
	self.history_size = history_size

	def build(self, input_shape):
	self.kernel = self.add_weight("kernel", shape=[self.history_size, self.history_size])

	def call(self, input_tensor):
	attended_weights = tf.nn.softmax(tf.matmul(input_tensor, self.kernel))
	attended_input = tf.multipy(attended_weights, input_tensor)

	return attended_input

	class LSTMANN(tf.keras.Model):
	"""
	base attention lstm model
	"""

	def __init__(self, input_shape):
	super(lstmCNN, self).__init__(name="")
	self.input_shape = input_shape

	##lstm model
	self.lstm1 = tf.keras.layers.LSTM(self.input_shape[-2]*self.input_shape[-1],activation='relu')
	self.dense1 = tf.keras.layers.Dense(1)

	def call(self,input_tensor):

	x = AttentionModule(self.input_shape[-2])(input_tensor)
	x = self.lstm1(x)
	x = self.dense1(x)

	return x
No results found