Types of Activation Functions in Deep Neural Networks

hari
6 min read · Jan 28, 2021

What Is Deep Learning?

Deep learning is an artificial intelligence function that imitates the workings of the human brain in processing data and creating patterns for use in decision making. Deep learning is a subset of machine learning in artificial intelligence whose networks are capable of learning unsupervised from data that is unstructured or unlabeled. It is also known as deep neural learning or a deep neural network.

relationship between AI, ML, and DL

Structure of Deep learning

A neural network comes with a layered design that contains an input layer, a hidden layer, and an output layer. It functions like the neurons of the human brain: it receives inputs, processes them, and generates output. There are several types of artificial neural networks, implemented based on the set of parameters and mathematical operations needed to determine the output. These neural networks are used in deep learning, which helps in image recognition, speech recognition, and other tasks.

Different types of layers

What is a neuron in the hidden layers?

Within an artificial neural network, a neuron is a mathematical function that models the functioning of a biological neuron. Typically, a neuron computes the weighted sum of its inputs, and this sum is passed through a nonlinear function, often called an activation function, such as the sigmoid.

Purpose of Activation Functions

In a neural network, numeric data points, called inputs, are fed into the neurons in the input layer. Each neuron has a weight, and multiplying the input by the weight gives the output of the neuron, which is transferred to the next layer. The activation function is a mathematical “gate” between the input feeding the current neuron and its output going to the next layer. It can be as simple as a step function that turns the neuron output on and off depending on a rule or threshold, or it can be a transformation that maps the input signals into the output signals the neural network needs in order to function. Increasingly, neural networks use non-linear activation functions, which help the network learn complex data, compute and learn almost any function representing a question, and provide accurate predictions.
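To make the “gate” idea concrete, here is a minimal NumPy sketch of a single neuron: the inputs, weights, bias, and the helper name neuron_output are made up for the example, and a simple step function stands in for any activation.

```python
import numpy as np

def neuron_output(inputs, weights, bias, activation):
    """A single artificial neuron: weighted sum of inputs plus bias,
    passed through an activation function (the "gate")."""
    z = np.dot(weights, inputs) + bias   # weighted sum of the inputs
    return activation(z)                 # non-linear transformation

# A step activation that simply turns the neuron output on or off
step = lambda z: 1.0 if z > 0 else 0.0

x = np.array([0.5, -1.2, 3.0])   # hypothetical inputs
w = np.array([0.4, 0.3, -0.2])   # hypothetical weights
b = 0.1                          # bias
print(neuron_output(x, w, b, step))  # -> 0.0, the weighted sum is below the threshold
```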

Why do we have so many activation functions?

We use many different activation functions when building neural networks. The activation function decides whether a neuron should be activated or not by calculating the weighted sum of its inputs and adding a bias to it. We use different activation functions for different layers and problems. The main aim is to reduce the loss function, which tells us how far the predicted value is from the correct one. To achieve this we have various activation functions; let's have a look at them.

Sigmoid Activation function

function: A = 1 / (1 + e^(−x)), where x = Σᵢ₌₁ⁿ wᵢ·xᵢ + b

The sigmoid function is usually used in the output layer for binary classification, where the result is either 0 or 1. Since the value of the sigmoid lies between 0 and 1 only, the result can easily be predicted as 1 if the value is greater than 0.5 and 0 otherwise. On a biased dataset, however, this function may not work well and can lead to the vanishing gradient problem. The function is differentiable, which means we can find the slope of the sigmoid curve at any point.

Derivative of sigmoid function:

The derivative of the sigmoid function lies between 0 and 0.25. This affects the training process called backpropagation: as the number of hidden layers increases, the gradients shrink and the weight updates from backpropagation barely change the weights, which means the updates are of little use. This is called the vanishing gradient problem.
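The following small NumPy sketch (values chosen only for illustration) shows the sigmoid and its derivative, and that the derivative never exceeds 0.25.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the sigmoid; its maximum value is 0.25 (at x = 0)."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid(x))             # approx [0.0067, 0.5, 0.9933]
print(sigmoid_derivative(x))  # approx [0.0066, 0.25, 0.0066] -> never above 0.25
```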

ReLU (Rectified Linear Unit) Activation function

The Rectified Linear Unit is the most commonly used activation function in deep learning models. The function returns 0 if it receives any negative input, but for any positive value Z it returns that value back, so it can be written as f(Z) = max(0, Z). ReLU is computationally efficient and allows the gradient to converge quickly. It has a simple derivative, which allows backpropagation.

The main drawback of using ReLU is that it can lead to the dying ReLU problem: when the input is zero or negative, the function returns zero, so the gradient of the function becomes zero, and the network cannot perform backpropagation for that neuron and cannot learn.

  • One of its limitations is that it should only be used within the hidden layers of a neural network model.
  • Some gradients can be fragile during training and can die. This can cause a weight update that makes the neuron never activate on any data point again. Simply put, ReLU can result in dead neurons.
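A minimal NumPy sketch of ReLU and its derivative (the sample values are arbitrary); note how the gradient is exactly zero for the negative inputs, which is where dead neurons come from.

```python
import numpy as np

def relu(z):
    """ReLU: returns 0 for negative inputs and z itself for positive inputs."""
    return np.maximum(0, z)

def relu_derivative(z):
    """Gradient is 1 for positive inputs and 0 otherwise; neurons stuck in
    the negative region receive no gradient (the "dying ReLU" problem)."""
    return (z > 0).astype(float)

z = np.array([-2.0, 0.0, 3.5])
print(relu(z))             # [0.  0.  3.5]
print(relu_derivative(z))  # [0. 0. 1.]
```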

Leaky ReLU:

It is similar to ReLU, but for negative inputs it returns a small negative value instead of zero. By using Leaky ReLU we can avoid the dying ReLU problem. The derivative of the Leaky ReLU is 1 in the positive part and a small fraction in the negative part.

LeakyReLU and its derivative

However, it can't be used for complex classification tasks, and it lags behind sigmoid and tanh for some use cases.
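A short NumPy sketch of Leaky ReLU (the slope alpha = 0.01 and the sample inputs are just illustrative choices), showing that the gradient stays non-zero even for negative inputs.

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: like ReLU, but negative inputs are scaled by a small
    slope alpha instead of being set to zero."""
    return np.where(z > 0, z, alpha * z)

def leaky_relu_derivative(z, alpha=0.01):
    """Derivative is 1 in the positive part and alpha in the negative part,
    so the gradient never dies completely."""
    return np.where(z > 0, 1.0, alpha)

z = np.array([-4.0, 2.0])
print(leaky_relu(z))             # [-0.04  2.  ]
print(leaky_relu_derivative(z))  # [0.01 1.  ]
```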

Exponential Linear Unit (ELU)

Exponential Linear Unit, widely known as ELU, is a function that tends to converge the cost to zero faster and produce more accurate results. It has a mean activation close to 0, it is exponential for negative inputs, and it has an extra alpha constant which should be a positive number.

ELU(x) = x for x > 0, and α(eˣ − 1) for x ≤ 0, where α is a parameter.

It helps to push the mean activation of neurons closer to zero, which is beneficial for learning, and it helps to learn representations that are more robust to noise.

  • ELU smoothly saturates to −α for large negative inputs, whereas ReLU sharply cuts off at zero.
  • ELU is a strong alternative to ReLU.
  • Unlike ReLU, ELU can produce negative outputs.
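A minimal NumPy sketch of ELU (with the common default α = 1.0, chosen here only for illustration), showing the negative outputs smoothly approaching −α.

```python
import numpy as np

def elu(z, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (e^z - 1) for negative
    inputs, smoothly saturating at -alpha."""
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

z = np.array([-10.0, -1.0, 0.0, 2.0])
print(elu(z))  # approx [-0.99995, -0.632, 0., 2.] -> negative outputs approach -alpha
```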

Softmax activation function:

The softmax function calculates the probability distribution of an event over 'n' different events. In simple terms, this function calculates the probability of each target class over all possible target classes and returns the target class with the highest probability. The calculated probabilities help to determine the target class for the given inputs.
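Here is a small NumPy sketch of softmax (the class scores are hypothetical): the outputs form a probability distribution, and the class with the largest probability is taken as the prediction.

```python
import numpy as np

def softmax(z):
    """Softmax: turns a vector of scores into a probability distribution
    over the target classes (values in (0, 1) that sum to 1)."""
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # hypothetical class scores
probs = softmax(scores)
print(probs)            # approx [0.659, 0.242, 0.099]
print(np.argmax(probs)) # 0 -> class with the highest probability
```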

Tanh activation function:

The tanh activation function is also known as the hyperbolic tangent function. Tanh squashes a real-valued number to the range [-1, 1]. It is non-linear, but unlike sigmoid, its output is zero-centered. Therefore, in practice, the tanh non-linearity is often preferred to the sigmoid non-linearity.

The gradient is stronger for tanh than for sigmoid, but tanh still cannot solve the vanishing gradient problem, and saturated units can effectively stop learning during training.
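A brief NumPy sketch of tanh and its derivative (sample inputs chosen only for illustration): the derivative peaks at 1, which is stronger than sigmoid's 0.25, but it still vanishes for large inputs.

```python
import numpy as np

def tanh(z):
    """Tanh: squashes a real value into the range (-1, 1), zero-centered."""
    return np.tanh(z)

def tanh_derivative(z):
    """Derivative is 1 - tanh(z)^2; it peaks at 1 (stronger than sigmoid's
    0.25) but still vanishes for large |z|."""
    return 1.0 - np.tanh(z) ** 2

z = np.array([-3.0, 0.0, 3.0])
print(tanh(z))             # approx [-0.995, 0., 0.995]
print(tanh_derivative(z))  # approx [0.0099, 1., 0.0099]
```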

Conclusion:

Today, activation functions like ReLU and ELU have gained the most attention since they help to reduce the vanishing gradient problem, but in real-world scenarios we face various problems, so we choose the activation function based on the problem.

References:

https://www.upgrad.com/blog/types-of-activation-function-in-neural-networks/

Wikipedia

https://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html

Image sources: https://images.app.goo.gl/8dQxUDrJ9w6yxNVT9, https://images.app.goo.gl/kT2C5kT78EhCAM75A
