An activation function fires a particular neuron or node when its input crosses a particular threshold. These functions add the necessary non-linearity to ANNs. Each perceptron is, in reality (and traditionally), a logistic regression unit. When N such units are stacked together we get a basic single-layer perceptron, which serves as the basis of the artificial neural network.
See Google's ML glossary for a definition of activation functions.
There are different types of activation functions, and each has its benefits and faults. One consideration is the ease of evaluating the gradient: it should be cheap to compute while still supplying the non-linearity and abstraction the network needs to learn. Some activation functions are used primarily to model the output of the ANN. Traditionally, for binary classification we would use a sigmoid activation to predict a yes/no output; for multi-class classification the sigmoid is replaced by a softmax activation to estimate a 'probability' across the different classes.
Some of the traditionally used Activation functions:
- Sigmoid activation function
- tanh (hyperbolic tangent) activation function
- ReLU activation function
- Leaky ReLU activation function
- Softplus function
- Softmax function
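Softplus and softmax appear in the list above but are not plotted below. A minimal NumPy sketch of both (the function names and the max-subtraction trick for numerical stability are my own choices, not from the original):

```python
import numpy as np

def softplus(z):
    # smooth approximation of ReLU: log(1 + e^z)
    return np.log1p(np.exp(z))

def dsoftplus(z):
    # the derivative of softplus is the sigmoid function
    return 1 / (1 + np.exp(-z))

def softmax(z):
    # subtract the max before exponentiating for numerical stability;
    # the outputs are non-negative and sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs.sum())  # prints 1.0
```

Unlike the element-wise functions below, softmax couples all the inputs together, which is why it is typically reserved for the output layer.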
import numpy as np
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'retina'
import seaborn as sns
sns.set_palette("deep")
z = np.linspace(-10,10,100)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# derivative of the sigmoid function
def dsigmoid(a):
    return a * (1 - a)  # if a = sigmoid(z), then a' = a * (1 - a)
plt.plot(z, sigmoid(z), label = r'$sigmoid$')
plt.plot(z, dsigmoid(sigmoid(z)), label = r'$ \frac{\partial (sigmoid)}{\partial z}$')
plt.legend(fontsize = 12)
plt.xlabel('z')
plt.show()
import torch
x = torch.tensor(z, requires_grad=True)
print(x.requires_grad)
b = torch.sigmoid(x)
x
b.backward(torch.ones(x.shape))
x.grad
plt.plot(x.data.numpy(), b.data.numpy(), label = r'$sigmoid$')
plt.plot(x.data.numpy(), x.grad.data.numpy(), label = r'$ \frac{\partial (sigmoid)}{\partial z}$')
plt.legend(fontsize = 12)
np.unique(np.round((x.grad.data.numpy() - dsigmoid(sigmoid(z))),4))
def tanh(z):
    return np.tanh(z)

# derivative of tanh
def dtanh(a):
    return 1 - np.power(a, 2)  # if a = tanh(z), then a' = 1 - a**2
plt.plot(z, tanh(z),'b', label = 'tanh')
plt.plot(z, dtanh(tanh(z)),'r', label=r'$ \frac{dtanh}{dz}$')
plt.legend(fontsize = 12)
plt.show()
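By the same approach used for the sigmoid above, PyTorch's autograd can confirm the analytic tanh derivative (a small self-contained sketch, restating the `z` grid from earlier):

```python
import numpy as np
import torch

z = np.linspace(-10, 10, 100)
x = torch.tensor(z, requires_grad=True)
t = torch.tanh(x)
t.backward(torch.ones(x.shape))  # accumulates d(tanh)/dz into x.grad

# analytic derivative: 1 - tanh(z)^2
analytic = 1 - np.tanh(z) ** 2
print(np.allclose(x.grad.numpy(), analytic))  # prints True
```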
def ReLU(z):
    return np.maximum(0, z)

# derivative of ReLU
def dReLU(a):
    return 1 * (a > 0)
plt.plot(z, ReLU(z),'b', label ='ReLU')
plt.plot(z, dReLU(ReLU(z)),'r', label=r'$ \frac{dReLU}{dz}$')
plt.legend(fontsize = 12)
plt.xlabel('z')
plt.ylim(0,4)
plt.xlim(-4,4)
plt.show()
def LeakyReLU(z):
    return np.maximum(0.01 * z, z)

# derivative of Leaky ReLU
def dLeakyReLU(a):
    return np.where(a > 0, 1.0, 0.01)  # slope 1 for positive inputs, 0.01 otherwise
plt.plot(z, LeakyReLU(z),'b', label = 'LeakyReLU')
plt.plot(z, dLeakyReLU(LeakyReLU(z)),'r', label=r'$ \frac{dLeakyReLU}{dz}$')
plt.legend(fontsize = 12)
plt.xlabel('z')
plt.ylim(0,4)
plt.xlim(-4,4)
plt.show()
plt.plot(z, dsigmoid(sigmoid(z)),label = r'$ \frac{dsigmoid}{dz}$' )
plt.plot(z, dtanh(tanh(z)), label = r'$ \frac{dtanh}{dz}$')
plt.plot(z, dReLU(ReLU(z)), label=r'$ \frac{dReLU}{dz}$')
plt.plot(z, dLeakyReLU(LeakyReLU(z)), label=r'$ \frac{dLeakyReLU}{dz}$')
plt.legend(fontsize = 12)
plt.xlabel('z')
plt.title('Derivatives of activation functions')
plt.show()