A neural network is a network of neurons. A neuron is a mathematical function which transforms the input data elements into a single output value.
In the above diagram, the single neuron applies a weight to each input and adds a bias to produce the output value.
If we represent the neuron's bias as another input to the neuron, then the above equation can be simplified as
Since b1 is just another weight, it can be represented as
Now representing every weight as Theta
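The equations this passage refers to can be sketched as follows. This is a reconstruction under assumptions, not the original figures: the neuron is taken to have n inputs x_1 ... x_n with weights w_1 ... w_n and bias b, and the constant input x_0 = 1 and the symbols theta_i are illustrative notation.

```latex
\begin{aligned}
y &= w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b \\
  &= w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b \, x_0, \qquad x_0 = 1 \\
  &= \sum_{i=0}^{n} \theta_i x_i, \qquad \theta_0 = b,\quad \theta_i = w_i \ \text{for } i \ge 1
\end{aligned}
```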
The simplest version of this equation looks like just
Y = wX + b
w = weight = This value determines how much influence the input data has on the output
b = bias = A constant offset added to the weighted input, which lets the neuron shift its output independently of the input.
Both these values are called trainable parameters; they are the variables whose values we adjust during training to arrive at the final model.
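To make Y = wX + b concrete, here is a minimal sketch of a single neuron in NumPy. The input, weight and bias values are made up for illustration, and the names neuron, theta and x_aug are hypothetical helpers, not part of any library. It also shows that folding the bias in as the weight of an extra input fixed at 1 gives the same output.

```python
import numpy as np

def neuron(x, w, b):
    """Weighted sum of the inputs plus the bias: y = w.x + b."""
    return float(np.dot(w, x) + b)

# Illustrative values, not taken from the article
x = np.array([1.0, 2.0, 3.0])    # input data
w = np.array([0.5, -1.0, 2.0])   # one trainable weight per input
b = 0.25                         # trainable bias

y = neuron(x, w, b)

# Treating the bias as the weight of an extra input fixed at 1
# gives the same result as adding it separately
theta = np.concatenate(([b], w))
x_aug = np.concatenate(([1.0], x))
y_theta = float(np.dot(theta, x_aug))

print(y, y_theta)
```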
A single layer of a deep learning neural network contains multiple neurons. The layer's output has as many values as there are neurons in that layer.
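As a rough illustration of how a layer's output size follows from its neuron count, the sketch below computes a whole layer at once with a matrix product. The shapes (5 samples, 3 features, 4 neurons) and the helper name dense_layer are assumptions chosen for the example, and the activation function is left out for brevity.

```python
import numpy as np

def dense_layer(x, W, b):
    """Compute the weighted sum for every neuron in the layer at once."""
    return x @ W + b

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))   # 5 samples, 3 input features each
W = rng.normal(size=(3, 4))   # one column of weights per neuron (4 neurons)
b = np.zeros(4)               # one bias per neuron

out = dense_layer(x, W, b)
print(out.shape)  # (5, 4): one output value per neuron for each sample
```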
Assume this is a simple neural network in TensorFlow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam
# Setup the model
model = Sequential()
model.add(Dense(units=20, activation='relu'))
model.add(Dense(units=20, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer='adam',
              loss='binary_crossentropy')
This is how we feed the data into the model and optimize the weights
import pandas as pd
X = pd.DataFrame(data[0])
y = pd.DataFrame(data[1])
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
# Train the model
history = model.fit(x=X_train, y=y_train,
                    epochs=200,
                    batch_size=256,
                    shuffle=True,
                    validation_data=(X_test, y_test),
                    verbose=1)
When we feed the input data into a neural network, the weights and biases of each node start from some initial values, typically random. Then we apply the gradient descent algorithm to modify the values of these trainable variables so that the predictions improve.
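The idea can be sketched on the Y = wX + b model without any framework. This is a minimal example under assumptions: the data comes from a made-up line (w = 3, b = 1), the loss is mean squared error, and the learning rate and step count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.linspace(-1.0, 1.0, 50)
Y = 3.0 * X + 1.0                 # targets from a known line (w=3, b=1)

w, b = rng.normal(), rng.normal() # random starting values
learning_rate = 0.1

for step in range(500):
    error = (w * X + b) - Y              # prediction minus target
    grad_w = 2.0 * np.mean(error * X)    # d(MSE)/dw
    grad_b = 2.0 * np.mean(error)        # d(MSE)/db
    w -= learning_rate * grad_w          # move against the gradient
    b -= learning_rate * grad_b

print(w, b)  # should end up close to 3 and 1
```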
When we compile the model we set two parameters.
Optimizer: An optimizer is an algorithm that adjusts the attributes of the neural network, such as its weights (and, for adaptive optimizers, the effective learning rate), in order to reduce the overall loss and improve accuracy. It has a learning rate attribute, a configurable hyperparameter with a small positive value, often in the range between 0.0 and 1.0. The learning rate controls how quickly the model adapts to the problem: a value that is too small makes training slow and can leave it stuck in a poor local minimum, while a value that is too large can overshoot the minimum and fail to converge.
Loss function: A loss function measures how well a neural network model performs a certain task, which in most cases is regression or classification. Training minimizes the value of the loss function, using the gradients computed in the back-propagation step, in order to make the neural network better. Depending on the use case, the loss should either punish large errors heavily or be tolerant of them (for example when the data contains outliers), so different loss functions are used in different scenarios.
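The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent (not Adam) on f(w) = w**2; the function, the starting point and the three learning rates are assumptions picked only to show slow progress, convergence and divergence.

```python
def minimize(learning_rate, steps=50, start=5.0):
    """Run gradient descent on f(w) = w**2, whose gradient is 2*w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2.0 * w
    return w

small = minimize(0.01)  # too small: still far from the minimum at 0
good = minimize(0.1)    # reasonable: ends up very close to 0
large = minimize(1.1)   # too large: each step overshoots and w grows

print(small, good, large)
```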
Regression Loss Functions: used in regression neural networks; given an input value, the model predicts a corresponding output value (rather than pre-selected labels); Ex. Mean Squared Error, Mean Absolute Error
Classification Loss Functions: used in classification neural networks; given an input, the neural network produces a vector of probabilities of the input belonging to various preset categories; the category with the highest probability can then be selected; Ex. Binary Cross-Entropy, Categorical Cross-Entropy
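For illustration, three of the losses named above can be written in a few lines of NumPy. This is a simplified sketch, not the Keras implementation; the function names, the eps clipping value and the sample arrays are assumptions. The regression example shows how a single large error affects Mean Squared Error much more than Mean Absolute Error.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0) when a prediction is exactly 0 or 1
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1.0 - y_true) * np.log(1.0 - y_pred)))

# Regression: three perfect predictions and one large error
y_true = np.array([0.0, 0.0, 0.0, 0.0])
y_pred = np.array([0.0, 0.0, 0.0, 4.0])
mse = mean_squared_error(y_true, y_pred)   # squares the large error
mae = mean_absolute_error(y_true, y_pred)  # counts it linearly

# Classification: true labels and predicted probabilities
labels = np.array([1.0, 0.0])
probs = np.array([0.9, 0.1])
bce = binary_cross_entropy(labels, probs)

print(mse, mae, bce)
```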
Back-propagation: A gradient estimation method used to train neural network models. The gradient estimate is used by the optimization algorithm to compute the network parameter updates. Back-propagation computes the gradient of a loss function with respect to the weights of the network for a single input–output example, and does so efficiently, computing the gradient one layer at a time, iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule.
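The layer-by-layer chain rule can be sketched by hand on a tiny network. The example below is an illustration under assumptions: one tanh hidden layer, a sigmoid output, binary cross-entropy loss, no biases (omitted for brevity), and random made-up weights. It checks one back-propagated gradient against a numerical estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    h = np.tanh(W1 @ x)        # hidden layer activations
    y_hat = sigmoid(W2 @ h)    # output probability (a scalar)
    return h, y_hat

def loss_fn(y, y_hat):
    return -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

def backprop(x, y, W1, W2):
    h, y_hat = forward(x, W1, W2)
    dz2 = y_hat - y            # last layer: dL/dz for sigmoid + cross-entropy
    dW2 = dz2 * h              # gradient for the output weights
    dh = dz2 * W2              # reuse dz2 when stepping back one layer
    dz1 = dh * (1.0 - h ** 2)  # derivative of tanh
    dW1 = np.outer(dz1, x)     # gradient for the hidden weights
    return dW1, dW2

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), 1.0
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4)
dW1, dW2 = backprop(x, y, W1, W2)

# Numerical check of one entry using a central difference
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss_fn(y, forward(x, W1_plus, W2)[1])
           - loss_fn(y, forward(x, W1_minus, W2)[1])) / (2.0 * eps)
print(abs(numeric - dW1[0, 0]))  # should be very small
```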
During the learning process, the weights are updated using this formula.
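The formula referred to here is most likely the standard gradient descent update, sketched below as a reconstruction rather than the original figure. The symbols are assumptions: \eta is the learning rate and L is the loss. Optimizers such as Adam use a more elaborate variant of this rule.

```latex
w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial L}{\partial w_{\text{old}}}
```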
So we can write our own function to do the same thing, or perhaps apply a better optimization. tf.GradientTape lets us record the operations needed to compute the partial derivatives and pass them to the optimizer.
import tensorflow as tf
from tensorflow.keras.losses import BinaryCrossentropy

EPOCHS = 10

def custom_fit(model, epochs, train_dataset):
    # Create the loss and the optimizer once so the optimizer keeps its state across batches
    loss_fn = BinaryCrossentropy()
    optimizer = Adam(learning_rate=0.01)
    for epoch in range(epochs):
        print("Training starts for epoch number {}".format(epoch + 1))
        for (x_batch, y_batch) in train_dataset:
            with tf.GradientTape() as recorder:
                y_pred = model(x_batch, training=True)
                loss = loss_fn(y_batch, y_pred)
            # Partial derivatives of the loss with respect to each trainable weight
            partial_derivatives = recorder.gradient(loss, model.trainable_weights)
            # The optimizer uses these gradients to update each weight
            optimizer.apply_gradients(zip(partial_derivatives, model.trainable_weights))
    print("Training Complete!!!!")

Then we can just call this method to train the model

# Batch the training data so the loop receives (x_batch, y_batch) pairs
train_dataset = tf.data.Dataset.from_tensor_slices((X_train.values, y_train.values)).batch(256)
custom_fit(model, EPOCHS, train_dataset)
Cheers !!
– Amit Tomar