In neural networks, there are many hyperparameters you can tune to get the best results from the network. Some of them are (several of these appear in the sketch after this list):
- Number of hidden layers
- Number of neurons in each hidden layer
- Activation functions in hidden layers
- Optimizers
- Random initialization of weights and biases
- Batch size
- Learning rate
- Early stopping
- L1 and L2 Regularization
- Dropout
- Momentum
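As a minimal, hedged sketch (the layer sizes, rates, and dummy data below are assumptions chosen only for illustration), here is where several of these hyperparameters show up in a typical Keras workflow:

# A minimal sketch (hypothetical values) showing where several hyperparameters appear
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping

sketch_model = Sequential([
    Dense(64, activation='relu', input_dim=10,
          kernel_regularizer=l2(0.01)),        # number of neurons, activation, L2 regularization
    Dropout(0.2),                              # dropout rate
    Dense(1, activation='sigmoid')
])

# Learning rate and momentum are set on the optimizer
sketch_model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9),
                     loss='binary_crossentropy', metrics=['accuracy'])

# Batch size and early stopping are set when fitting (dummy data for illustration)
X = np.random.rand(100, 10)
y = np.random.randint(0, 2, 100)
sketch_model.fit(X, y, epochs=50, batch_size=32, validation_split=0.2,
                 callbacks=[EarlyStopping(patience=5)], verbose=0)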
Here’s an example of building a neural network model with two hidden layers using the Sequential API in TensorFlow/Keras:
# Import TensorFlow/Keras and clear any previous session state
import tensorflow as tf
from tensorflow.keras import backend
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

backend.clear_session()
# Define the number of neurons in each layer
input_dim = 10 # Example input dimension
hidden1_units = 64
hidden2_units = 32
output_units = 1 # Example output dimension
# Create a Sequential model
model = Sequential()
# Add the first hidden layer with ReLU activation
model.add(Dense(hidden1_units, input_dim=input_dim, activation='relu'))
# Add the second hidden layer with ReLU activation
model.add(Dense(hidden2_units, activation='relu'))
# Add the output layer with linear activation for regression tasks
model.add(Dense(output_units, activation='linear'))
Once we are done with the model architecture, we need to compile the model. At this step we provide the loss function we want to minimize, the optimization algorithm, and the evaluation metric(s) we are interested in for evaluating the model.
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
# Print the model summary
model.summary()
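If you need finer control over the optimizer (for example, the learning rate), you can pass an optimizer object instead of the string shortcut. A minimal sketch, assuming the same model and loss (the learning rate and metric here are illustrative choices):

from tensorflow.keras.optimizers import Adam

# Equivalent compile call with an explicit learning rate and an extra metric
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='mean_squared_error',
              metrics=['mae'])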
For a binary classification task, we minimize the binary_crossentropy loss, and we can choose one optimizer from the following (a sketch of a matching model, model_1, follows the list):
- SGD
- RMSprop
- Adam
- Adadelta
- Adagrad
- Adamax
- Nadam
- Ftrl
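The compile call below refers to a model named model_1, which is not defined above. As a hedged sketch, such a binary classification model might look like this (the layer sizes are assumptions; the key point is the single sigmoid output unit):

# Hypothetical binary classification model (architecture assumed for illustration)
model_1 = Sequential()
model_1.add(Dense(64, input_dim=input_dim, activation='relu'))
model_1.add(Dense(32, activation='relu'))
model_1.add(Dense(1, activation='sigmoid'))  # sigmoid output for binary classification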
# Compile the model
model_1.compile(loss = 'binary_crossentropy', optimizer='adamax', metrics=['accuracy'])
# Print the model summary
model_1.summary()
To calculate the number of parameters (weights and biases) in a neural network, you need to consider the architecture of the network and the connections between layers. Here’s how you can calculate the number of parameters for a simple feedforward neural network like the one in the example below:
# Define the number of neurons in each layer
input_dim = 10 # Example input dimension
hidden1_units = 64
hidden2_units = 32
output_units = 1 # Example output dimension
Let’s denote:
- \(n_i\) as the number of neurons in the \(i^{\text{th}}\) layer.
- \(n_{i+1}\) as the number of neurons in the \((i+1)^{\text{th}}\) layer.
For each connection between neurons in adjacent layers:
- Weights: Each connection has a weight associated with it. So, the total number of weights between two layers is equal to the product of the number of neurons in the two layers. For a fully connected layer, this is \(n_i \times n_{i+1}\).
- Biases: Each neuron in a layer (except the input layer) typically has a bias term associated with it. So, the number of biases in a layer is equal to the number of neurons in that layer.
Therefore, to calculate the total number of parameters in a neural network, you sum up the number of weights and biases across all connections and layers.
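Putting this together as a formula, for a network whose layers have sizes \(n_0, n_1, \dots, n_L\) (where \(n_0\) is the input dimension):

\[
\text{total parameters} = \sum_{i=0}^{L-1} \left( n_i \times n_{i+1} + n_{i+1} \right)
\]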
For example, let’s calculate the number of parameters for the neural network in this example:
- Input layer to first hidden layer:
  - Weights: \(10 \times 64\)
  - Biases: \(64\)
- First hidden layer to second hidden layer:
  - Weights: \(64 \times 32\)
  - Biases: \(32\)
- Second hidden layer to output layer:
  - Weights: \(32 \times 1\)
  - Biases: \(1\)
Adding them up:
\[
(10 \times 64) + 64 + (64 \times 32) + 32 + (32 \times 1) + 1 = 640 + 64 + 2048 + 32 + 32 + 1 = 2817
\]
So, the total number of parameters in this neural network is \(2817\).
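As a quick sanity check, you can reproduce this count in a couple of lines of Python, or call count_params() on the model built earlier (a minimal sketch, assuming the model defined above):

# Verify the hand calculation
layer_sizes = [10, 64, 32, 1]
total = sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
print(total)                 # 2817
print(model.count_params())  # 2817, the same count reported by Keras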