Deep Learning Interview Questions


Beginners To Experts



Deep learning is a subset of machine learning that uses multi-layered neural networks to automatically learn hierarchical feature representations from data. Unlike traditional ML, deep learning can handle large, complex datasets and automatically extract features without manual engineering.

# Simple deep learning example using Keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(100,)))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')

A neural network is a computational model inspired by the human brain, consisting of layers of interconnected nodes (neurons). Its main components are the input layer, hidden layers, output layer, weights, biases, and activation functions.

# Define a simple feedforward neural network in PyTorch
import torch
import torch.nn as nn
class SimpleNN(nn.Module):
  def __init__(self):
    super(SimpleNN, self).__init__()
    self.fc1 = nn.Linear(100, 64)
    self.relu = nn.ReLU()
    self.fc2 = nn.Linear(64, 10)
  def forward(self, x):
    x = self.relu(self.fc1(x))
    x = self.fc2(x)
    return x

Backpropagation is an algorithm for training neural networks by computing gradients of the loss function with respect to the weights. It uses the chain rule to propagate errors backward through the network, enabling weight updates to minimize loss.

# Backpropagation example using PyTorch
import torch
import torch.nn as nn
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs = torch.randn(5, 10)
targets = torch.randn(5, 1)
outputs = model(inputs)
loss = criterion(outputs, targets)
loss.backward() # compute gradients
optimizer.step() # update weights
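
To make the chain rule concrete, here is a minimal sketch in plain Python for a single sigmoid neuron with a squared-error loss. The helper names (`loss`, `neuron_grads`) and the numeric values are illustrative assumptions, not part of PyTorch; the analytic gradient is compared against a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    # Forward pass: linear step, activation, squared error
    return (sigmoid(w * x + b) - y) ** 2

def neuron_grads(w, b, x, y):
    # Backward pass: multiply the local derivatives along the chain
    a = sigmoid(w * x + b)
    dloss_da = 2.0 * (a - y)
    da_dz = a * (1.0 - a)
    dz_dw, dz_db = x, 1.0
    return dloss_da * da_dz * dz_dw, dloss_da * da_dz * dz_db

w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = neuron_grads(w, b, x, y)

# Finite-difference check of the analytic gradient
eps = 1e-6
numeric_w = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
print(grad_w, numeric_w)
```

The two printed numbers should agree closely; `loss.backward()` automates the same bookkeeping for every parameter in a network.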

Activation functions introduce non-linearity into neural networks, enabling them to learn complex patterns. Common activation functions include ReLU, Sigmoid, and Tanh.

# Using activation functions in TensorFlow Keras
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.models import Sequential
model = Sequential()
model.add(Dense(64, input_shape=(100,)))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
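
For reference, the three activations named above can be written out directly. This is a sketch in plain Python on scalars (the function names are illustrative), not how a framework implements them on tensors.

```python
import math

def relu(z):
    # Passes positive values through, clamps negatives to zero
    return max(0.0, z)

def sigmoid(z):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# tanh squashes into (-1, 1) and is available in the standard library
for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), sigmoid(z), math.tanh(z))
```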

Overfitting occurs when a model learns noise or irrelevant details in training data, resulting in poor generalization to new data. Prevent it using techniques like dropout, regularization, early stopping, and data augmentation.

# Example: Dropout layer in Keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(100,)))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

CNNs are specialized neural networks designed for processing grid-like data such as images. They use convolutional layers to automatically learn spatial hierarchies of features.

# Simple CNN example with Keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28,28,1)))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

RNNs are designed for sequential data like time series or text. They use recurrent connections (loops) to carry a hidden state forward from previous inputs, which helps on problems where context matters.

# Simple RNN example in Keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
model = Sequential()
model.add(SimpleRNN(50, input_shape=(10, 1)))
model.add(Dense(1))
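
The "memory" of an RNN is the hidden state that is fed back in at each step. The sketch below uses a single scalar hidden unit with made-up weights (`w_x`, `w_h`, `b` are assumptions chosen for illustration) to show how an early input keeps influencing later states.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    # h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), starting from h_0 = 0
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states remain non-zero
print(rnn_forward([1.0, 0.0, 0.0]))
```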

The vanishing gradient problem occurs during backpropagation when gradients become very small, preventing effective weight updates and slowing or stopping learning, especially in deep or recurrent networks.

# Using ReLU helps mitigate vanishing gradients
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
model = Sequential()
model.add(Dense(64, input_shape=(100,)))
model.add(Activation('relu'))
model.add(Dense(10, activation='softmax'))

Dropout randomly disables a fraction of neurons during training, preventing over-reliance on any one neuron and reducing overfitting by encouraging more robust feature learning.

# Applying dropout in Keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(100,)))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
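
What the layer does to its inputs can be sketched in a few lines of plain Python. This assumes the common "inverted dropout" formulation, in which surviving units are rescaled by 1 / (1 - rate) during training (Keras documents the same rescaling); the function name and arguments are illustrative.

```python
import random

def dropout(values, rate, training=True, rng=random):
    # At inference time dropout is a no-op
    if not training or rate == 0.0:
        return list(values)
    keep_prob = 1.0 - rate
    # Zero each unit with probability `rate`; rescale survivors so the
    # expected activation stays the same
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

random.seed(0)
activations = [1.0] * 8
print(dropout(activations, rate=0.5))                  # mix of 0.0 and 2.0
print(dropout(activations, rate=0.5, training=False))  # unchanged
```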

Batch normalization normalizes layer inputs during training, stabilizing learning, accelerating convergence, and improving overall performance.

# Batch normalization example in Keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization, Dense
model = Sequential()
model.add(Dense(64, input_shape=(100,)))
model.add(BatchNormalization())
model.add(Dense(10, activation='softmax'))
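
The training-time computation can be sketched for a single feature in plain Python. This is a simplified illustration: `gamma` and `beta` stand for the learnable scale and shift, and the running statistics a real layer keeps for inference are omitted.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize one feature across the batch, then scale and shift
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
print(normalized)  # roughly zero mean and unit variance
```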

Transfer learning uses a pre-trained model on a new but related task, saving training time and improving performance when data is limited.

# Transfer learning example with Keras
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224,224,3))
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

Supervised deep learning uses labeled data to learn input-output mappings, while unsupervised deep learning finds patterns or representations from unlabeled data.

# Example: Autoencoder for unsupervised learning in Keras
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
input_layer = Input(shape=(100,))
encoded = Dense(32, activation='relu')(input_layer)
decoded = Dense(100, activation='sigmoid')(encoded)
autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

Autoencoders are neural networks trained to reconstruct input data, used for tasks like dimensionality reduction, anomaly detection, and data denoising.

# Simple autoencoder in Keras
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
input_layer = Input(shape=(784,))
encoded = Dense(64, activation='relu')(input_layer)
decoded = Dense(784, activation='sigmoid')(encoded)
autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

Gradient descent is an optimization algorithm that updates model weights iteratively to minimize the loss function by moving in the direction of the negative gradient.

# Gradient descent optimization in PyTorch
import torch
import torch.nn as nn
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs = torch.randn(5, 10)
targets = torch.randn(5, 1)
outputs = model(inputs)
loss = criterion(outputs, targets)
loss.backward()
optimizer.step()

Loss functions measure how well a model's predictions match the target values. They guide the training process by providing feedback for optimization.

# Example of using loss functions in Keras
from tensorflow.keras.losses import SparseCategoricalCrossentropy
model.compile(optimizer='adam', loss=SparseCategoricalCrossentropy())

Early stopping monitors validation loss during training and stops training when loss stops improving, preventing the model from overfitting.

# Early stopping callback in Keras
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', patience=3)
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=100, callbacks=[early_stop])
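
The patience rule itself is simple enough to sketch without a framework. The function below is a hypothetical helper that mirrors only the `patience` idea; the Keras callback has further options (for example `min_delta`) that are not modeled here.

```python
def early_stopping_epoch(val_losses, patience=3):
    # Return the index of the epoch at which training would stop
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, val_loss in enumerate(val_losses):
        if val_loss < best:
            best = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1

# Made-up validation losses: the best value occurs at epoch 2
history = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.64]
print(early_stopping_epoch(history, patience=3))  # → 5
```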

Optimizers update the network weights during training by using calculated gradients to minimize the loss function, affecting convergence speed and accuracy.

# Using Adam optimizer in Keras
model.compile(optimizer='adam', loss='categorical_crossentropy')

Batch gradient descent uses the entire training dataset to compute gradients, while stochastic gradient descent uses one sample at a time, which can speed up training but adds noise.

# Example: SGD optimizer in PyTorch
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
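
The difference between the two update schedules can be sketched on a one-parameter model `y = w * x`. The data, learning rate, and function names below are assumptions for illustration; the data is noise-free, so both variants reach the same answer, whereas on real data the per-sample updates would fluctuate.

```python
def batch_step(w, data, lr):
    # One update from the gradient averaged over the full dataset
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def sgd_epoch(w, data, lr):
    # One update per sample, so len(data) updates per pass
    for x, y in data:
        w -= lr * 2 * (w * x - y) * x
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
w_batch = w_sgd = 0.0
for _ in range(50):
    w_batch = batch_step(w_batch, data, lr=0.05)
    w_sgd = sgd_epoch(w_sgd, data, lr=0.05)
print(w_batch, w_sgd)  # both approach 2.0
```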

The loss landscape is a visualization of the loss function values over different model parameter configurations, showing valleys (good minima) and peaks (bad solutions).

# Visualization typically done with tools like matplotlib
# (no simple code snippet for this)
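
A full network's landscape is hard to draw, but for a toy model with two parameters the loss can simply be evaluated on a grid. The sketch below assumes a linear model `y = w * x + b` and three made-up data points; the printed table is a coarse view of the surface, with the smallest entry marking the valley.

```python
def mse(w, b, data):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # generated by y = 2x + 1
w_values = [i * 0.5 for i in range(0, 9)]    # 0.0 .. 4.0
b_values = [i * 0.5 for i in range(0, 5)]    # 0.0 .. 2.0

# Each row fixes b and sweeps w
grid = [[mse(w, b, data) for w in w_values] for b in b_values]
for row in grid:
    print(" ".join(f"{value:6.2f}" for value in row))

best_loss, best_w, best_b = min(
    (mse(w, b, data), w, b) for w in w_values for b in b_values
)
print(best_w, best_b, best_loss)
```

The same grid could be passed to a plotting library such as matplotlib to draw a contour plot.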

GANs consist of two networks — a generator that creates fake data and a discriminator that tries to distinguish fake from real data. They compete in a game to improve data generation.

# GAN skeleton code in PyTorch (simplified)
import torch
import torch.nn as nn
class Generator(nn.Module):
  def __init__(self):
    super().__init__()
    self.fc = nn.Linear(100, 784)
  def forward(self, x):
    return torch.sigmoid(self.fc(x))
class Discriminator(nn.Module):
  def __init__(self):
    super().__init__()
    self.fc = nn.Linear(784, 1)
  def forward(self, x):
    return torch.sigmoid(self.fc(x))

Transfer learning uses a pretrained model on a new task by reusing learned features, speeding up training and improving performance especially on small datasets.

# Example: Using pretrained ResNet in PyTorch
import torchvision.models as models
resnet = models.resnet18(pretrained=True)  # newer torchvision: weights=models.ResNet18_Weights.DEFAULT
# Freeze layers
for param in resnet.parameters():
  param.requires_grad = False
# Replace last layer for new task
import torch.nn as nn
resnet.fc = nn.Linear(resnet.fc.in_features, 10) # 10 classes

Dropout randomly disables neurons during training to prevent co-adaptation and reduce overfitting, improving model generalization.

# Dropout example in Keras
from tensorflow.keras.layers import Dense, Dropout
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5)) # 50% dropout rate

Batch normalization normalizes the inputs of each layer to stabilize and accelerate training; it was originally motivated as a way to reduce internal covariate shift.

# Batch normalization in PyTorch
import torch.nn as nn
bn = nn.BatchNorm1d(num_features=128)
# Apply after linear layer

The vanishing gradient problem occurs when gradients become very small during backpropagation, slowing learning in early layers of deep networks.

# Using ReLU activation helps mitigate vanishing gradients
import torch
import torch.nn.functional as F
x = torch.randn(4, 8)  # example pre-activation values
x = F.relu(x)  # negative entries become zero

CNNs use convolutional layers to detect spatial features by sliding filters over input data, capturing local patterns and reducing parameters.

# CNN layer example in Keras
from tensorflow.keras.layers import Conv2D
model.add(Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)))
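
The sliding-filter operation can be sketched in plain Python for a single channel with no padding and stride 1. As in most deep learning libraries, the filter is not flipped, so strictly speaking this is cross-correlation; the function name, image, and kernel are illustrative.

```python
def conv2d_valid(image, kernel):
    # Slide the kernel over every position where it fully fits
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds where brightness rises left to right
print(conv2d_valid(image, vertical_edge))  # → [[0, 2, 0], [0, 2, 0]]
```

The same four kernel weights are reused at every position, which is where the parameter savings over a dense layer come from.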

RNNs process sequential data by maintaining a hidden state that captures information from previous inputs, useful for time series and language.

# Simple RNN in Keras
from tensorflow.keras.layers import SimpleRNN
model.add(SimpleRNN(50, input_shape=(timesteps, features)))

LSTMs are a type of RNN designed to learn long-term dependencies, using gates to control information flow and mitigate vanishing gradients.

# LSTM layer in Keras
from tensorflow.keras.layers import LSTM
model.add(LSTM(100, input_shape=(timesteps, features)))

Transformers use self-attention mechanisms to weigh input parts differently, enabling parallel processing of sequences and superior performance on NLP tasks.

# Using HuggingFace Transformer model
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

Self-attention computes the importance of each word relative to others in a sequence, allowing the model to capture context efficiently.

# Simplified self-attention computation
import torch
query = torch.rand(1,5,64) # batch, seq_len, dim
key = torch.rand(1,5,64)
value = torch.rand(1,5,64)
scores = torch.matmul(query, key.transpose(-2,-1)) / (64**0.5)
weights = torch.nn.functional.softmax(scores, dim=-1)
output = torch.matmul(weights, value)

Attention allows models to focus on relevant parts of input sequences dynamically, improving performance especially in NLP and vision tasks.

# Attention example pseudo-code
context_vector = sum(attention_weights * encoder_outputs)

Fine-tuning updates pretrained model weights on a new dataset by training some or all layers to better adapt to the specific task.

# Fine-tuning last layers in PyTorch
for param in model.parameters():
  param.requires_grad = False
for param in model.fc.parameters():
  param.requires_grad = True
# Train only last fc layer

Data augmentation artificially increases training data by applying transformations like rotation or flipping, improving model robustness and preventing overfitting.

# Data augmentation example in Keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rotation_range=20, horizontal_flip=True)

Gradient clipping limits gradient values during backpropagation to prevent exploding gradients, stabilizing training.

# Gradient clipping example in PyTorch
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
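
The idea behind clipping by norm can be sketched on a plain list of gradient values. This hypothetical helper illustrates the rescaling only and is not a reimplementation of the PyTorch function.

```python
import math

def clip_by_norm(grads, max_norm):
    # Rescale the whole gradient vector if its L2 norm exceeds max_norm
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]

print(clip_by_norm([3.0, 4.0], max_norm=1.0))  # norm 5 is scaled down to 1
print(clip_by_norm([0.3, 0.4], max_norm=1.0))  # norm 0.5 is left unchanged
```

Because every component is scaled by the same factor, the direction of the update is preserved and only its length is limited.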

Early stopping halts training once validation loss stops improving for a set patience, avoiding overfitting.

# EarlyStopping in Keras
from tensorflow.keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=5)
model.fit(X_train, y_train, validation_data=(X_val, y_val), callbacks=[early_stopping])

An epoch is one complete pass through the entire training dataset; multiple epochs are used to iteratively improve the model.

# Training with epochs in Keras
model.fit(X_train, y_train, epochs=10)

Overfitting happens when a model learns noise instead of patterns, performing well on training but poorly on unseen data. Prevent with dropout, regularization, and early stopping.

# Example of L2 regularization in Keras
from tensorflow.keras.regularizers import l2
model.add(Dense(64, kernel_regularizer=l2(0.01)))

The learning rate controls the step size during weight updates; too high causes divergence, too low slows learning.

# Setting learning rate in PyTorch
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
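
The effect of the step size can be seen on the simplest possible loss, f(w) = w², whose gradient is 2w. The three learning rates below are arbitrary values chosen to show slow progress, convergence, and divergence.

```python
def minimize(lr, steps=20, w=1.0):
    # Plain gradient descent on f(w) = w**2, starting from w = 1
    for _ in range(steps):
        w -= lr * 2 * w
    return w

for lr in (0.001, 0.1, 1.1):
    print(lr, minimize(lr))
```

With `lr=0.001` the weight has barely moved after 20 steps, with `lr=0.1` it is close to the minimum at 0, and with `lr=1.1` each step overshoots and the magnitude grows.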

Regularization adds constraints or penalties to reduce model complexity and prevent overfitting, like L1/L2 penalties and dropout.

# L1 regularization example in Keras
from tensorflow.keras.regularizers import l1
model.add(Dense(64, kernel_regularizer=l1(0.01)))

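Batch size is the number of samples processed before the model updates its weights. A mini-batch is a small subset of the dataset; mini-batch gradient descent computes each update from one such subset rather than from the full dataset (batch gradient descent) or a single sample (stochastic gradient descent).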

# Example: batch_size=32 in Keras
model.fit(X_train, y_train, batch_size=32)

The optimizer updates model weights based on computed gradients to minimize loss; popular optimizers include SGD, Adam, and RMSprop.

# Using Adam optimizer in TensorFlow
model.compile(optimizer='adam', loss='categorical_crossentropy')

A loss function measures how well the model's predictions match the true data, guiding optimization to reduce errors.

# Example loss functions
# For classification: categorical_crossentropy
# For regression: mean_squared_error
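
Both losses are short enough to write out by hand. The sketch below works on plain lists for a single example (or a short list of regression targets); the small `eps` guarding the logarithm is an assumption added for numerical safety.

```python
import math

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def categorical_crossentropy(one_hot, probs, eps=1e-12):
    # Negative log-probability assigned to the true class
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot, probs))

print(mean_squared_error([3.0, 5.0], [2.5, 5.5]))            # → 0.25
print(categorical_crossentropy([0, 1, 0], [0.1, 0.8, 0.1]))  # confident and correct: low loss
print(categorical_crossentropy([0, 1, 0], [0.6, 0.2, 0.2]))  # wrong class favored: higher loss
```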

Supervised learning trains on labeled data with known outputs; unsupervised learning finds patterns in unlabeled data.

# Supervised example: classification
# Unsupervised example: clustering

Reinforcement learning trains agents to make sequences of decisions by rewarding desired actions and punishing undesired ones.

# Simplified RL loop pseudocode
state = env.reset()
done = False
while not done:
  action = agent.act(state)
  next_state, reward, done = env.step(action)
  agent.learn(state, action, reward, next_state)
  state = next_state

AI is the broad field of making machines intelligent; machine learning is a subset where models learn from data; deep learning is a subset of ML using deep neural networks.

# Deep learning example: neural networks
# ML example: decision trees
# AI includes rule-based systems too

The curse of dimensionality refers to problems caused by high-dimensional data, where data becomes sparse and learning harder.

# Dimensionality reduction methods:
# PCA, t-SNE, UMAP

Autoencoders are neural networks trained to reconstruct input data, learning compressed representations useful for denoising or dimensionality reduction.

# Autoencoder example in Keras
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
input_img = Input(shape=(784,))
encoded = Dense(64, activation='relu')(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)
autoencoder = Model(input_img, decoded)

GANs consist of a generator and discriminator network competing to generate realistic data; used for image synthesis, style transfer, etc.

# GAN simplified flow pseudocode
# Generator creates fake samples
# Discriminator tries to distinguish real vs fake

A policy defines the agent’s behavior, mapping states to actions, which can be deterministic or stochastic.

# Policy example pseudocode
def policy(state):
  return action

Backpropagation computes the gradients of the loss with respect to the model weights using the chain rule, enabling gradient descent optimization.

# Pseudocode for backpropagation
loss.backward()
optimizer.step()

A confusion matrix tabulates true versus predicted class labels; metrics such as accuracy, precision, and recall can be computed from its entries.

# Confusion matrix example using sklearn
from sklearn.metrics import confusion_matrix
y_true = [0, 1, 0, 1]
y_pred = [0, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm)

Overfitting happens when a model learns training data too well, including noise, causing poor generalization to new data.

# Example: High train accuracy but low test accuracy

To prevent overfitting, use techniques like regularization (L1/L2), dropout, early stopping, data augmentation, and simpler models.

# Example of dropout in Keras
from tensorflow.keras.layers import Dropout
model.add(Dropout(0.5))

Underfitting occurs when a model is too simple to capture underlying data patterns, leading to poor performance on training and test data.

# Example: low train and test accuracy

Regularization adds penalties to the loss function to discourage complex models and reduce overfitting.

# L2 regularization example in scikit-learn
from sklearn.linear_model import Ridge
model = Ridge(alpha=1.0)

Cross-validation splits data into subsets to train and validate the model multiple times for reliable performance estimates.

# K-Fold CV in scikit-learn
from sklearn.model_selection import KFold
kf = KFold(n_splits=5)
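
How the folds are formed can be sketched with index lists. This hypothetical helper splits contiguous, unshuffled indices and is meant only to illustrate the idea that every sample is used for validation exactly once.

```python
def k_fold_indices(n_samples, n_splits):
    # Yield (train_indices, val_indices) pairs, one per fold
    indices = list(range(n_samples))
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

for train, val in k_fold_indices(n_samples=10, n_splits=5):
    print(train, val)
```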

Bias is error from oversimplifying assumptions; variance is error from sensitivity to the particular training data. The bias-variance tradeoff balances underfitting (high bias) against overfitting (high variance).

# Model complexity affects bias and variance

Activation functions add non-linearity allowing neural networks to learn complex patterns; examples: ReLU, Sigmoid, Tanh.

# Example: ReLU in TensorFlow
from tensorflow.keras.layers import ReLU
layer = ReLU()

Vanishing gradients happen when gradients become too small during backpropagation, slowing learning in deep networks.

# ReLU helps mitigate vanishing gradients compared to Sigmoid

Dropout randomly disables neurons during training to prevent co-adaptation and reduce overfitting.

# Dropout example in Keras
from tensorflow.keras.layers import Dropout
model.add(Dropout(0.5))

Batch normalization normalizes layer inputs to stabilize learning and improve training speed.

# BatchNorm example in TensorFlow
from tensorflow.keras.layers import BatchNormalization
model.add(BatchNormalization())

CNNs are specialized neural networks for processing grid-like data such as images, using convolutional layers to detect features.

# Simple CNN layer example in Keras
from tensorflow.keras.layers import Conv2D
model.add(Conv2D(32, (3,3), activation='relu'))

Pooling reduces spatial dimensions of feature maps, helping reduce computation and control overfitting.

# MaxPooling example in Keras
from tensorflow.keras.layers import MaxPooling2D
model.add(MaxPooling2D(pool_size=(2, 2)))
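
A 2x2 max pool with stride 2 can be sketched on a nested list. The function name and the example feature map are illustrative; any trailing odd row or column is simply dropped here.

```python
def max_pool_2x2(feature_map):
    # Take the maximum of each non-overlapping 2x2 window
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, len(feature_map[0]) - 1, 2)]
        for i in range(0, len(feature_map) - 1, 2)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
print(max_pool_2x2(feature_map))  # → [[4, 2], [2, 8]]
```

The 4x4 map shrinks to 2x2, so later layers process a quarter as many values.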

RNNs process sequential data by maintaining a hidden state to capture temporal dependencies.

# Simple RNN layer example in Keras
from tensorflow.keras.layers import SimpleRNN
model.add(SimpleRNN(50))

LSTMs are a type of RNN designed to mitigate the vanishing gradient problem, using gates to control information flow.

# LSTM example in Keras
from tensorflow.keras.layers import LSTM
model.add(LSTM(50))

Transformers use self-attention mechanisms to process sequential data efficiently, enabling parallelization and better long-range context.

# Transformer attention pseudocode
# Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V

Attention weights parts of input differently, allowing models to focus on important information dynamically.

# Scaled dot-product attention example
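
A minimal sketch of scaled dot-product attention, softmax(QKᵀ / sqrt(d_k)) V, in plain Python on nested lists. The tiny matrices are made-up values; a real model would compute Q, K, and V from learned projections and use tensor operations.

```python
import math

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # the first value vector gets the larger weight
```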

Transfer learning reuses a pre-trained model on a new task, saving training time and improving performance with limited data.

# Using pre-trained model in Keras
from tensorflow.keras.applications import VGG16
base_model = VGG16(weights='imagenet', include_top=False)

Fine-tuning adjusts some layers of a pre-trained model with new data to better fit the specific task.

# Freeze base layers and train top layers example
for layer in base_model.layers:
  layer.trainable = False

Data augmentation artificially increases training data by applying transformations like rotations, flips, and shifts.

# Example in Keras ImageDataGenerator
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rotation_range=20, horizontal_flip=True)

Classification predicts discrete labels; regression predicts continuous numeric values.

# Classification example: spam detection
# Regression example: house price prediction

Precision is the number of true positives divided by all predicted positives, TP / (TP + FP); recall is the number of true positives divided by all actual positives, TP / (TP + FN); F1 is their harmonic mean.

# Example in scikit-learn
from sklearn.metrics import precision_score, recall_score, f1_score
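
The three metrics follow directly from the counts of true positives, false positives, and false negatives. The sketch below computes them by hand for binary labels; the function name and the example labels are illustrative.

```python
def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# One of the two actual positives is found, with no false alarms:
# precision 1.0, recall 0.5, F1 about 0.667
print(precision_recall_f1([0, 1, 0, 1], [0, 0, 0, 1]))
```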

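The ROC curve plots the true positive rate against the false positive rate as the decision threshold varies; AUC, the area under that curve, summarizes performance across all thresholds (0.5 corresponds to random ranking, 1.0 to perfect ranking).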

# Example in scikit-learn
from sklearn.metrics import roc_curve, auc
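
AUC has a useful ranking interpretation: it is the fraction of (positive, negative) pairs in which the positive example receives the higher score. The sketch below computes it that way for a small made-up example; it is quadratic in the number of samples and meant for illustration only.

```python
def roc_auc(y_true, scores):
    # Count correctly ordered (positive, negative) pairs; ties count as half
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```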

Gradient descent is an optimization algorithm minimizing loss by iteratively updating parameters in the negative gradient direction.

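# Simple gradient descent example in Python: minimize f(w) = (w - 3)^2
learning_rate = 0.01
weights = 0.0
for _ in range(1000):
  gradient = 2 * (weights - 3)  # derivative of f at the current weights
  weights -= learning_rate * gradient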

SGD updates parameters using one or few samples per iteration, allowing faster but noisier convergence.

# SGD example with mini-batches

Optimizers improve model training by adjusting parameters efficiently; examples: SGD, Adam, RMSprop.

# Using Adam optimizer in Keras
from tensorflow.keras.optimizers import Adam
model.compile(optimizer=Adam(), loss='categorical_crossentropy')

Early stopping halts training when validation loss stops improving, preventing overfitting.

# EarlyStopping callback in Keras
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', patience=3)

A confusion matrix summarizes classification results with counts of TP, TN, FP, and FN.

# Confusion matrix in scikit-learn
from sklearn.metrics import confusion_matrix
y_true = [0, 1, 0, 1]  # example labels
y_pred = [0, 0, 0, 1]  # example predictions
cm = confusion_matrix(y_true, y_pred)

Reinforcement learning trains agents to make sequential decisions by maximizing cumulative rewards.

# Simple Q-learning pseudocode
# Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)]
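
The update rule can be turned into a few lines of runnable Python for a tabular Q-function stored as nested dictionaries. The state names, action names, and the values of `alpha` and `gamma` below are hypothetical.

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # Move Q(s, a) toward the target r + gamma * max_a' Q(s', a')
    best_next = max(Q[next_state].values())
    td_error = reward + gamma * best_next - Q[state][action]
    Q[state][action] += alpha * td_error
    return Q[state][action]

# Hypothetical two-state table with actions "left" and "right"
Q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
# Target is 1.0 + 0.9 * 2.0 = 2.8, so the entry moves 10% of the way there
print(q_update(Q, "s0", "right", reward=1.0, next_state="s1"))
```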

Policy-based methods learn a policy directly; value-based methods estimate value functions to derive policies.

# Example: REINFORCE (policy-based) vs Q-learning (value-based)

NLP enables machines to understand, interpret, and generate human language.

# Example: tokenization in Python using NLTK
import nltk
tokens = nltk.word_tokenize("Hello world!")

Word embeddings represent words as dense vectors capturing semantic relationships.

# Example: Word2Vec using Gensim
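from gensim.models import Word2Vec
sentences = [["deep", "learning"], ["neural", "networks"]]  # toy tokenized corpus
# min_count=1 keeps every word; the default would drop rare words in a corpus this small
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)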

Seq2seq models map input sequences to output sequences, commonly used in translation and chatbots.

# Example architecture: Encoder-Decoder

Tokenization splits text into words or subwords for processing.

# Example using spaCy
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Hello world!")
tokens = [token.text for token in doc]

NER detects and classifies named entities like people, organizations, and locations in text.

# Example using spaCy
for ent in doc.ents:
  print(ent.text, ent.label_)

Sentiment analysis classifies text into positive, negative, or neutral opinions.

# Example using TextBlob
from textblob import TextBlob
analysis = TextBlob("I love this!")
print(analysis.sentiment.polarity)

Topic modeling discovers abstract topics in documents; popular method: Latent Dirichlet Allocation (LDA).

# Example using Gensim LDA
from gensim.models.ldamodel import LdaModel

Word sense disambiguation identifies the correct meaning of a word based on context.

# Example approach: Lesk algorithm

Sequence labeling assigns categorical labels to each token in a sequence, e.g., POS tagging or NER.

# Example: POS tagging

Language models predict the next word or token in a sequence based on context.

# Example: GPT predicts next token

TF-IDF scores words by importance, balancing term frequency with how rare they are across documents.

# TF-IDF in scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
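
The scoring idea can be sketched by hand using the textbook definition, tf × log(N / df). This is an assumption about the variant being illustrated: `TfidfVectorizer` applies a smoothed IDF and normalization by default, so its numbers will differ from this sketch.

```python
import math
from collections import Counter

def tf_idf(docs):
    # tf = term count / document length; idf = log(N / document frequency)
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                       for term, count in counts.items()})
    return scores

docs = ["the cat sat", "the dog barked", "the cat purred"]
for row in tf_idf(docs):
    print(row)
```

A word such as "the" that appears in every document scores 0, while a word unique to one document scores highest.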

Supervised learning uses labeled data; unsupervised learning finds patterns in unlabeled data.

# Examples: classification (supervised), clustering (unsupervised)

Clustering groups similar data points without labels, e.g., K-means clustering.

# Example: KMeans clustering
from sklearn.cluster import KMeans

Dimensionality reduction reduces features while preserving structure, e.g., PCA.

# PCA example
from sklearn.decomposition import PCA

Feature engineering creates meaningful features from raw data to improve model performance.

# Example: creating new features from date columns
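
As a small illustration of the date-column idea, the sketch below derives a few numeric and boolean features from a single date using only the standard library. The chosen feature names are assumptions; a real pipeline would typically apply the same logic to a whole column.

```python
from datetime import date

def date_features(d):
    # Derive model-friendly features from a single date value
    return {
        "year": d.year,
        "month": d.month,
        "day_of_week": d.weekday(),      # Monday = 0
        "is_weekend": d.weekday() >= 5,
    }

print(date_features(date(2024, 3, 16)))  # a Saturday
```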

Overfitting occurs when a model learns noise, which reduces generalization; prevent it with regularization and dropout.

# Example: dropout in Keras
from tensorflow.keras.layers import Dropout

Regularization adds a penalty to the loss to reduce model complexity; common types are L1 and L2.

# L2 regularization in Keras
from tensorflow.keras.regularizers import l2

Batch normalization normalizes layer inputs to speed training and improve stability.

# BatchNormalization layer in Keras
from tensorflow.keras.layers import BatchNormalization

Dropout randomly disables neurons during training to prevent overfitting.

# Dropout layer example
from tensorflow.keras.layers import Dropout

Cross-validation splits data into training and testing sets multiple times to assess model generalization.

# KFold cross-validation
from sklearn.model_selection import KFold

Hyperparameter tuning finds the best model parameters, often using grid or random search.

# Example with GridSearchCV
from sklearn.model_selection import GridSearchCV

This example builds and trains a simple neural network for digit classification using TensorFlow and the MNIST dataset.

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

model = Sequential([
  Flatten(input_shape=(28, 28)),
  Dense(128, activation='relu'),
  Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

Dropout is a regularization technique to prevent overfitting in neural networks by randomly setting some units to zero during training.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

model = Sequential([
  Flatten(input_shape=(28, 28)),
  Dense(128, activation='relu'),
  Dropout(0.2),
  Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

Convolutional layers are the core of CNNs, ideal for image data. This example uses convolutional and pooling layers to build a CNN for MNIST digit classification.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1) / 255.0
x_test = x_test.reshape(-1, 28, 28, 1) / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

model = Sequential([
  Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
  MaxPooling2D(pool_size=(2, 2)),
  Flatten(),
  Dense(128, activation='relu'),
  Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

This example demonstrates a basic training loop in PyTorch including training and evaluation steps. Each line is explained using inline comments.

for epoch in range(num_epochs):  # Loop over the dataset multiple times (epochs)
  model.train()  # Set the model to training mode
  for inputs, labels in train_loader:  # Iterate over batches from the training set
    optimizer.zero_grad()  # Clear gradients from the previous step
    outputs = model(inputs)  # Perform forward pass to get predictions
    loss = criterion(outputs, labels)  # Compute the loss between predictions and true labels
    loss.backward()  # Backpropagate the loss to compute gradients
    optimizer.step()  # Update model weights using the optimizer

  model.eval()  # Switch to evaluation mode (e.g., disables dropout)
  with torch.no_grad():  # Disable gradient computation for evaluation
    # Evaluate on validation set
    # Typically: loop through val_loader, run model(inputs), compute accuracy/loss
    pass  # Placeholder where validation code would go

This example builds a basic autoencoder with one hidden layer using Keras. Autoencoders are useful for unsupervised learning tasks like dimensionality reduction or denoising.

from tensorflow.keras import layers, models  # Import necessary Keras components

input_img = layers.Input(shape=(784,)) # Input layer for flattened 28x28 image (MNIST)
encoded = layers.Dense(64, activation='relu')(input_img) # Encoding layer with 64 units and ReLU activation
decoded = layers.Dense(784, activation='sigmoid')(encoded) # Decoding layer with sigmoid activation to output 784 values

autoencoder = models.Model(input_img, decoded) # Define the autoencoder model from input to reconstructed output
autoencoder.compile(optimizer='adam', loss='binary_crossentropy') # Compile model with Adam optimizer and binary crossentropy loss

This example demonstrates building a simple Convolutional Neural Network (CNN) in Keras for classifying grayscale images like MNIST digits.

from tensorflow.keras import layers, models  # Import the layers and models modules from Keras

model = models.Sequential([ # Define a sequential model
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)), # 32 filters, 3x3 kernel, ReLU activation, grayscale input
    layers.MaxPooling2D((2,2)), # Downsample with a 2x2 max pooling layer
    layers.Flatten(), # Flatten the 2D feature maps to 1D
    layers.Dense(64, activation='relu'), # Fully connected layer with 64 neurons and ReLU
    layers.Dense(10, activation='softmax') # Output layer for 10 classes with softmax activation
])

model.compile( # Compile the model with appropriate configurations
    optimizer='adam', # Use Adam optimizer
    loss='sparse_categorical_crossentropy', # Suitable for integer-labeled classification
    metrics=['accuracy'] # Track accuracy during training
)
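
To see where the input size of the first Dense layer comes from, the shape arithmetic for the model above can be traced by hand. This is a minimal sketch, assuming the Keras defaults used in the example ('valid' padding, stride 1 for the convolution, pooling stride equal to the pool size); the helper names `conv_out` and `pool_out` are hypothetical and not part of Keras.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output length along one spatial axis of a convolution
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, pool):
    # Non-overlapping pooling (stride == pool size), no padding
    return size // pool

side = conv_out(28, 3)        # 28x28 input, 3x3 kernel
side = pool_out(side, 2)      # 2x2 max pooling
flat = side * side * 32       # 32 feature maps flattened into one vector

conv_params = (3 * 3 * 1 + 1) * 32   # kernel weights + one bias per filter
dense_params = (flat + 1) * 64       # weights + one bias per Dense unit

print(side, flat, conv_params, dense_params)
```

Most of the parameters sit in the first Dense layer, which is why deeper CNNs usually downsample further before flattening.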

This code snippet demonstrates how to freeze all layers of a PyTorch model except the final fully connected (FC) layer. It's commonly used in transfer learning when fine-tuning pre-trained models like ResNet.

# Freeze all layers in the model
for param in model.parameters():
    param.requires_grad = False # Disables gradient computation for each parameter

# Unfreeze only the final fully connected layer
for param in model.fc.parameters():
    param.requires_grad = True # Enables training only for the FC layer
Explanation:
- The first loop disables gradient updates for **all** model parameters (useful when using pre-trained models).
- The second loop **re-enables** training (backpropagation) only for the last fully connected layer, allowing the model to adapt to the new dataset while preserving pre-trained features.

This example shows how to create an embedding layer in PyTorch and use it to convert input indices into dense vector representations.

import torch
import torch.nn as nn

# Create an Embedding layer with 1000 possible tokens and embedding size of 50
embedding = nn.Embedding(num_embeddings=1000, embedding_dim=50)

# Example input tensor containing token indices
input_ids = torch.LongTensor([1, 2, 3])

# Pass input indices through the embedding layer to get dense vectors
embedded = embedding(input_ids)

# Print the shape of the output embedding tensor
print(embedded.shape) # torch.Size([3, 50])
Explanation:
- `nn.Embedding` creates a lookup table mapping integer indices to dense vectors.
- `num_embeddings=1000` means the vocabulary size is 1000 tokens.
- `embedding_dim=50` means each token is represented by a 50-dimensional vector.
- Input `input_ids` is a tensor of indices; after embedding, each index is converted to its vector.
- The output shape `(3, 50)` corresponds to 3 tokens each with a 50-dimensional embedding.

This example demonstrates using the Dropout layer in a TensorFlow Keras model to reduce overfitting by randomly dropping neurons during training.

import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout

# Build a simple Sequential model with Dropout
model = tf.keras.Sequential([
    Dense(64, activation='relu'), # Fully connected layer with 64 units and ReLU activation
    Dropout(0.5), # Dropout layer that randomly drops 50% of the neurons during training
    Dense(10, activation='softmax') # Output layer with 10 units for classification
])
Explanation:
- The `Dense` layer applies a fully connected layer with ReLU activation.
- `Dropout(0.5)` randomly disables half of the neurons during each training step, which helps prevent overfitting.
- The final `Dense` layer outputs probabilities for 10 classes using the softmax activation.
- Dropout is only active during training and ignored during evaluation or inference.
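
The training-versus-inference behavior described above can be sketched in plain Python. This is an illustrative version of "inverted" dropout, assuming survivors are scaled by 1 / (1 - rate) so the expected activation stays the same (the scaling Keras applies); the function name `dropout` and its signature are hypothetical.

```python
import random

def dropout(values, rate, training, rng):
    # At inference time dropout is the identity function
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    # Zero each unit with probability `rate`; scale the survivors by 1 / keep
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0] * 10000

train_out = dropout(activations, 0.5, training=True, rng=rng)
eval_out = dropout(activations, 0.5, training=False, rng=rng)

print(sum(train_out) / len(train_out))  # stays close to the original mean of 1.0
print(eval_out[:3])                     # unchanged at inference
```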

This example shows how to construct a simple neural network using TensorFlow Keras, incorporating a Dropout layer to reduce overfitting.

import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout

# Define a Sequential model
model = tf.keras.Sequential([
    Dense(64, activation='relu'), # Dense layer with 64 units and ReLU activation
    Dropout(0.5), # Dropout layer that drops 50% of inputs during training
    Dense(10, activation='softmax') # Output layer with 10 units (for classification)
])
Explanation:
- The Dense layer creates a fully connected layer with ReLU activation.
- The Dropout layer randomly sets 50% of inputs to zero during training to prevent overfitting.
- The last Dense layer outputs class probabilities via softmax activation.
- Dropout is only active during training, ignored during evaluation.

Gradient clipping is used to prevent exploding gradients during training by limiting the maximum norm of gradients.

import torch

# Assuming 'model' is your neural network model.
# Call this after loss.backward() and before optimizer.step().
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0) # Rescales gradients so their total norm is at most 1.0
Explanation:
- `clip_grad_norm_` modifies the gradients of the model's parameters in-place.
- `max_norm=1.0` means the total norm of all gradients will be clipped to at most 1.0.
- This helps stabilize training especially in RNNs or deep networks where gradients can explode.
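
The rescaling behind norm-based clipping can be illustrated without PyTorch. This is a minimal sketch, assuming gradients are given as nested lists and that a small `eps` guards the division; `clip_by_global_norm` is a hypothetical helper that mirrors the idea, not the library's implementation.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    # One L2 norm computed over every gradient value in every tensor
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    # Scale factor is 1.0 when the norm is already within the limit
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, total_norm

grads = [[3.0, 0.0], [0.0, 4.0]]  # two "parameter tensors"
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for tensor in clipped for g in tensor))

print(norm_before, norm_after)
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its length shrinks.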

This example defines a simple convolutional neural network layer block in PyTorch, including convolution, batch normalization, and ReLU activation.

import torch.nn as nn  # Import PyTorch neural network module

class SimpleCNN(nn.Module): # Define a subclass of nn.Module
    def __init__(self):
        super().__init__() # Initialize the base class
        self.conv = nn.Conv2d(3, 16, 3, 1) # 2D conv layer: 3 input channels, 16 output channels, 3x3 kernel, stride 1
        self.bn = nn.BatchNorm2d(16) # Batch normalization on 16 channels
        self.relu = nn.ReLU() # ReLU activation function

    def forward(self, x): # Forward pass method
        x = self.conv(x) # Apply convolution
        x = self.bn(x) # Apply batch normalization
        x = self.relu(x) # Apply ReLU activation
        return x # Return the processed output
Explanation:
- The conv layer extracts features from input images.
- Batch normalization normalizes activations to improve training stability.
- ReLU adds non-linearity.
- This block is common in CNN architectures.

This example shows how to implement scaled dot-product attention, a key component in Transformer models.

import torch  # Import PyTorch
import torch.nn.functional as F # Import functional API for activation functions

def scaled_dot_product_attention(query, key, value):
    d_k = query.size(-1) # Get the size of the last dimension of query
    # Calculate raw attention scores by matrix multiplication of query and key transpose, scaled by sqrt(d_k)
    scores = torch.matmul(query, key.transpose(-2, -1)) / torch.sqrt(torch.tensor(d_k, dtype=torch.float32))
    weights = F.softmax(scores, dim=-1) # Normalize scores to probabilities with softmax
    output = torch.matmul(weights, value) # Multiply weights by values to get weighted output
    return output, weights # Return the output and attention weights
Explanation:
- Scaling by sqrt(d_k) keeps the dot products from growing with the key dimension; very large scores would saturate the softmax and produce vanishingly small gradients.
- Softmax converts scores into a probability distribution.
- This function returns both the attended output and the attention weights.
- Scaled dot-product attention is fundamental for self-attention in Transformers.
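
The same computation can be followed on a tiny hand-sized example. This is a pure-Python sketch for illustration only, assuming unbatched 2D inputs stored as nested lists; the helper names `softmax` and `attention` are hypothetical.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    d_k = len(Q[0])
    # scores[i][j] = dot(Q[i], K[j]) / sqrt(d_k)
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in K] for q_row in Q]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the rows of V
    out = [[sum(w * v_row[j] for w, v_row in zip(w_row, V))
            for j in range(len(V[0]))] for w_row in weights]
    return out, weights

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]

out, weights = attention(Q, K, V)
print(weights)
print(out)
```

Each query attends most strongly to the key it matches, so the first output row is pulled toward the first value row and the second toward the second.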

Saving and loading a PyTorch model's learned parameters (state_dict) is essential for model persistence and later inference.

# Save the model's state dictionary to a file
torch.save(model.state_dict(), 'model.pth')

# Load the saved state dictionary into the model
model.load_state_dict(torch.load('model.pth'))

# Set the model to evaluation mode (disables dropout, batchnorm updates)
model.eval()
Explanation:
- `state_dict` contains the model's parameters (weights and biases).
- `torch.save` serializes these parameters to a file (`model.pth`).
- `torch.load` deserializes the saved parameters.
- `load_state_dict` loads the parameters back into the model.
- `model.eval()` switches the model to evaluation mode, important for layers like dropout and batch normalization.

CrossEntropyLoss is commonly used for classification tasks in PyTorch.

import torch.nn as nn

# Define the loss criterion for classification
criterion = nn.CrossEntropyLoss()

# Get model output for inputs (logits, not softmax probabilities)
output = model(inputs)

# Calculate the loss comparing output and target labels
loss = criterion(output, targets)
Explanation:
- `nn.CrossEntropyLoss()` combines `LogSoftmax` and `NLLLoss` in one single class.
- The model outputs raw logits; the criterion applies log-softmax internally, so no softmax layer should precede it.
- `inputs` are the input features.
- `targets` are the ground truth labels (integers, not one-hot).
- The loss value guides the optimizer during training.
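
What the criterion computes can be written out by hand. This is a minimal sketch, assuming integer class targets and mean reduction over the batch (the default for `nn.CrossEntropyLoss`); the helpers `log_softmax` and `cross_entropy` are hypothetical stand-ins, not the PyTorch functions.

```python
import math

def log_softmax(logits):
    # log(softmax(z)) computed stably via the log-sum-exp trick
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy(batch_logits, targets):
    # Negative log-likelihood of the true class, averaged over the batch
    losses = [-log_softmax(logits)[t] for logits, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

logits = [[2.0, 0.5, -1.0],   # fairly confident in class 0
          [0.0, 0.0, 0.0]]    # uniform over three classes
targets = [0, 2]

loss = cross_entropy(logits, targets)
print(loss)
```

A uniform prediction over three classes costs log(3) for that sample, which is a useful sanity check for the loss at the start of training.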

Data augmentation helps increase the diversity of training data by applying random transformations.

from torchvision import transforms

transform = transforms.Compose([
    transforms.RandomHorizontalFlip(), # Randomly flip image horizontally
    transforms.RandomRotation(15), # Randomly rotate image by ±15 degrees
    transforms.ToTensor() # Convert image to tensor
])
Explanation:
- `transforms.Compose()` chains multiple transformations.
- `RandomHorizontalFlip()` randomly flips images horizontally with a default probability of 0.5.
- `RandomRotation(15)` randomly rotates the image within ±15 degrees.
- `ToTensor()` converts a PIL Image or `numpy.ndarray` to a tensor; for 8-bit images it also scales pixel values to [0, 1].
- These augmentations help prevent overfitting by creating variations of images.

Learning rate schedulers adjust the learning rate during training to improve convergence.

import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=0.01) # Initialize Adam optimizer with learning rate 0.01
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1) # Reduce LR by factor 0.1 every 10 epochs

for epoch in range(30):
    train(...) # Your training function for one epoch
    scheduler.step() # Decays learning rate after each epoch
Explanation:
- `optim.Adam` is an adaptive optimizer commonly used in deep learning.
- `StepLR` scheduler decreases the learning rate by multiplying it by `gamma` every `step_size` epochs.
- This helps the model converge better by starting with a larger learning rate and fine-tuning with a smaller one later.
- The `scheduler.step()` should be called after each epoch.
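
The resulting schedule can be checked with a few lines of arithmetic. This sketch assumes the closed form `base_lr * gamma ** (epoch // step_size)` for step decay; `step_lr` is a hypothetical helper, not the PyTorch class.

```python
def step_lr(base_lr, step_size, gamma, epoch):
    # The learning rate drops by a factor of `gamma` once every `step_size` epochs
    return base_lr * gamma ** (epoch // step_size)

schedule = [step_lr(0.01, step_size=10, gamma=0.1, epoch=e) for e in range(30)]

for epoch in (0, 9, 10, 19, 20, 29):
    print(epoch, schedule[epoch])
```

With the settings from the example, the rate stays at 0.01 for epochs 0 to 9, then is divided by 10 at epoch 10 and again at epoch 20.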

Learning rate decay reduces the learning rate during training to help models converge smoothly.

import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=0.01) # Initialize Adam optimizer with initial LR 0.01
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1) # LR decays by 0.1 every 10 epochs

for epoch in range(30):
    train(...) # Call your training loop function here
    scheduler.step() # Update learning rate based on schedule after each epoch
Explanation:
- `optim.Adam` is a popular optimizer that adapts learning rates per parameter.
- `StepLR` scheduler multiplies the learning rate by `gamma` every `step_size` epochs.
- Starting with a larger learning rate speeds up early progress, and the later, smaller rates let the optimizer settle into a minimum instead of oscillating around it.
- `scheduler.step()` should be called after each epoch to update the LR.

EarlyStopping is a technique to stop training when the monitored metric stops improving, which helps prevent overfitting.

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50, callbacks=[early_stop])
Explanation:
- `monitor='val_loss'` tells the callback to watch validation loss.
- `patience=3` means training will stop if validation loss doesn’t improve for 3 consecutive epochs.
- `restore_best_weights=True` will revert the model to the weights of the epoch with the best validation loss.
- This helps avoid overfitting by stopping training early once performance on validation data stops improving.
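
The patience logic can be sketched independently of Keras. This is a simplified illustration, assuming "improvement" means a strictly lower validation loss and no minimum delta; `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    best = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # Stop here; best_epoch is where weights would be restored from
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses, one per epoch
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.50]
stopped_at, best_epoch = early_stopping_epoch(losses, patience=3)
print(stopped_at, best_epoch)
```

Training halts after three epochs without improvement, so the later value of 0.50 is never reached. That is the trade-off a small `patience` makes.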

EarlyStopping is a useful callback in Keras that monitors a chosen metric during training and stops the training process if the metric stops improving for a specified number of epochs.

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50, callbacks=[early_stop])
Explanation:
- `monitor='val_loss'`: Watches the validation loss to detect improvements.
- `patience=3`: Training will stop if the validation loss does not improve after 3 consecutive epochs.
- `restore_best_weights=True`: When training stops, the model weights are reverted to those from the epoch with the best validation loss.
- This technique helps to prevent overfitting by stopping training before the model starts to learn noise in the training data.

This example shows how to build a custom dataset class in PyTorch to load images from a folder, and then use a DataLoader to batch and shuffle the data.

from torch.utils.data import Dataset, DataLoader
from PIL import Image
import os
from torchvision import transforms

class MyDataset(Dataset):
    def __init__(self, folder):
        self.files = os.listdir(folder) # List all files in the folder
        self.folder = folder
    def __len__(self):
        return len(self.files) # Return number of images
    def __getitem__(self, idx):
        img_path = os.path.join(self.folder, self.files[idx]) # Get path of image
        image = Image.open(img_path).convert("RGB") # Open and convert image to RGB
        return transforms.ToTensor()(image) # Convert image to tensor

dataset = MyDataset("images") # Create dataset instance with folder path
loader = DataLoader(dataset, batch_size=32, shuffle=True) # Create DataLoader for batching and shuffling
Explanation:
- The `MyDataset` class inherits from `torch.utils.data.Dataset` and overrides `__len__` and `__getitem__`.
- `__len__` returns the total number of images.
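- `__getitem__` loads an image at the given index, converts it to RGB, and transforms it to a tensor. No resizing is applied, so the default batching in `DataLoader` only works if all images share the same dimensions.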
- `DataLoader` wraps the dataset to enable batching (`batch_size=32`) and random shuffling (`shuffle=True`).
- This pattern is standard for efficient data feeding in PyTorch training loops.

This example demonstrates how to build a TensorFlow dataset pipeline that loads JPEG images from a folder, decodes and resizes them, batches the data, and prefetches for performance.

import tensorflow as tf

def parse_fn(example):
    image = tf.image.decode_jpeg(example, channels=3) # Decode JPEG bytes into a 3-channel RGB image
    image = tf.image.resize(image, [224, 224]) # Resize image to 224x224
    return image

dataset = tf.data.Dataset.list_files("images/*.jpg") # List all jpg files in 'images' folder
dataset = dataset.map(lambda x: parse_fn(tf.io.read_file(x))) # Read file and parse image
dataset = dataset.batch(32) # Batch images in groups of 32
dataset = dataset.prefetch(tf.data.AUTOTUNE) # Prefetch batches for efficient pipeline
Explanation:
- `tf.data.Dataset.list_files()` generates a dataset of file paths matching the pattern.
- `map()` applies the `parse_fn` to decode and resize each image.
- `tf.io.read_file(x)` reads the raw file contents.
- `batch(32)` combines 32 images per batch to feed the model.
- `prefetch(tf.data.AUTOTUNE)` allows data to be prepared while the model is training, improving throughput.
This setup is useful for training image models efficiently in TensorFlow.

This example shows how to convert a TensorFlow Keras model to the TensorFlow Lite (TFLite) format for deployment on edge devices or mobile.

import tensorflow as tf

# Assume you have a trained or loaded Keras model
model = tf.keras.Sequential([...]) # Replace [...] with your model layers
model.compile(...) # Compile the model with optimizer, loss, metrics

# Create a TFLite converter from the Keras model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert() # Convert the model to TFLite format

# Save the converted TFLite model to a file
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
Explanation:
- `tf.lite.TFLiteConverter.from_keras_model(model)` creates a converter object from your Keras model.
- `convert()` performs the actual conversion into a lightweight `.tflite` file.
- Writing the converted model to disk allows you to deploy it on devices that support TensorFlow Lite.
TensorFlow Lite models are optimized for mobile and embedded devices with reduced size and faster inference.

This example demonstrates how to convert a TensorFlow Keras model to TensorFlow Lite, save it, and briefly explains usage.

import tensorflow as tf

# Assume you have a trained or loaded Keras model
model = tf.keras.Sequential([...]) # Replace [...] with your model layers
model.compile(...) # Compile the model with optimizer, loss, and metrics

# Convert the Keras model to TensorFlow Lite format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert() # Perform conversion

# Save the TensorFlow Lite model to a file
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
Explanation:
- This process allows you to deploy models on mobile or embedded devices with TensorFlow Lite runtime.
- The `.tflite` model is a lightweight version optimized for fast inference.
- To use this model for inference, you would load it with a TensorFlow Lite interpreter.

This example shows how to load a pretrained PyTorch ResNet18 model and export it to ONNX format for interoperability with other frameworks.

import torch

# Load pretrained ResNet18 model from PyTorch Hub
model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
model.eval() # Use inference behavior for BatchNorm layers before exporting

# Create a dummy input tensor matching the model's expected input shape
dummy_input = torch.randn(1, 3, 224, 224)

# Export the model to ONNX format file named "resnet18.onnx"
torch.onnx.export(model, dummy_input, "resnet18.onnx")
Explanation:
- ONNX (Open Neural Network Exchange) is a common format that allows models to be used across different deep learning frameworks.
- The dummy input tensor is used to trace the model’s operations for export.
- Exported ONNX files can be run in runtimes such as ONNX Runtime or converted to other formats.

This example demonstrates how to prune (zero out) 30% of the weights randomly in a PyTorch linear layer to reduce model complexity and potentially improve generalization.

import torch.nn.utils.prune as prune
import torch.nn as nn

# Define a simple linear layer with 100 inputs and 10 outputs
model = nn.Linear(100, 10)

# Apply random unstructured pruning on the 'weight' parameter with 30% sparsity
prune.random_unstructured(model, name="weight", amount=0.3)

# Print the weights to observe pruning (some will be zero now)
print(model.weight)
Explanation:
- Pruning removes parameters (weights) to compress the model. Most methods target the least important weights; this example picks them at random.
- `random_unstructured` prunes individual weights chosen at random across the weight tensor.
- The `amount=0.3` argument means 30% of the weights will be set to zero.
- This can reduce model size and inference cost but may affect accuracy.
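
The masking idea behind unstructured pruning can be shown in plain Python. This is an illustrative sketch, assuming a flat list of weights and a binary mask with exactly `round(amount * n)` zeros; `random_unstructured_mask` is a hypothetical helper, not the PyTorch function.

```python
import random

def random_unstructured_mask(n_weights, amount, rng):
    # Choose which individual weights to remove, uniformly at random
    n_prune = round(amount * n_weights)
    pruned = set(rng.sample(range(n_weights), n_prune))
    return [0 if i in pruned else 1 for i in range(n_weights)]

rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # stand-in for a 100x10 weight matrix

mask = random_unstructured_mask(len(weights), amount=0.3, rng=rng)
pruned_weights = [w * m for w, m in zip(weights, mask)]

sparsity = mask.count(0) / len(mask)
print(sparsity)
```

Keeping the mask separate from the original weights makes pruning reversible until the mask is applied permanently.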

This example shows how to convert a pre-trained ResNet18 model from 32-bit floating point precision to a dynamically quantized 8-bit integer model to reduce model size and improve inference speed.

import torch.quantization

# Load a pretrained ResNet18 model (float32)
model_fp32 = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)
model_fp32.eval()

# Apply dynamic quantization to Linear layers
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, {torch.nn.Linear}, dtype=torch.qint8
)

# Print the quantized model summary
print(model_int8)
Explanation:
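- Dynamic quantization converts the weights of selected layer types (here `torch.nn.Linear`) to int8 and quantizes their activations on the fly at inference time.
- This reduces model size and can speed up CPU inference.
- All other layers stay in float32. ResNet18 has a single Linear layer (the final `fc`), so the savings for this particular model are small.
- The model is set to eval mode first because quantization targets inference.
- The technique pays off most for models dominated by Linear or recurrent layers, such as many NLP models.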
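
To make "int8 precision" concrete, here is a generic affine quantization sketch in plain Python. It assumes a single scale and zero point per tensor and is meant only to illustrate the idea; it is not PyTorch's exact scheme, and the helper names `quantize` and `dequantize` are hypothetical.

```python
def quantize(values, num_bits=8):
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # Make sure 0.0 is inside the representable range
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    # Map each float to an integer and clamp it to the int8 range
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(v - zero_point) * scale for v in q]

weights = [-1.0, -0.25, 0.0, 0.4, 1.5]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)

max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, zero_point)
print(max_err)
```

Each weight now needs one byte instead of four, at the cost of a rounding error on the order of the scale.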

This example shows how to use the Hugging Face Transformers library to load a pretrained Vision Transformer (ViT) model and feature extractor, process an image, and perform image classification.

from transformers import ViTFeatureExtractor, ViTForImageClassification
from PIL import Image
import requests
import torch

# Load pretrained ViT model and feature extractor
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")

# Load image from URL
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/image_classification.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess image and convert to PyTorch tensor batch
inputs = extractor(images=image, return_tensors="pt")

# Forward pass through model
outputs = model(**inputs)

# Get predicted class ID with highest logit score
print(outputs.logits.argmax(-1))
Explanation:
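- `ViTFeatureExtractor` preprocesses images (resizing and normalization) into the format ViT expects. Newer releases of Transformers deprecate it in favor of `ViTImageProcessor`.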
- The image is downloaded and opened using PIL.
- The model outputs logits (raw prediction scores) for each class.
- The class with the highest logit is selected as the predicted class.
- This approach is powerful for image classification using transformers.

To use YOLOv5, first clone the official GitHub repository and install its dependencies.

# Clone the YOLOv5 repository
git clone https://github.com/ultralytics/yolov5
cd yolov5
# Install Python dependencies
pip install -r requirements.txt
Explanation:
- `git clone` downloads the YOLOv5 source code.
- `cd yolov5` changes the directory to the cloned repo.
- `pip install -r requirements.txt` installs all necessary Python packages.
This prepares your environment for training or running YOLOv5 object detection models.

This example demonstrates how to load the small YOLOv5 model using PyTorch Hub and run inference on an image URL.

import torch
# Load YOLOv5 small pre-trained model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Image URL for inference
img = 'https://ultralytics.com/images/zidane.jpg'

# Perform inference
results = model(img)

# Show image with detected bounding boxes
results.show()
Explanation:
- `torch.hub.load` loads the pre-trained YOLOv5 small model.
- The image is passed to the model for detection.
- `results.show()` displays the image with detection boxes drawn.
This is a simple way to run object detection using YOLOv5.

This code loads the ResNet50 model pretrained on ImageNet for feature extraction or fine-tuning.

from tensorflow.keras.applications import ResNet50

# Load ResNet50 model with pretrained weights
model = ResNet50(weights='imagenet')
Explanation:
- `ResNet50` is a deep CNN architecture.
- `weights='imagenet'` loads pretrained weights trained on the ImageNet dataset.
- This model can be used for image classification or as a base for transfer learning.

This example shows how to add Batch Normalization between Dense layers to help stabilize and speed up training.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization

model = Sequential([
    Dense(64, input_shape=(100,), activation='relu'), # Dense layer with 64 units and ReLU activation
    BatchNormalization(), # Normalize activations to improve training stability
    Dense(1, activation='sigmoid') # Output layer for binary classification
])
Explanation:
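- BatchNormalization normalizes its inputs over each mini-batch to have mean close to 0 and variance close to 1, then applies a learnable scale and shift.
- This was originally motivated as reducing internal covariate shift and often leads to faster, more stable convergence.
- It can be placed before or after the activation; here it follows the ReLU-activated Dense layer.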
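
The normalization step itself is short enough to write out. This sketch assumes training-mode behavior for a single feature (statistics computed from the current batch) with `eps=1e-3`, which matches the Keras default; the function `batch_norm` is a hypothetical helper. At inference time, moving averages of the mean and variance are used instead.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-3):
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased (population) variance
    # Normalize, then apply the learnable scale (gamma) and shift (beta)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

activations = [2.0, 4.0, 6.0, 8.0]  # one feature across a batch of four samples
normalized = batch_norm(activations)

new_mean = sum(normalized) / len(normalized)
new_var = sum((x - new_mean) ** 2 for x in normalized) / len(normalized)
print(normalized)
print(new_mean, new_var)
```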

This example defines a simple Generative Adversarial Network (GAN) with a Generator and Discriminator in PyTorch.

import torch
import torch.nn as nn

# Generator network
class Generator(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128), # Fully connected layer from noise input to hidden layer
            nn.ReLU(), # Activation function
            nn.Linear(128, output_dim), # Output layer to generate data (e.g., image pixels)
            nn.Tanh() # Output scaled between -1 and 1
        )
    def forward(self, x):
        return self.net(x)

# Discriminator network
class Discriminator(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128), # Fully connected layer from input data to hidden layer
            nn.LeakyReLU(0.2), # LeakyReLU activation to allow small gradients when inactive
            nn.Linear(128, 1), # Output layer giving probability that input is real
            nn.Sigmoid() # Sigmoid activation to output probability between 0 and 1
        )
    def forward(self, x):
        return self.net(x)

# Create instances
G = Generator(100, 784) # Generator takes 100-dim noise vector, outputs 784-dim (e.g. 28x28 image)
D = Discriminator(784) # Discriminator takes 784-dim input to classify real/fake
Explanation:
- Generator learns to create realistic data from noise.
- Discriminator learns to distinguish real vs generated data.
- Both are trained adversarially to improve generation quality.

This example shows how to use a pre-trained BERT model for text classification.

from transformers import BertTokenizer, BertForSequenceClassification
from torch.nn.functional import softmax

# Load pre-trained tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') # Tokenizer splits text into tokens
model = BertForSequenceClassification.from_pretrained('bert-base-uncased') # Pre-trained BERT model

# Tokenize input text and convert to tensor format
inputs = tokenizer("I love deep learning!", return_tensors="pt")

# Forward pass through the model
outputs = model(**inputs)
probs = softmax(outputs.logits, dim=1) # Apply softmax to get probabilities for each class

print(probs) # Prints class probabilities
Explanation:
- `BertTokenizer` converts raw text into tokens understandable by BERT.
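- `BertForSequenceClassification` is a BERT model with a classification head. Loading it from `bert-base-uncased` initializes that head randomly, so the printed probabilities are only meaningful after fine-tuning on a labeled dataset.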
- The input is tokenized and batched (here batch size = 1).
- Model outputs logits (raw scores), which are converted to probabilities via softmax.
- This can be used for sentiment analysis, spam detection, etc.

This code defines a basic Transformer block used in models like BERT and GPT.

import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, embed_size, heads, ff_hidden, dropout):
        super().__init__()
        self.attention = nn.MultiheadAttention(embed_size, heads) # Multi-head self-attention; expects (seq_len, batch, embed_size) by default
        self.norm1 = nn.LayerNorm(embed_size) # Normalization after attention
        self.norm2 = nn.LayerNorm(embed_size) # Normalization after feed-forward
        self.feed_forward = nn.Sequential(
            nn.Linear(embed_size, ff_hidden), # First linear layer in feed-forward
            nn.ReLU(), # Activation
            nn.Linear(ff_hidden, embed_size) # Second linear layer back to embed size
        )
        self.dropout = nn.Dropout(dropout) # Dropout for regularization

    def forward(self, x):
        attn_output, _ = self.attention(x, x, x) # Self-attention on input
        x = self.norm1(x + self.dropout(attn_output)) # Dropout, then Add & Norm
        ff_output = self.feed_forward(x) # Feed-forward network
        x = self.norm2(x + self.dropout(ff_output)) # Dropout, then Add & Norm
        return x
Explanation:
- **MultiheadAttention:** Allows the model to jointly attend to information from different representation subspaces.
- **LayerNorm:** Normalizes the inputs to stabilize and accelerate training.
- **Feed-forward:** Two-layer fully connected network applied independently to each position.
- **Residual connections:** Adding input (`x`) to outputs to help gradient flow.
- This block is a fundamental component of Transformer architectures.

This code builds a simple Convolutional Neural Network (CNN) for image classification with 10 classes.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)), # 32 filters, 3x3 kernel, input shape RGB images 64x64
    MaxPooling2D(pool_size=(2, 2)), # Downsamples spatial dims by 2

    Conv2D(64, (3, 3), activation='relu'), # 64 filters, 3x3 kernel
    MaxPooling2D(pool_size=(2, 2)), # Another downsampling layer

    Flatten(), # Flattens 2D feature maps into 1D vector
    Dense(128, activation='relu'), # Fully connected layer with 128 units
    Dense(10, activation='softmax') # Output layer for 10 classes with probabilities
])
Explanation:
- **Conv2D layers:** Extract spatial features using learnable filters.
- **MaxPooling2D layers:** Reduce spatial dimensions and help in translation invariance.
- **Flatten:** Converts 2D feature maps into a 1D vector for Dense layers.
- **Dense layers:** Perform classification based on extracted features.
- **Softmax activation:** Outputs class probabilities for 10 categories.
This is a common architecture pattern for beginner CNNs on small image datasets.

This is a simple neural network with dropout to reduce overfitting.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(100,))) # First Dense layer: 128 neurons with ReLU, fed by 100 input features
model.add(Dropout(0.5)) # Dropout layer randomly disables 50% of neurons during training to prevent overfitting
model.add(Dense(1, activation='sigmoid')) # Output layer with 1 neuron and sigmoid activation for binary classification
Explanation:
- The **Dense(128)** layer is fully connected and uses ReLU for non-linearity.
- **Dropout(0.5)** randomly turns off half of the neurons during each training step, forcing the model to learn more robust features.
- The last **Dense(1)** layer outputs a probability with sigmoid, suitable for binary classification tasks.
This simple architecture is commonly used in binary classification problems with tabular or vector input.

This code demonstrates transfer learning using the VGG16 model as a fixed feature extractor.

from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten

# Load VGG16 without the fully connected layers on top
base_model = VGG16(include_top=False, input_shape=(224, 224, 3))

# Freeze the convolutional base to prevent training
for layer in base_model.layers:
    layer.trainable = False

# Add new classifier layers on top
x = Flatten()(base_model.output) # Flatten feature maps to 1D vector
x = Dense(64, activation='relu')(x) # Fully connected layer with ReLU activation
output = Dense(1, activation='sigmoid')(x) # Single output neuron with sigmoid for binary classification

# Create the new model
model = Model(inputs=base_model.input, outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Explanation:
- VGG16 is loaded without its original classification head (`include_top=False`).
- All layers in the base model are frozen to retain pretrained weights.
- A new fully connected head is added to adapt the model for a binary classification task.
- The model is compiled with Adam optimizer and binary cross-entropy loss.
This approach leverages powerful pretrained features and trains only the new classifier layers.

This code defines and uses the Swish activation function as a custom activation in Keras.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import get_custom_objects

# Define the Swish activation function
def swish(x):
    return x * tf.keras.activations.sigmoid(x) # Swish = x * sigmoid(x)

# Register Swish as a custom activation for use in models
get_custom_objects().update({'swish': swish})

# Example: Use Swish activation in a simple model
model = Sequential()
model.add(Dense(32, input_shape=(10,), activation='swish')) # Dense layer with Swish activation
Explanation:
- The `swish` function implements the Swish activation, which can match or outperform ReLU in some deep networks. Recent Keras versions also ship a built-in `'swish'` activation.
- It multiplies input by its sigmoid.
- `get_custom_objects().update` registers it globally so you can use `'swish'` as an activation string.
- The example shows how to add a Dense layer using Swish.
This makes custom activations easy to integrate in Keras workflows.
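
To get a feel for how Swish differs from ReLU, the two can be compared on a few scalar inputs. This is a plain-Python sketch for illustration; the function names are local helpers, not Keras APIs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def swish(x):
    # Swish = x * sigmoid(x)
    return x * sigmoid(x)

def relu(x):
    return max(0.0, x)

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(x, round(swish(x), 4), relu(x))
```

Unlike ReLU, Swish is smooth and lets small negative values through, while approaching the identity for large positive inputs.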

This example builds and trains a simple feedforward neural network in Keras.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Create a sequential model
model = Sequential()
model.add(Dense(32, input_shape=(10,), activation='relu')) # First hidden layer: 32 neurons, fed by 10 input features
model.add(Dense(16, activation='relu')) # Second hidden layer with 16 neurons
model.add(Dense(1, activation='sigmoid')) # Output layer for binary classification

# Compile the model with Adam optimizer and binary crossentropy loss
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Generate dummy input data: 100 samples, 10 features each
import numpy as np
X = np.random.rand(100, 10)
# Generate dummy binary labels for classification
y = np.random.randint(2, size=(100, 1))

# Train the model for 5 epochs
model.fit(X, y, epochs=5)
Explanation:
- The model has three Dense layers: two hidden layers (32 and 16 neurons) and an output layer. The 10 input features are declared via `input_shape`.
- `relu` activations in the hidden layers introduce non-linearity.
- Output layer uses `sigmoid` for binary classification (output between 0 and 1).
- `binary_crossentropy` loss is suitable for binary targets.
- Dummy data simulates random inputs and labels for quick testing.
- `model.fit()` trains the model on the data.