Deep Learning Interview Questions

Deep learning is a subset of machine learning that uses multi-layered neural networks to learn hierarchical feature representations directly from data. Unlike traditional ML, which typically relies on hand-engineered features, deep learning extracts features automatically and tends to scale well to large, complex datasets.

# Simple deep learning example using Keras

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense

model = Sequential()

model.add(Dense(64, activation='relu', input_shape=(100,)))

model.add(Dense(10, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy')

A neural network is a computational model inspired by the human brain, consisting of layers of interconnected nodes (neurons). Its main components are the input layer, hidden layers, output layer, weights, biases, and activation functions.

# Define a simple feedforward neural network in PyTorch

import torch

import torch.nn as nn

class SimpleNN(nn.Module):

  def __init__(self):

    super(SimpleNN, self).__init__()

    self.fc1 = nn.Linear(100, 64)

    self.relu = nn.ReLU()

    self.fc2 = nn.Linear(64, 10)

  def forward(self, x):

    x = self.relu(self.fc1(x))

    x = self.fc2(x)

    return x

Backpropagation is an algorithm for training neural networks by computing gradients of the loss function with respect to the weights. It uses the chain rule to propagate errors backward through the network, enabling weight updates to minimize loss.

# Backpropagation example using PyTorch

import torch

import torch.nn as nn

model = nn.Linear(10, 1)

criterion = nn.MSELoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(5, 10)

targets = torch.randn(5, 1)

outputs = model(inputs)

loss = criterion(outputs, targets)

loss.backward()  # compute gradients

optimizer.step() # update weights
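
To make the chain rule concrete, here is a minimal sketch in plain Python (no framework) for a single sigmoid neuron with squared-error loss. The function names and the numbers are illustrative assumptions, not part of any library; the sketch compares the hand-derived gradients with a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_fn(w, b, x, y):
    """Squared error of one sigmoid neuron: (sigmoid(w*x + b) - y)^2."""
    return (sigmoid(w * x + b) - y) ** 2

def gradients(w, b, x, y):
    """Analytic gradients via the chain rule: dL/dw = dL/da * da/dz * dz/dw."""
    z = w * x + b
    a = sigmoid(z)
    dL_da = 2.0 * (a - y)      # derivative of the squared error
    da_dz = a * (1.0 - a)      # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz    # (dL/dw, dL/db), since dz/dw = x and dz/db = 1

# Illustrative values (assumptions)
w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = gradients(w, b, x, y)

# Central finite differences as an independent check
eps = 1e-6
num_grad_w = (loss_fn(w + eps, b, x, y) - loss_fn(w - eps, b, x, y)) / (2 * eps)
num_grad_b = (loss_fn(w, b + eps, x, y) - loss_fn(w, b - eps, x, y)) / (2 * eps)
```

Frameworks such as PyTorch automate the same bookkeeping for every parameter when `loss.backward()` is called.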

Activation functions introduce non-linearity into neural networks, enabling them to learn complex patterns. Common activation functions include ReLU, Sigmoid, and Tanh.

# Using activation functions in TensorFlow Keras

from tensorflow.keras.layers import Dense, Activation

from tensorflow.keras.models import Sequential

model = Sequential()

model.add(Dense(64, input_shape=(100,)))

model.add(Activation('relu'))

model.add(Dense(1))

model.add(Activation('sigmoid'))

Overfitting occurs when a model learns noise or irrelevant details in training data, resulting in poor generalization to new data. Prevent it using techniques like dropout, regularization, early stopping, and data augmentation.

# Example: Dropout layer in Keras

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense, Dropout

model = Sequential()

model.add(Dense(128, activation='relu', input_shape=(100,)))

model.add(Dropout(0.5))

model.add(Dense(10, activation='softmax'))

CNNs are specialized neural networks designed for processing grid-like data such as images. They use convolutional layers to automatically learn spatial hierarchies of features.

# Simple CNN example with Keras

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()

model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28,28,1)))

model.add(MaxPooling2D((2, 2)))

model.add(Flatten())

model.add(Dense(10, activation='softmax'))

RNNs are designed for sequential data like time series or text. They use recurrent connections to carry a hidden state from one time step to the next, so each output can depend on earlier inputs.

# Simple RNN example in Keras

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import SimpleRNN, Dense

model = Sequential()

model.add(SimpleRNN(50, input_shape=(10, 1)))

model.add(Dense(1))

The vanishing gradient problem occurs during backpropagation when gradients become very small, preventing effective weight updates and slowing or stopping learning, especially in deep or recurrent networks.

# Using ReLU helps mitigate vanishing gradients

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense, Activation

model = Sequential()

model.add(Dense(64, input_shape=(100,)))

model.add(Activation('relu'))

model.add(Dense(10, activation='softmax'))

Dropout randomly disables a fraction of neurons during training, preventing over-reliance on any one neuron and reducing overfitting by encouraging more robust feature learning.

# Applying dropout in Keras

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense, Dropout

model = Sequential()

model.add(Dense(128, activation='relu', input_shape=(100,)))

model.add(Dropout(0.5))

model.add(Dense(10, activation='softmax'))

Batch normalization normalizes layer inputs over each mini-batch during training, which stabilizes learning and often accelerates convergence.

# Batch normalization example in Keras

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import BatchNormalization, Dense

model = Sequential()

model.add(Dense(64, input_shape=(100,)))

model.add(BatchNormalization())

model.add(Dense(10, activation='softmax'))

Transfer learning uses a pre-trained model on a new but related task, saving training time and improving performance when data is limited.

# Transfer learning example with Keras

from tensorflow.keras.applications import VGG16

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense, Flatten

base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224,224,3))

model = Sequential()

model.add(base_model)

model.add(Flatten())

model.add(Dense(10, activation='softmax'))

Supervised deep learning uses labeled data to learn input-output mappings, while unsupervised deep learning finds patterns or representations from unlabeled data.

# Example: Autoencoder for unsupervised learning in Keras

from tensorflow.keras.layers import Input, Dense

from tensorflow.keras.models import Model

input_layer = Input(shape=(100,))

encoded = Dense(32, activation='relu')(input_layer)

decoded = Dense(100, activation='sigmoid')(encoded)

autoencoder = Model(input_layer, decoded)

autoencoder.compile(optimizer='adam', loss='mse')

Autoencoders are neural networks trained to reconstruct input data, used for tasks like dimensionality reduction, anomaly detection, and data denoising.

# Simple autoencoder in Keras

from tensorflow.keras.layers import Input, Dense

from tensorflow.keras.models import Model

input_layer = Input(shape=(784,))

encoded = Dense(64, activation='relu')(input_layer)

decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(input_layer, decoded)

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

Gradient descent is an optimization algorithm that updates model weights iteratively to minimize the loss function by moving in the direction of the negative gradient.

# Gradient descent optimization in PyTorch

import torch

import torch.nn as nn

model = nn.Linear(10, 1)

criterion = nn.MSELoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(5, 10)

targets = torch.randn(5, 1)

outputs = model(inputs)

loss = criterion(outputs, targets)

loss.backward()

optimizer.step()

Loss functions measure how well a model's predictions match the target values. They guide the training process by providing feedback for optimization.

# Example of using loss functions in Keras

from tensorflow.keras.losses import SparseCategoricalCrossentropy

model.compile(optimizer='adam', loss=SparseCategoricalCrossentropy())

Early stopping monitors validation loss during training and stops training when loss stops improving, preventing the model from overfitting.

# Early stopping callback in Keras

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=3)

model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=100, callbacks=[early_stop])

Optimizers update the network weights during training by using calculated gradients to minimize the loss function, affecting convergence speed and accuracy.

# Using Adam optimizer in Keras

model.compile(optimizer='adam', loss='categorical_crossentropy')

Batch gradient descent computes each gradient over the entire training dataset, while stochastic gradient descent uses one sample at a time, which makes each update much cheaper but noisier.

# Example: SGD optimizer in PyTorch

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

The loss landscape is the surface formed by the loss function's value over different model parameter configurations; visualizations of it show valleys (low-loss regions, including good minima) and peaks (high-loss regions).

# Visualization typically done with tools like matplotlib

# (no simple code snippet for this)
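
Although drawing the surface needs a plotting library, the underlying computation is small. The sketch below is an assumption-laden toy, not a standard recipe: it evaluates the mean squared error of a two-parameter linear model on a grid of (w, b) values and locates the lowest grid cell. The data, grid ranges, and names are all illustrative.

```python
# Toy data generated from y = 2x + 1 (illustrative assumption)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def mse(w, b):
    """Mean squared error of the model y_hat = w*x + b on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Evaluate the loss on a grid: w in [0, 4] and b in [-1, 3], step 0.5
w_values = [i * 0.5 for i in range(9)]
b_values = [-1.0 + i * 0.5 for i in range(9)]
landscape = [[mse(w, b) for b in b_values] for w in w_values]

# Locate the lowest point of the sampled surface
best_loss, best_w, best_b = min(
    (landscape[i][j], w_values[i], b_values[j])
    for i in range(len(w_values))
    for j in range(len(b_values))
)
```

The nested list `landscape` is the kind of 2D array a contour or surface plot would take as input. Real networks have far more than two parameters, so published visualizations usually show the loss along a small number of chosen directions in parameter space.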

GANs consist of two networks — a generator that creates fake data and a discriminator that tries to distinguish fake from real data. They compete in a game to improve data generation.

# GAN skeleton code in PyTorch (simplified)

import torch

import torch.nn as nn

class Generator(nn.Module):

  def __init__(self):

    super().__init__()

    self.fc = nn.Linear(100, 784)

  def forward(self, x):

    return torch.sigmoid(self.fc(x))

class Discriminator(nn.Module):

  def __init__(self):

    super().__init__()

    self.fc = nn.Linear(784, 1)

  def forward(self, x):

    return torch.sigmoid(self.fc(x))

Transfer learning uses a pretrained model on a new task by reusing learned features, speeding up training and improving performance especially on small datasets.

# Example: Using pretrained ResNet in PyTorch

import torchvision.models as models

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # 'pretrained=True' is deprecated in torchvision >= 0.13

# Freeze layers

for param in resnet.parameters():

    param.requires_grad = False

# Replace last layer for new task

import torch.nn as nn

resnet.fc = nn.Linear(resnet.fc.in_features, 10)  # 10 classes

Dropout randomly disables neurons during training to prevent co-adaptation and reduce overfitting, improving model generalization.

# Dropout example in Keras

from tensorflow.keras.layers import Dropout

model.add(Dense(128, activation='relu'))

model.add(Dropout(0.5))  # 50% dropout rate

Batch normalization normalizes the inputs of each layer to stabilize and accelerate training; it was originally motivated as a way to reduce internal covariate shift.

# Batch normalization in PyTorch

import torch.nn as nn

bn = nn.BatchNorm1d(num_features=128)

# Apply after linear layer

The vanishing gradient problem occurs when gradients become very small during backpropagation, slowing learning in early layers of deep networks.

# Using ReLU activation helps mitigate vanishing gradients

import torch.nn.functional as F

x = F.relu(x)

CNNs use convolutional layers to detect spatial features by sliding filters over input data, capturing local patterns and reducing parameters.

# CNN layer example in Keras

from tensorflow.keras.layers import Conv2D

model.add(Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)))

RNNs process sequential data by maintaining a hidden state that captures information from previous inputs, useful for time series and language.

# Simple RNN in Keras

from tensorflow.keras.layers import SimpleRNN

model.add(SimpleRNN(50, input_shape=(timesteps, features)))

LSTMs are a type of RNN designed to learn long-term dependencies, using gates to control information flow and mitigate vanishing gradients.

# LSTM layer in Keras

from tensorflow.keras.layers import LSTM

model.add(LSTM(100, input_shape=(timesteps, features)))

Transformers use self-attention mechanisms to weigh different parts of the input, enabling parallel processing of sequences and strong performance on many NLP tasks.

# Using HuggingFace Transformer model

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

model = AutoModel.from_pretrained('bert-base-uncased')

Self-attention computes the importance of each word relative to others in a sequence, allowing the model to capture context efficiently.

# Simplified self-attention computation

import torch

query = torch.rand(1,5,64)  # batch, seq_len, dim

key = torch.rand(1,5,64)

value = torch.rand(1,5,64)

scores = torch.matmul(query, key.transpose(-2,-1)) / (64**0.5)

weights = torch.nn.functional.softmax(scores, dim=-1)

output = torch.matmul(weights, value)

Attention allows models to focus on relevant parts of input sequences dynamically, improving performance especially in NLP and vision tasks.

# Attention example pseudo-code

context_vector = sum(attention_weights * encoder_outputs)

Fine-tuning updates pretrained model weights on a new dataset by training some or all layers to better adapt to the specific task.

# Fine-tuning last layers in PyTorch

for param in model.parameters():

    param.requires_grad = False

for param in model.fc.parameters():

    param.requires_grad = True

# Train only last fc layer

Data augmentation artificially increases the amount of training data by applying label-preserving transformations like rotation or flipping, improving model robustness and reducing overfitting.

# Data augmentation example in Keras

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=20, horizontal_flip=True)

Gradient clipping limits gradient values during backpropagation to prevent exploding gradients, stabilizing training.

# Gradient clipping example in PyTorch

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
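
The idea behind norm-based clipping can be sketched in plain Python. This is a simplified illustration under the assumption that all gradients are flattened into one list; `clip_by_global_norm` is a hypothetical helper, and the library implementation differs in details.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale gradient values so their combined L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm   # small gradients pass through unchanged
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm

# A gradient of norm 5 is scaled down to norm 1; its direction is preserved
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Because every component is multiplied by the same factor, clipping by norm shortens the update step without changing its direction.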

Early stopping halts training once validation loss stops improving for a set patience, avoiding overfitting.

# EarlyStopping in Keras

from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_loss', patience=5)

model.fit(X_train, y_train, validation_data=(X_val, y_val), callbacks=[early_stopping])

An epoch is one complete pass through the entire training dataset; multiple epochs are used to iteratively improve the model.

# Training with epochs in Keras

model.fit(X_train, y_train, epochs=10)

Overfitting happens when a model learns noise instead of patterns, performing well on training but poorly on unseen data. Prevent with dropout, regularization, and early stopping.

# Example of L2 regularization in Keras

from tensorflow.keras.regularizers import l2

model.add(Dense(64, kernel_regularizer=l2(0.01)))

The learning rate controls the step size during weight updates; too high causes divergence, too low slows learning.

# Setting learning rate in PyTorch

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
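
The effect of the step size is easy to see on a toy problem. The sketch below is illustrative only: it runs plain gradient descent on f(w) = w^2 with three assumed learning rates and reports how far from the minimum each run ends.

```python
def run_gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) and return the final |w|."""
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * w
    return abs(w)

too_small = run_gradient_descent(lr=0.001)  # barely moves from w0 = 1.0
reasonable = run_gradient_descent(lr=0.1)   # approaches the minimum at 0
too_large = run_gradient_descent(lr=1.1)    # overshoots further on every step
```

Each step multiplies `w` by `(1 - 2*lr)`, so the run converges only when that factor has magnitude below 1; with `lr=1.1` the factor is -1.2 and the iterates grow.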

Regularization adds constraints or penalties to reduce model complexity and prevent overfitting, like L1/L2 penalties and dropout.

# L1 regularization example in Keras

from tensorflow.keras.regularizers import l1

model.add(Dense(64, kernel_regularizer=l1(0.01)))

Batch size is the number of samples processed before the model updates its weights; a mini-batch is a small subset of the dataset used for each update in mini-batch gradient descent.

# Example: batch_size=32 in Keras

model.fit(X_train, y_train, batch_size=32)

The optimizer updates model weights based on computed gradients to minimize loss; popular optimizers include SGD, Adam, and RMSprop.

# Using Adam optimizer in TensorFlow

model.compile(optimizer='adam', loss='categorical_crossentropy')

A loss function measures how well the model's predictions match the true data, guiding optimization to reduce errors.

# Example loss functions

# For classification: categorical_crossentropy

# For regression: mean_squared_error
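
Both losses named above can be written out by hand. The following is a minimal sketch with illustrative inputs; the function names mirror the Keras loss names but are defined here, not imported.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference, the usual regression loss."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def categorical_crossentropy(one_hot, probs, eps=1e-12):
    """Cross-entropy for one sample: -sum(t * log(p)) over the classes."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

mse = mean_squared_error([3.0, 5.0], [2.5, 5.5])

# The true class is index 1 in both cases
ce_confident = categorical_crossentropy([0, 1, 0], [0.1, 0.8, 0.1])
ce_wrong = categorical_crossentropy([0, 1, 0], [0.7, 0.2, 0.1])
```

Cross-entropy depends only on the probability assigned to the true class, so the prediction that puts 0.8 on it receives a lower loss than the one that puts 0.2 on it.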

Supervised learning trains on labeled data with known outputs; unsupervised learning finds patterns in unlabeled data.

# Supervised example: classification

# Unsupervised example: clustering

Reinforcement learning trains agents to make sequences of decisions by rewarding desired actions and punishing undesired ones.

# Simplified RL loop pseudocode

state = env.reset()

done = False  # becomes True when the episode ends

while not done:

  action = agent.act(state)

  next_state, reward, done = env.step(action)

  agent.learn(state, action, reward, next_state)

  state = next_state

AI is the broad field of making machines intelligent; machine learning is a subset where models learn from data; deep learning is a subset of ML using deep neural networks.

# Deep learning example: neural networks

# ML example: decision trees

# AI includes rule-based systems too

The curse of dimensionality refers to problems caused by high-dimensional data, where data becomes sparse and learning harder.

# Dimensionality reduction methods:

# PCA, t-SNE, UMAP
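
One symptom of the curse of dimensionality is that distances stop being informative: the nearest and farthest neighbours of a point end up almost equally far away. The sketch below illustrates this with uniformly random points; the point count, dimensions, and seed are arbitrary assumptions.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from one query point
    to n_points uniformly random points in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(query, point))))
    return (max(dists) - min(dists)) / min(dists)

contrast_low = distance_contrast(dim=2)     # nearest neighbour is much closer
contrast_high = distance_contrast(dim=500)  # all points are about equally far
```

A small contrast in high dimensions is one reason distance-based methods degrade there, and part of the motivation for the dimensionality reduction methods listed above.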

Autoencoders are neural networks trained to reconstruct input data, learning compressed representations useful for denoising or dimensionality reduction.

# Autoencoder example in Keras

from tensorflow.keras.layers import Input, Dense

from tensorflow.keras.models import Model

input_img = Input(shape=(784,))

encoded = Dense(64, activation='relu')(input_img)

decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(input_img, decoded)

GANs consist of a generator and discriminator network competing to generate realistic data; used for image synthesis, style transfer, etc.

# GAN simplified flow pseudocode

# Generator creates fake samples

# Discriminator tries to distinguish real vs fake

A policy defines the agent’s behavior, mapping states to actions, which can be deterministic or stochastic.

# Policy example pseudocode

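def policy(state):

  # Deterministic sketch: choose the highest-valued action for this state.

  # q_values is a hypothetical lookup table: state -> {action: value}.

  return max(q_values[state], key=q_values[state].get)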

Backpropagation computes gradients of the loss with respect to model weights using the chain rule, enabling gradient descent optimization.

# Pseudocode for backpropagation

loss.backward()

optimizer.step()

A confusion matrix tabulates true versus predicted class labels; metrics such as accuracy, precision, and recall can be computed from its entries.

# Confusion matrix example using sklearn

from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1]

y_pred = [0, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred)

print(cm)

Overfitting happens when a model learns training data too well, including noise, causing poor generalization to new data.

# Example: High train accuracy but low test accuracy

To prevent overfitting, use techniques like regularization (L1/L2), dropout, early stopping, data augmentation, and simpler models.

# Example of dropout in Keras

from tensorflow.keras.layers import Dropout

model.add(Dropout(0.5))

Underfitting occurs when a model is too simple to capture underlying data patterns, leading to poor performance on training and test data.

# Example: low train and test accuracy

Regularization adds penalties to the loss function to discourage complex models and reduce overfitting.

# L2 regularization example in scikit-learn

from sklearn.linear_model import Ridge

model = Ridge(alpha=1.0)

Cross-validation splits data into subsets to train and validate the model multiple times for reliable performance estimates.

# K-Fold CV in scikit-learn

from sklearn.model_selection import KFold

kf = KFold(n_splits=5)
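
What a K-fold splitter produces can be sketched without any library. The helper below is a hypothetical, simplified version (contiguous folds, no shuffling) meant only to show how each sample lands in exactly one validation fold.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, val_indices) pairs for contiguous folds."""
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1  # spread any remainder over the first folds
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

folds = list(k_fold_indices(n_samples=10, n_splits=5))
first_train, first_val = folds[0]  # first_val is [0, 1]
```

In practice a model is trained once per fold on `train` and scored on `val`, and the scores are averaged to give the performance estimate.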

Bias is error from overly simple assumptions; variance is error from sensitivity to the particular training data. The bias-variance tradeoff is about balancing underfitting (high bias) against overfitting (high variance).

# Model complexity affects bias and variance

Activation functions add non-linearity allowing neural networks to learn complex patterns; examples: ReLU, Sigmoid, Tanh.

# Example: ReLU in TensorFlow

from tensorflow.keras.layers import ReLU

layer = ReLU()

Vanishing gradients happen when gradients become too small during backpropagation, slowing learning in deep networks.

# ReLU helps mitigate vanishing gradients compared to Sigmoid
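
A rough back-of-the-envelope sketch shows why. It multiplies only the activation derivatives along a chain of 10 layers and deliberately ignores the weight matrices, which is a simplifying assumption; the depth and inputs are illustrative.

```python
import math

def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum value is 0.25, reached at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

depth = 10

# Best case for sigmoid: every pre-activation sits exactly at 0
sigmoid_chain = sigmoid_grad(0.0) ** depth   # 0.25 ** 10, below 1e-6

# ReLU units with positive pre-activations pass the gradient through unchanged
relu_chain = relu_grad(1.0) ** depth         # 1.0
```

Even in its best case the sigmoid chain shrinks the gradient by about six orders of magnitude over 10 layers, while active ReLU units do not shrink it at all. ReLU units with negative inputs pass no gradient, which is a separate issue.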

Dropout randomly disables neurons during training to prevent co-adaptation and reduce overfitting.

# Dropout example in Keras

from tensorflow.keras.layers import Dropout

model.add(Dropout(0.5))

Batch normalization normalizes layer inputs to stabilize learning and improve training speed.

# BatchNorm example in TensorFlow

from tensorflow.keras.layers import BatchNormalization

model.add(BatchNormalization())

CNNs are specialized neural networks for processing grid-like data such as images, using convolutional layers to detect features.

# Simple CNN layer example in Keras

from tensorflow.keras.layers import Conv2D

model.add(Conv2D(32, (3,3), activation='relu'))

Pooling reduces spatial dimensions of feature maps, helping reduce computation and control overfitting.

# MaxPooling example in Keras

from tensorflow.keras.layers import MaxPooling2D

model.add(MaxPooling2D(pool_size=(2, 2)))
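
The operation itself is simple enough to write by hand. This is a minimal sketch for a single-channel feature map stored as a list of lists with even height and width; `max_pool_2x2` is a hypothetical helper, not a Keras function.

```python
def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling with stride 2."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            row.append(max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]
pooled = max_pool_2x2(feature_map)  # [[4, 2], [2, 7]]
```

The 4x4 input becomes a 2x2 output, so later layers process a quarter as many values.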

RNNs process sequential data by maintaining a hidden state to capture temporal dependencies.

# Simple RNN layer example in Keras

from tensorflow.keras.layers import SimpleRNN

model.add(SimpleRNN(50))

LSTMs are a type of RNN designed to overcome the vanishing gradient problem with gates controlling information flow.

# LSTM example in Keras

from tensorflow.keras.layers import LSTM

model.add(LSTM(50))

Transformers use self-attention mechanisms to process sequential data efficiently, enabling parallelization and better long-range context.

# Transformer attention pseudocode

# Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V

Attention weights parts of input differently, allowing models to focus on important information dynamically.

# Scaled dot-product attention example
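
The formula Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V can be spelled out in plain Python. This is a minimal single-head sketch with no batching or masking; the tiny matrices are illustrative assumptions.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for Q, K, V given as lists of row vectors."""
    d_k = len(K[0])
    output, all_weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        all_weights.append(weights)
        # Weighted average of the value vectors
        output.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return output, all_weights

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
output, weights = scaled_dot_product_attention(Q, K, V)
```

Each query puts most of its weight on the key it matches, so each output row is pulled toward the corresponding value vector.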

Transfer learning reuses a pre-trained model on a new task, saving training time and improving performance with limited data.

# Using pre-trained model in Keras

from tensorflow.keras.applications import VGG16

base_model = VGG16(weights='imagenet', include_top=False)

Fine-tuning adjusts some layers of a pre-trained model with new data to better fit the specific task.

# Freeze base layers and train top layers example

for layer in base_model.layers:

  layer.trainable = False

Data augmentation artificially increases training data by applying transformations like rotations, flips, and shifts.

# Example in Keras ImageDataGenerator

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=20, horizontal_flip=True)

Classification predicts discrete labels; regression predicts continuous numeric values.

# Classification example: spam detection

# Regression example: house price prediction

Precision is the number of correct positive predictions divided by all positive predictions; recall is the number of correct positive predictions divided by all actual positives; F1 is their harmonic mean.

# Example in scikit-learn

from sklearn.metrics import precision_score, recall_score, f1_score
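
To show what these metrics compute, here is a minimal sketch that derives them from raw counts for a binary problem. The helper name and toy labels are illustrative; it is not the scikit-learn implementation.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# One true positive, one missed positive, no false alarms
precision, recall, f1 = precision_recall_f1([0, 1, 0, 1], [0, 0, 0, 1])
```

Here precision is 1.0 (every predicted positive was correct) while recall is 0.5 (half of the actual positives were found), and F1 falls between the two at about 0.67.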

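The ROC curve plots true positive rate against false positive rate across classification thresholds; AUC, the area under that curve, summarizes how well the model ranks positives above negatives in a single number.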

# Example in scikit-learn

from sklearn.metrics import roc_curve, auc
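
AUC has a convenient interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way by brute force. The helper and toy data are illustrative, and this pairwise approach is only practical for small inputs.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc_value = roc_auc(labels, scores)  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.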

Gradient descent is an optimization algorithm minimizing loss by iteratively updating parameters in the negative gradient direction.

# Simple gradient descent example in Python

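learning_rate = 0.01

weights = 5.0  # illustrative starting value

gradient = 2 * weights  # gradient of the toy loss f(w) = w**2

weights -= learning_rate * gradient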

SGD updates parameters using one or a few samples per iteration, allowing faster but noisier convergence.

# SGD example with mini-batches
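
A minimal sketch of mini-batch SGD, fitting a one-variable linear model in plain Python. The data, learning rate, batch size, and epoch count are illustrative assumptions chosen so that this toy problem converges.

```python
import random

def minibatch_sgd(xs, ys, lr=0.05, batch_size=2, epochs=200, seed=0):
    """Fit y = w*x + b by mini-batch SGD on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradients of the MSE averaged over this mini-batch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = minibatch_sgd(xs, ys)
```

Each update sees only two of the four samples, so individual steps are noisy estimates of the full gradient, yet the parameters still approach w = 2 and b = 1.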

Optimizers improve model training by adjusting parameters efficiently; examples: SGD, Adam, RMSprop.

# Using Adam optimizer in Keras

from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(), loss='categorical_crossentropy')

Early stopping halts training when validation loss stops improving, preventing overfitting.

# EarlyStopping callback in Keras

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=3)
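
The patience logic behind such a callback can be sketched in a few lines. This is a simplified illustration with a hypothetical helper and made-up validation losses; the Keras callback offers further options (for example `min_delta` and `restore_best_weights`) that are not modelled here.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop,
    or None if patience is never exhausted."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Validation loss bottoms out at epoch 2, then fails to improve three times
history = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.50]
stop_epoch = early_stopping_epoch(history, patience=3)  # stops at epoch 5
```

Training stops before the later improvement at epoch 6 is ever seen, which is the tradeoff a small patience value makes.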

A confusion matrix summarizes classification results as counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

# Confusion matrix in scikit-learn

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true, y_pred)

Reinforcement learning trains agents to make sequential decisions by maximizing cumulative rewards.

# Simple Q-learning pseudocode

Q(s, a) = Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)]
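
Read as code, the update rule above looks like the following sketch. It uses a dictionary-based Q-table for a hypothetical two-state problem; the state names, actions, and hyperparameters are illustrative assumptions.

```python
def q_learning_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    best_next = max(Q[next_state].values())   # max over a' of Q(s', a')
    td_target = reward + gamma * best_next    # r + gamma * max_a' Q(s', a')
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]

# Hypothetical Q-table: two states, two actions each
Q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
new_value = q_learning_update(Q, "s0", "right", reward=1.0, next_state="s1")
```

With these numbers the target is 1.0 + 0.9 * 2.0 = 2.8, and the old estimate of 0.0 moves halfway toward it, to about 1.4.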

Policy-based methods learn a policy directly; value-based methods estimate value functions to derive policies.

# Example: REINFORCE (policy-based) vs Q-learning (value-based)

NLP enables machines to understand, interpret, and generate human language.

# Example: tokenization in Python using NLTK

import nltk

tokens = nltk.word_tokenize("Hello world!")

Word embeddings represent words as dense vectors capturing semantic relationships.

# Example: Word2Vec using Gensim

from gensim.models import Word2Vec

model = Word2Vec(sentences, vector_size=100, window=5)

Seq2seq models map input sequences to output sequences, commonly used in translation and chatbots.

# Example architecture: Encoder-Decoder

Tokenization splits text into words or subwords for processing.

# Example using spaCy

import spacy

nlp = spacy.load("en_core_web_sm")

doc = nlp("Hello world!")

tokens = [token.text for token in doc]

NER detects and classifies named entities like people, organizations, and locations in text.

# Example using spaCy

for ent in doc.ents:

  print(ent.text, ent.label_)

Sentiment analysis classifies text into positive, negative, or neutral opinions.

# Example using TextBlob

from textblob import TextBlob

analysis = TextBlob("I love this!")

print(analysis.sentiment.polarity)

Topic modeling discovers abstract topics in documents; popular method: Latent Dirichlet Allocation (LDA).

# Example using Gensim LDA

from gensim.models.ldamodel import LdaModel

Word sense disambiguation identifies the correct meaning of a word based on context.

# Example approach: Lesk algorithm

Sequence labeling assigns categorical labels to each token in a sequence, e.g., POS tagging or NER.

# Example: POS tagging

Language models predict the next word or token in a sequence based on context.

# Example: GPT predicts next token
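
The prediction task itself can be illustrated with a count-based bigram model. This is a deliberately simple stand-in: models like GPT use neural networks and much longer contexts, and the corpus and names below are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_bigram_model(tokens):
    """Count, for each token, which tokens follow it in the corpus."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn follow-up counts for `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}

corpus = "the cat sat on the mat the cat ran".split()
bigram_counts = train_bigram_model(corpus)
probs = next_token_probs(bigram_counts, "the")
most_likely = max(probs, key=probs.get)  # "cat"
```

In this corpus "the" is followed by "cat" twice and "mat" once, so the model assigns "cat" a probability of 2/3.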

TF-IDF scores words by importance, balancing term frequency with how rare they are across documents.

# TF-IDF in scikit-learn

from sklearn.feature_extraction.text import TfidfVectorizer
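
The scoring idea can be written out by hand. The sketch below uses the textbook definition (tf = count / document length, idf = log(N / document frequency)) on three toy documents. scikit-learn's `TfidfVectorizer` applies a smoothed idf and L2 normalization by default, so its numbers will differ from these.

```python
import math

def tf_idf(docs):
    """Return one {term: score} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = {}
    for tokens in tokenized:
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    scores = []
    for tokens in tokenized:
        doc_scores = {}
        for term in set(tokens):
            tf = tokens.count(term) / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            doc_scores[term] = tf * idf
        scores.append(doc_scores)
    return scores

docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(docs)
```

The word "the" appears in every document and scores 0, while "sat", which appears in only one document, scores highest in the first document.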

Supervised learning uses labeled data; unsupervised learning finds patterns in unlabeled data.

# Examples: classification (supervised), clustering (unsupervised)

Clustering groups similar data points without labels, e.g., K-means clustering.

# Example: KMeans clustering

from sklearn.cluster import KMeans

Dimensionality reduction reduces features while preserving structure, e.g., PCA.

# PCA example

from sklearn.decomposition import PCA

Feature engineering creates meaningful features from raw data to improve model performance.

# Example: creating new features from date columns
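
A minimal sketch of deriving features from a date using only the standard library. The feature set chosen here is an illustrative assumption; which features help depends on the problem.

```python
from datetime import date

def date_features(d):
    """Derive simple calendar features from a datetime.date."""
    return {
        "year": d.year,
        "month": d.month,
        "day_of_week": d.weekday(),        # Monday = 0, Sunday = 6
        "is_weekend": d.weekday() >= 5,
        "quarter": (d.month - 1) // 3 + 1,
    }

features = date_features(date(2024, 3, 16))  # a Saturday in the first quarter
```

A raw timestamp is hard for most models to use directly, whereas features such as `is_weekend` expose patterns like weekly seasonality explicitly.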

Overfitting occurs when a model learns noise, reducing generalization; prevent it with regularization and dropout.

# Example: dropout in Keras

from tensorflow.keras.layers import Dropout

Regularization adds a penalty to the loss to reduce model complexity; common types are L1 and L2.

# L2 regularization in Keras

from tensorflow.keras.regularizers import l2

Batch normalization normalizes layer inputs to speed training and improve stability.

# BatchNormalization layer in Keras

from tensorflow.keras.layers import BatchNormalization

Dropout randomly disables neurons during training to prevent overfitting.

# Dropout layer example

from tensorflow.keras.layers import Dropout

Cross-validation splits data into training and testing sets multiple times to assess model generalization.

# KFold cross-validation

from sklearn.model_selection import KFold

Hyperparameter tuning searches for the best hyperparameter settings (such as learning rate or regularization strength), often using grid or random search.

# Example with GridSearchCV

from sklearn.model_selection import GridSearchCV
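
The core of grid search is an exhaustive loop over parameter combinations. The sketch below shows that loop with a hypothetical scoring function standing in for training and validating a model. `GridSearchCV` additionally cross-validates each combination, which is omitted here.

```python
from itertools import product

def grid_search(score_fn, param_grid):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical stand-in for "train a model and return its validation accuracy"
def toy_validation_score(learning_rate, dropout):
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(dropout - 0.3)

best_params, best_score = grid_search(
    toy_validation_score,
    {"learning_rate": [0.001, 0.01, 0.1], "dropout": [0.1, 0.3, 0.5]},
)
```

The number of combinations grows multiplicatively with each added hyperparameter (9 here), which is why random search is often preferred for larger search spaces.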

This example builds and trains a simple neural network for digit classification using TensorFlow and the MNIST dataset.

import tensorflow as tf

from tensorflow.keras.datasets import mnist

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense, Flatten

from tensorflow.keras.utils import to_categorical


(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train / 255.0

x_test = x_test / 255.0

y_train = to_categorical(y_train)

y_test = to_categorical(y_test)


model = Sequential([

    Flatten(input_shape=(28, 28)),

    Dense(128, activation='relu'),

    Dense(10, activation='softmax')
])


model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

Dropout is a regularization technique to prevent overfitting in neural networks by randomly setting some units to zero during training.

import tensorflow as tf

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense, Dropout, Flatten

from tensorflow.keras.datasets import mnist

from tensorflow.keras.utils import to_categorical


(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train / 255.0

x_test = x_test / 255.0

y_train = to_categorical(y_train)

y_test = to_categorical(y_test)


model = Sequential([

    Flatten(input_shape=(28, 28)),

    Dense(128, activation='relu'),

    Dropout(0.2),

    Dense(10, activation='softmax')
])


model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

Convolutional layers are the core of CNNs, ideal for image data. This example uses convolutional and pooling layers to build a CNN for MNIST digit classification.

import tensorflow as tf

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

from tensorflow.keras.datasets import mnist

from tensorflow.keras.utils import to_categorical


(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(-1, 28, 28, 1) / 255.0

x_test = x_test.reshape(-1, 28, 28, 1) / 255.0

y_train = to_categorical(y_train)

y_test = to_categorical(y_test)


model = Sequential([

    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),

    MaxPooling2D(pool_size=(2, 2)),

    Flatten(),

    Dense(128, activation='relu'),

    Dense(10, activation='softmax')
])


model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

This example demonstrates a basic training loop in PyTorch including training and evaluation steps. Each line is explained using inline comments.

for epoch in range(num_epochs):  # Loop over the dataset multiple times (epochs)

    model.train()  # Set the model to training mode

    for inputs, labels in train_loader:  # Iterate over batches from the training set

        optimizer.zero_grad()  # Clear gradients from the previous step

        outputs = model(inputs)  # Perform forward pass to get predictions

        loss = criterion(outputs, labels)  # Compute the loss between predictions and true labels

        loss.backward()  # Backpropagate the loss to compute gradients

        optimizer.step()  # Update model weights using the optimizer


    model.eval()  # Switch to evaluation mode (e.g., disables dropout)

    with torch.no_grad():  # Disable gradient computation for evaluation

        # Evaluate on validation set

        # Typically: loop through val_loader, run model(inputs), compute accuracy/loss

        pass  # Placeholder where validation code would go

This example builds a basic autoencoder with one hidden layer using Keras. Autoencoders are useful for unsupervised learning tasks like dimensionality reduction or denoising.

from tensorflow.keras import layers, models  # Import necessary Keras components



input_img = layers.Input(shape=(784,))  # Input layer for flattened 28x28 image (MNIST)

encoded = layers.Dense(64, activation='relu')(input_img)  # Encoding layer with 64 units and ReLU activation

decoded = layers.Dense(784, activation='sigmoid')(encoded)  # Decoding layer with sigmoid activation to output 784 values



autoencoder = models.Model(input_img, decoded)  # Define the autoencoder model from input to reconstructed output

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')  # Compile model with Adam optimizer and binary crossentropy loss

This example demonstrates building a simple Convolutional Neural Network (CNN) in Keras for classifying grayscale images like MNIST digits.

from tensorflow.keras import layers, models  # Import the layers and models modules from Keras



model = models.Sequential([  # Define a sequential model

    layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),  # 32 filters, 3x3 kernel, ReLU activation, grayscale input

    layers.MaxPooling2D((2,2)),  # Downsample with a 2x2 max pooling layer

    layers.Flatten(),  # Flatten the 2D feature maps to 1D

    layers.Dense(64, activation='relu'),  # Fully connected layer with 64 neurons and ReLU

    layers.Dense(10, activation='softmax')  # Output layer for 10 classes with softmax activation

])



model.compile(  # Compile the model with appropriate configurations

    optimizer='adam',  # Use Adam optimizer

    loss='sparse_categorical_crossentropy',  # Suitable for integer-labeled classification

    metrics=['accuracy']  # Track accuracy during training

)

This code snippet demonstrates how to freeze all layers of a PyTorch model except the final fully connected (FC) layer. It's commonly used in transfer learning when fine-tuning pre-trained models like ResNet.

# Freeze all layers in the model
for param in model.parameters():

    param.requires_grad = False  # Disables gradient computation for each parameter



# Unfreeze only the final fully connected layer
for param in model.fc.parameters():

    param.requires_grad = True  # Enables training only for the FC layer

Explanation:
- The first loop disables gradient updates for **all** model parameters (useful when using pre-trained models).
- The second loop **re-enables** training (backpropagation) only for the last fully connected layer, allowing the model to adapt to the new dataset while preserving pre-trained features.

This example shows how to create an embedding layer in PyTorch and use it to convert input indices into dense vector representations.

import torch

import torch.nn as nn


# Create an Embedding layer with 1000 possible tokens and embedding size of 50

embedding = nn.Embedding(num_embeddings=1000, embedding_dim=50)


# Example input tensor containing token indices

input_ids = torch.tensor([1, 2, 3], dtype=torch.long)


# Pass input indices through the embedding layer to get dense vectors

embedded = embedding(input_ids)


# Print the shape of the output embedding tensor

print(embedded.shape)  # Output shape: (3, 50)

Explanation:
- `nn.Embedding` creates a lookup table mapping integer indices to dense vectors.
- `num_embeddings=1000` means the vocabulary size is 1000 tokens.
- `embedding_dim=50` means each token is represented by a 50-dimensional vector.
- Input `input_ids` is a tensor of indices; after embedding, each index is converted to its vector.
- The output shape `(3, 50)` corresponds to 3 tokens each with a 50-dimensional embedding.
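
To make the lookup-table idea concrete, here is a minimal pure-Python sketch that needs no PyTorch. The helper names `make_table` and `embed` are hypothetical, and the random values only stand in for weights that a real embedding layer would learn.

```python
import random

def make_table(num_embeddings, embedding_dim, seed=0):
    """Build a (num_embeddings x embedding_dim) table of random floats."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
            for _ in range(num_embeddings)]

def embed(table, indices):
    """Return one table row per index; the forward pass is just this lookup."""
    return [table[i] for i in indices]

table = make_table(num_embeddings=1000, embedding_dim=50)
vectors = embed(table, [1, 2, 3])
print(len(vectors), len(vectors[0]))  # 3 50
```

Index 1 always returns row 1 of the table, which is why an embedding is equivalent to multiplying a one-hot vector by a weight matrix, only cheaper.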

This example demonstrates using the Dropout layer in a TensorFlow Keras model to reduce overfitting by randomly dropping neurons during training.

import tensorflow as tf

from tensorflow.keras.layers import Dense, Dropout


# Build a simple Sequential model with Dropout

model = tf.keras.Sequential([

    Dense(64, activation='relu'),  # Fully connected layer with 64 units and ReLU activation

    Dropout(0.5),  # Dropout layer that randomly drops 50% of the neurons during training

    Dense(10, activation='softmax')  # Output layer with 10 units for classification

])

Explanation:
- The `Dense` layer applies a fully connected layer with ReLU activation.
- `Dropout(0.5)` randomly disables half of the neurons during each training step, which helps prevent overfitting.
- The final `Dense` layer outputs probabilities for 10 classes using the softmax activation.
- Dropout is only active during training and ignored during evaluation or inference.
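
The train/inference difference can be sketched in plain Python. This is an illustrative version of "inverted" dropout, the variant that rescales surviving values during training; the function name `dropout` and its arguments are assumptions of this sketch, not a library API.

```python
import random

def dropout(values, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each value with probability `rate` and scale
    the survivors by 1 / (1 - rate) so the expected sum is unchanged."""
    if not training or rate == 0.0:
        return list(values)  # inference: pass inputs through untouched
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, rate=0.5, training=True, rng=random.Random(0))
eval_out = dropout(activations, rate=0.5, training=False)
print(train_out)  # each entry is either 0.0 or twice the input
print(eval_out)   # identical to the input
```

Because the rescaling happens at training time, nothing needs to change at inference, which is why the layer can simply be skipped during evaluation.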

This example shows how to construct a simple neural network using TensorFlow Keras, incorporating a Dropout layer to reduce overfitting.

import tensorflow as tf

from tensorflow.keras.layers import Dense, Dropout


# Define a Sequential model

model = tf.keras.Sequential([

    Dense(64, activation='relu'),  # Dense layer with 64 units and ReLU activation

    Dropout(0.5),                  # Dropout layer that drops 50% of inputs during training

    Dense(10, activation='softmax')  # Output layer with 10 units (for classification)

])

Explanation:
- The Dense layer creates a fully connected layer with ReLU activation.
- The Dropout layer randomly sets 50% of inputs to zero during training to prevent overfitting.
- The last Dense layer outputs class probabilities via softmax activation.
- Dropout is only active during training, ignored during evaluation.

Gradient clipping is used to prevent exploding gradients during training by limiting the maximum norm of gradients.

import torch.nn.utils


# Assuming 'model' is your neural network model; call this after loss.backward() and before optimizer.step()

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # Clips gradients to have max norm 1.0

Explanation:
- `clip_grad_norm_` modifies the gradients of the model's parameters in-place.
- `max_norm=1.0` means the total norm of all gradients will be clipped to at most 1.0.
- This helps stabilize training especially in RNNs or deep networks where gradients can explode.
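
As a rough illustration of what norm-based clipping computes, the sketch below rescales a set of gradients by one shared factor. It is written in plain Python under the assumption of an L2 norm; the name `clip_grad_norm` and the `eps` value are choices made for this sketch rather than a statement of PyTorch internals.

```python
import math

def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients by one shared factor so their joint L2 norm
    is at most max_norm.

    `grads` is a list of flat gradient lists, one per parameter tensor.
    Returns the clipped gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, total_norm

grads = [[3.0, 4.0], [12.0]]  # joint norm = sqrt(9 + 16 + 144) = 13
clipped, norm_before = clip_grad_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for grad in clipped for g in grad))
print(norm_before, norm_after)
```

Every gradient is multiplied by the same factor, so the update direction is preserved and only its length shrinks. Gradients whose joint norm is already below `max_norm` are left unchanged.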

This example defines a simple convolutional neural network layer block in PyTorch, including convolution, batch normalization, and ReLU activation.

import torch.nn as nn  # Import PyTorch neural network module


class SimpleCNN(nn.Module):  # Define a subclass of nn.Module

    def __init__(self):

        super().__init__()  # Initialize the base class

        self.conv = nn.Conv2d(3, 16, 3, 1)  # 2D conv layer: 3 input channels, 16 output channels, 3x3 kernel, stride 1

        self.bn = nn.BatchNorm2d(16)  # Batch normalization on 16 channels

        self.relu = nn.ReLU()  # ReLU activation function


    def forward(self, x):  # Forward pass method

        x = self.conv(x)  # Apply convolution

        x = self.bn(x)  # Apply batch normalization

        x = self.relu(x)  # Apply ReLU activation

        return x  # Return the processed output

Explanation:
- The conv layer extracts features from input images.
- Batch normalization normalizes activations to improve training stability.
- ReLU adds non-linearity.
- This block is common in CNN architectures.

This example shows how to implement scaled dot-product attention, a key component in Transformer models.

import torch  # Import PyTorch

import torch.nn.functional as F  # Import functional API for activation functions


def scaled_dot_product_attention(query, key, value):

    d_k = query.size(-1)  # Get the size of the last dimension of query

    # Calculate raw attention scores by matrix multiplication of query and key transpose, scaled by sqrt(d_k)

    scores = torch.matmul(query, key.transpose(-2, -1)) / (d_k ** 0.5)

    weights = F.softmax(scores, dim=-1)  # Normalize scores to probabilities with softmax

    output = torch.matmul(weights, value)  # Multiply weights by values to get weighted output

    return output, weights  # Return the output and attention weights

Explanation:
- Scaling by sqrt(d_k) keeps the dot products from growing with the key dimension; without it, large scores push softmax into a saturated region where gradients become very small.
- Softmax converts scores into a probability distribution.
- This function returns both the attended output and the attention weights.
- Scaled dot-product attention is fundamental for self-attention in Transformers.
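
For readers who want to check the arithmetic by hand, here is the same computation on tiny lists in plain Python. The function names and the toy `q`, `k`, `v` values are made up for this sketch.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, key, value):
    """query: n x d_k, key: m x d_k, value: m x d_v (lists of lists)."""
    d_k = len(query[0])
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in key] for q_row in query]
    weights = [softmax(row) for row in scores]
    output = [[sum(w * v_row[j] for w, v_row in zip(w_row, value))
               for j in range(len(value[0]))] for w_row in weights]
    return output, weights

q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
out, w = attention(q, k, v)
print(w)    # the first key matches the query, so it gets the larger weight
print(out)  # a weighted mix of the two value rows
```

Each row of `w` sums to 1, so every output row is a convex combination of the value rows.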

Saving and loading a PyTorch model's learned parameters (state_dict) is essential for model persistence and later inference.

# Save the model's state dictionary to a file

torch.save(model.state_dict(), 'model.pth')


# Load the saved state dictionary into a model instance with the same architecture

model.load_state_dict(torch.load('model.pth'))


# Set the model to evaluation mode (disables dropout, batchnorm updates)

model.eval()

Explanation:
- `state_dict` contains the model's parameters (weights and biases).
- `torch.save` serializes these parameters to a file (`model.pth`).
- `torch.load` deserializes the saved parameters.
- `load_state_dict` loads the parameters back into the model.
- `model.eval()` switches the model to evaluation mode, important for layers like dropout and batch normalization.

CrossEntropyLoss is commonly used for classification tasks in PyTorch.

import torch.nn as nn


# Define the loss criterion for classification

criterion = nn.CrossEntropyLoss()


# Get model output for inputs (logits, not softmax probabilities)

output = model(inputs)


# Calculate the loss comparing output and target labels

loss = criterion(output, targets)

Explanation:
- `nn.CrossEntropyLoss()` combines `LogSoftmax` and `NLLLoss` in one single class.
- The model outputs raw logits; the criterion applies softmax internally.
- `inputs` are the input features.
- `targets` are the ground truth labels (integers, not one-hot).
- The loss value guides the optimizer during training.
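
The "LogSoftmax plus NLLLoss" combination can be written out for a single sample in plain Python. This is a sketch of the arithmetic only; the function name `cross_entropy` is chosen for illustration, and batching and averaging are left out.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class, computed from raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]  # log-softmax
    return -log_probs[target]                      # negative log-likelihood

print(cross_entropy([0.0, 0.0, 0.0], target=1))  # ln(3): the model is undecided
print(cross_entropy([5.0, 0.0, 0.0], target=0))  # small: confident and right
print(cross_entropy([5.0, 0.0, 0.0], target=1))  # large: confident and wrong
```

This also shows why the model should output logits: applying softmax before the criterion would normalize the scores twice.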

Data augmentation helps increase the diversity of training data by applying random transformations.

from torchvision import transforms


transform = transforms.Compose([

    transforms.RandomHorizontalFlip(),  # Randomly flip image horizontally

    transforms.RandomRotation(15),       # Randomly rotate image by ±15 degrees

    transforms.ToTensor()                 # Convert image to tensor

])

Explanation:
- `transforms.Compose()` chains multiple transformations.
- `RandomHorizontalFlip()` randomly flips images horizontally with a default probability of 0.5.
- `RandomRotation(15)` randomly rotates the image within ±15 degrees.
- `ToTensor()` converts a PIL Image or `numpy.ndarray` (H x W x C, uint8) to a float tensor (C x H x W) and scales pixel values from [0, 255] to [0.0, 1.0].
- These augmentations help prevent overfitting by creating variations of images.

Learning rate schedulers adjust the learning rate during training to improve convergence.

import torch.optim as optim


optimizer = optim.Adam(model.parameters(), lr=0.01)  # Initialize Adam optimizer with learning rate 0.01

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)  # Reduce LR by factor 0.1 every 10 epochs


for epoch in range(30):

    train(...)  # Your training function for one epoch

    scheduler.step()  # Decays learning rate after each epoch

Explanation:
- `optim.Adam` is an adaptive optimizer commonly used in deep learning.
- `StepLR` scheduler decreases the learning rate by multiplying it by `gamma` every `step_size` epochs.
- This helps the model converge better by starting with a larger learning rate and fine-tuning with a smaller one later.
- The `scheduler.step()` should be called after each epoch.
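
The resulting schedule can be tabulated without running any training. The closed-form helper below is a sketch: `step_lr` is a hypothetical name, and it assumes the scheduler is stepped exactly once per epoch as in the loop above.

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate in effect during `epoch` (0-indexed) under a step schedule."""
    return base_lr * gamma ** (epoch // step_size)

# Print the rate at both edges of each 10-epoch plateau
for epoch in (0, 9, 10, 19, 20, 29):
    print(epoch, step_lr(0.01, epoch))
```

With `lr=0.01`, `step_size=10`, and `gamma=0.1`, epochs 0 to 9 train at about 0.01, epochs 10 to 19 at about 0.001, and epochs 20 to 29 at about 0.0001.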

Learning rate decay reduces the learning rate during training to help models converge smoothly.

import torch.optim as optim


optimizer = optim.Adam(model.parameters(), lr=0.01)  # Initialize Adam optimizer with initial LR 0.01

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)  # LR decays by 0.1 every 10 epochs


for epoch in range(30):

    train(...)  # Call your training loop function here

    scheduler.step()  # Update learning rate based on schedule after each epoch

Explanation:
- `optim.Adam` is a popular optimizer that adapts learning rates per parameter.
- `StepLR` scheduler multiplies the learning rate by `gamma` every `step_size` epochs.
- A larger initial learning rate makes fast early progress; reducing it later lets the optimizer settle into a minimum instead of oscillating around it.
- `scheduler.step()` should be called after each epoch to update the LR.

EarlyStopping is a technique to stop training when the monitored metric stops improving, which helps prevent overfitting.

from tensorflow.keras.callbacks import EarlyStopping


early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)


model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50, callbacks=[early_stop])

Explanation:
- `monitor='val_loss'` tells the callback to watch validation loss.
- `patience=3` means training will stop if validation loss doesn’t improve for 3 consecutive epochs.
- `restore_best_weights=True` will revert the model to the weights of the epoch with the best validation loss.
- This helps avoid overfitting by stopping training early once performance on validation data stops improving.
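
The patience logic is easy to simulate on a list of validation losses. The sketch below is a simplified stand-in for the callback, not its actual implementation; the function name `early_stopping` and the loss values are invented for illustration.

```python
def early_stopping(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once `patience` consecutive epochs fail to beat the best
    validation loss seen so far; if that never happens, it runs to the end.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

losses = [1.00, 0.80, 0.70, 0.72, 0.71, 0.75, 0.60]
stop_epoch, best_epoch = early_stopping(losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

Training halts at epoch 5 and the weights from epoch 2 would be restored. The later loss of 0.60 is never reached, which is the trade-off of a small `patience` value.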

EarlyStopping is a useful callback in Keras that monitors a chosen metric during training and stops the training process if the metric stops improving for a specified number of epochs.

from tensorflow.keras.callbacks import EarlyStopping


early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)


model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50, callbacks=[early_stop])

Explanation:
- `monitor='val_loss'`: Watches the validation loss to detect improvements.
- `patience=3`: Training will stop if the validation loss does not improve after 3 consecutive epochs.
- `restore_best_weights=True`: When training stops, the model weights are reverted to those from the epoch with the best validation loss.
- This technique helps to prevent overfitting by stopping training before the model starts to learn noise in the training data.

This example shows how to build a custom dataset class in PyTorch to load images from a folder, and then use a DataLoader to batch and shuffle the data.

from torch.utils.data import Dataset, DataLoader

from PIL import Image

import os

from torchvision import transforms


class MyDataset(Dataset):

    def __init__(self, folder):

        self.files = os.listdir(folder)  # List all files in the folder

        self.folder = folder

    def __len__(self):

        return len(self.files)  # Return number of images

    def __getitem__(self, idx):

        img_path = os.path.join(self.folder, self.files[idx])  # Get path of image

        image = Image.open(img_path).convert("RGB")  # Open and convert image to RGB

        return transforms.ToTensor()(image)  # Convert image to tensor


dataset = MyDataset("images")  # Create dataset instance with folder path

loader = DataLoader(dataset, batch_size=32, shuffle=True)  # Create DataLoader for batching and shuffling

Explanation:
- The `MyDataset` class inherits from `torch.utils.data.Dataset` and overrides `__len__` and `__getitem__`.
- `__len__` returns the total number of images.
- `__getitem__` loads an image at the given index, converts it to RGB, and transforms it to a tensor.
- `DataLoader` wraps the dataset to enable batching (`batch_size=32`) and random shuffling (`shuffle=True`).
- This pattern is standard for efficient data feeding in PyTorch training loops.
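
Conceptually, the loader only needs the two methods the dataset exposes. The plain-Python sketch below mimics shuffling and batching on a toy dataset; `SquaresDataset` and `iterate_batches` are hypothetical names, and real loaders add features such as worker processes and tensor collation that are omitted here.

```python
import random

class SquaresDataset:
    """Toy dataset following the same protocol: __len__ plus __getitem__."""
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return idx * idx

def iterate_batches(dataset, batch_size, shuffle=True, seed=0):
    """Yield lists of samples; the last batch may be smaller than batch_size."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

batches = list(iterate_batches(SquaresDataset(70), batch_size=32))
print([len(b) for b in batches])  # [32, 32, 6]
```

Every sample appears exactly once per pass, just in a shuffled order.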

This example demonstrates how to build a TensorFlow dataset pipeline that loads JPEG images from a folder, decodes and resizes them, batches the data, and prefetches for performance.

import tensorflow as tf


def parse_fn(example):

    image = tf.image.decode_jpeg(example, channels=3)  # Decode JPEG bytes into a 3-channel (RGB) uint8 tensor so all images batch together

    image = tf.image.resize(image, [224, 224])  # Resize image to 224x224

    return image


dataset = tf.data.Dataset.list_files("images/*.jpg")  # List all jpg files in 'images' folder

dataset = dataset.map(lambda x: parse_fn(tf.io.read_file(x)))  # Read file and parse image

dataset = dataset.batch(32)  # Batch images in groups of 32

dataset = dataset.prefetch(tf.data.AUTOTUNE)  # Prefetch batches for efficient pipeline

Explanation:
- `tf.data.Dataset.list_files()` generates a dataset of file paths matching the pattern.
- `map()` applies the `parse_fn` to decode and resize each image.
- `tf.io.read_file(x)` reads the raw file contents.
- `batch(32)` combines 32 images per batch to feed the model.
- `prefetch(tf.data.AUTOTUNE)` allows data to be prepared while the model is training, improving throughput.

This setup is useful for training image models efficiently in TensorFlow.

This example shows how to convert a TensorFlow Keras model to the TensorFlow Lite (TFLite) format for deployment on edge devices or mobile.

import tensorflow as tf


# Assume you have a trained or loaded Keras model

model = tf.keras.Sequential([...])  # Replace [...] with your model layers

model.compile(...)  # Compile the model with optimizer, loss, metrics


# Create a TFLite converter from the Keras model

converter = tf.lite.TFLiteConverter.from_keras_model(model)

tflite_model = converter.convert()  # Convert the model to TFLite format


# Save the converted TFLite model to a file

with open("model.tflite", "wb") as f:

    f.write(tflite_model)

Explanation:
- `tf.lite.TFLiteConverter.from_keras_model(model)` creates a converter object from your Keras model.
- `convert()` performs the actual conversion into a lightweight `.tflite` file.
- Writing the converted model to disk allows you to deploy it on devices that support TensorFlow Lite.

TensorFlow Lite models are optimized for mobile and embedded devices with reduced size and faster inference.

This example demonstrates how to convert a TensorFlow Keras model to TensorFlow Lite, save it, and briefly explains usage.

import tensorflow as tf


# Assume you have a trained or loaded Keras model

model = tf.keras.Sequential([...])  # Replace [...] with your model layers

model.compile(...)  # Compile the model with optimizer, loss, and metrics


# Convert the Keras model to TensorFlow Lite format

converter = tf.lite.TFLiteConverter.from_keras_model(model)

tflite_model = converter.convert()  # Perform conversion


# Save the TensorFlow Lite model to a file

with open("model.tflite", "wb") as f:

    f.write(tflite_model)

Explanation:
- This process allows you to deploy models on mobile or embedded devices with TensorFlow Lite runtime.
- The `.tflite` model is a lightweight version optimized for fast inference.
- To use this model for inference, you would load it with a TensorFlow Lite interpreter.

This example shows how to load a pretrained PyTorch ResNet18 model and export it to ONNX format for interoperability with other frameworks.

import torch.onnx



# Load pretrained ResNet18 model from PyTorch Hub

model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)

model.eval()  # Switch to inference mode so BatchNorm uses its running statistics during export



# Create a dummy input tensor matching the model's expected input shape

dummy_input = torch.randn(1, 3, 224, 224)



# Export the model to ONNX format file named "resnet18.onnx"

torch.onnx.export(model, dummy_input, "resnet18.onnx")

Explanation:
- ONNX (Open Neural Network Exchange) is a common format that allows models to be used across different deep learning frameworks.
- The dummy input tensor is used to trace the model’s operations for export.
- Exported ONNX files can be run in runtimes such as ONNX Runtime or converted to other formats.

This example demonstrates how to prune (zero out) 30% of the weights randomly in a PyTorch linear layer to reduce model complexity and potentially improve generalization.

import torch.nn.utils.prune as prune

import torch.nn as nn



# Define a simple linear layer with 100 inputs and 10 outputs

model = nn.Linear(100, 10)



# Apply random unstructured pruning on the 'weight' parameter with 30% sparsity

prune.random_unstructured(model, name="weight", amount=0.3)



# Print the weights to observe pruning (some will be zero now)

print(model.weight)

Explanation:
- Pruning is a technique to remove less important parameters (weights) to compress the model.
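- `random_unstructured` prunes weights at random positions across the weight tensor, regardless of their magnitude.
- `amount=0.3` means 30% of the weights will be set to zero.
- Pruning is applied through a mask (`weight_orig` and `weight_mask`), so the tensor stays dense; realizing size or speed gains requires sparse storage or kernels, and accuracy may be affected.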
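
The masking mechanism can be sketched in plain Python. PyTorch's pruning utilities keep the original values in `weight_orig` and a 0/1 `weight_mask`, and the effective weight is their element-wise product; the helper below imitates that on a flat list. The function name and the stand-in weights are assumptions of this sketch.

```python
import random

def random_unstructured_mask(num_weights, amount, seed=0):
    """Return a 0/1 mask with round(amount * num_weights) randomly chosen zeros."""
    num_pruned = round(amount * num_weights)
    pruned = set(random.Random(seed).sample(range(num_weights), num_pruned))
    return [0.0 if i in pruned else 1.0 for i in range(num_weights)]

weight_orig = [0.1 * (i + 1) for i in range(1000)]   # stand-in for original weights
mask = random_unstructured_mask(len(weight_orig), amount=0.3)
weight = [w * m for w, m in zip(weight_orig, mask)]  # effective weight = original * mask

sparsity = mask.count(0.0) / len(mask)
print(sparsity)  # 0.3
```

Note that `weight` still has 1000 entries. The zeros only save memory or compute once the tensor is stored or executed in a sparse form.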

This example shows how to convert a pre-trained ResNet18 model from 32-bit floating point precision to a dynamically quantized 8-bit integer model to reduce model size and improve inference speed.

import torch.quantization



# Load a pretrained ResNet18 model (float32)

model_fp32 = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)

model_fp32.eval()



# Apply dynamic quantization to Linear layers

model_int8 = torch.quantization.quantize_dynamic(

    model_fp32, {torch.nn.Linear}, dtype=torch.qint8

)



# Print the quantized model summary

print(model_int8)

Explanation:
- Dynamic quantization converts certain layers (here Linear layers) to int8 precision.
- This reduces model size and speeds up CPU inference.
- It keeps other layers in float32.
- The model is first set to eval mode.
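- This technique is most useful for models dominated by Linear or recurrent layers, as is common in NLP; in ResNet18 only the final `fc` layer is Linear, so the gain in this example is small.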
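
To show what converting float32 weights to int8 means numerically, here is a plain-Python sketch of symmetric per-tensor quantization. It is one simple scheme chosen for illustration and is not claimed to match the exact scheme PyTorch applies internally; the function names are hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale (symmetric, per-tensor)."""
    qmax = 2 ** (num_bits - 1) - 1                     # 127 for int8
    scale = max(abs(v) for v in values) / qmax or 1.0  # fall back to 1.0 if all zeros
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
print(q)         # [50, -127, 0, 100]
print(restored)  # close to the originals, within half a quantization step
```

Each weight now needs one byte instead of four, at the cost of a rounding error of at most `scale / 2`. The small weight 0.003 collapses to zero, which illustrates where accuracy can be lost.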

This example shows how to use the Hugging Face Transformers library to load a pretrained Vision Transformer (ViT) model and feature extractor, process an image, and perform image classification.

from transformers import ViTFeatureExtractor, ViTForImageClassification

from PIL import Image

import requests

import torch



# Load pretrained ViT model and feature extractor

model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")



# Load image from URL

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/image_classification.jpeg"

image = Image.open(requests.get(url, stream=True).raw)



# Preprocess image and convert to PyTorch tensor batch

inputs = extractor(images=image, return_tensors="pt")



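# Forward pass through model (no gradients needed for inference)

with torch.no_grad():

    outputs = model(**inputs)



# Get predicted class ID with highest logit score and map it to a label

predicted_id = outputs.logits.argmax(-1).item()

print(model.config.id2label[predicted_id])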

Explanation:
- `ViTFeatureExtractor` preprocesses images to the format ViT expects.
- The image is downloaded and opened using PIL.
- The model outputs logits (raw prediction scores) for each class.
- The class with the highest logit is selected as the predicted class.
- This approach is powerful for image classification using transformers.

To use YOLOv5, first clone the official GitHub repository and install its dependencies.

# Clone the YOLOv5 repository

git clone https://github.com/ultralytics/yolov5

cd yolov5

# Install Python dependencies

pip install -r requirements.txt

Explanation:
- `git clone` downloads the YOLOv5 source code.
- `cd yolov5` changes the directory to the cloned repo.
- `pip install -r requirements.txt` installs all necessary Python packages.

This prepares your environment for training or running YOLOv5 object detection models.

This example demonstrates how to load the small YOLOv5 model using PyTorch Hub and run inference on an image URL.

import torch

# Load YOLOv5 small pre-trained model

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')


# Image URL for inference

img = 'https://ultralytics.com/images/zidane.jpg'


# Perform inference

results = model(img)


# Show image with detected bounding boxes

results.show()

Explanation:
- `torch.hub.load` loads the pre-trained YOLOv5 small model.
- The image is passed to the model for detection.
- `results.show()` displays the image with detection boxes drawn.

This is a simple way to run object detection using YOLOv5.

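This code loads the ResNet50 model pretrained on ImageNet. With the default `include_top=True` it is a complete 1000-class classifier; pass `include_top=False` to use it as a base for feature extraction or fine-tuning.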

from tensorflow.keras.applications import ResNet50


# Load ResNet50 model with pretrained weights

model = ResNet50(weights='imagenet')

Explanation:
- `ResNet50` is a deep CNN architecture.
- `weights='imagenet'` loads pretrained weights trained on the ImageNet dataset.
- This model can be used for image classification or as a base for transfer learning.

This example shows how to add Batch Normalization between Dense layers to help stabilize and speed up training.

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense, BatchNormalization


model = Sequential([

    Dense(64, input_shape=(100,), activation='relu'),  # Dense layer with 64 units and ReLU activation

    BatchNormalization(),  # Normalize activations to improve training stability

    Dense(1, activation='sigmoid')  # Output layer for binary classification

])

Explanation:
- BatchNormalization normalizes layer inputs to have mean close to 0 and variance close to 1.
- This helps reduce internal covariate shift and can lead to faster convergence.
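- In the original formulation it sits between the linear transform and the activation; placing it after the activation, as in this example, is also common in practice.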
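
The normalization itself is a few lines of arithmetic. The sketch below handles a single feature over one batch in plain Python, using training-time batch statistics only; the function name `batch_norm` and the `eps=1e-3` default are assumptions of this sketch, and the running averages used at inference are omitted.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature across the batch, then apply a learnable scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)  # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

feature = [10.0, 12.0, 14.0, 16.0]  # one feature observed over a batch of 4
normed = batch_norm(feature)
print(normed)
```

The output has mean 0 and variance just under 1, because `eps` is added to the variance before taking the square root. `gamma` and `beta` let the network undo the normalization if that turns out to be useful.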

This example defines a simple Generative Adversarial Network (GAN) with a Generator and Discriminator in PyTorch.

import torch

import torch.nn as nn


# Generator network

class Generator(nn.Module):

    def __init__(self, input_dim, output_dim):

        super().__init__()

        self.net = nn.Sequential(

            nn.Linear(input_dim, 128),  # Fully connected layer from noise input to hidden layer

            nn.ReLU(),  # Activation function

            nn.Linear(128, output_dim),  # Output layer to generate data (e.g., image pixels)

            nn.Tanh()  # Output scaled between -1 and 1

        )

    def forward(self, x):

        return self.net(x)


# Discriminator network

class Discriminator(nn.Module):

    def __init__(self, input_dim):

        super().__init__()

        self.net = nn.Sequential(

            nn.Linear(input_dim, 128),  # Fully connected layer from input data to hidden layer

            nn.LeakyReLU(0.2),  # LeakyReLU activation to allow small gradients when inactive

            nn.Linear(128, 1),  # Output layer producing a single real/fake score (logit)

            nn.Sigmoid()  # Sigmoid activation to output probability between 0 and 1

        )

    def forward(self, x):

        return self.net(x)


# Create instances

G = Generator(100, 784)  # Generator takes 100-dim noise vector, outputs 784-dim (e.g. 28x28 image)

D = Discriminator(784)  # Discriminator takes 784-dim input to classify real/fake

Explanation:
- Generator learns to create realistic data from noise.
- Discriminator learns to distinguish real vs generated data.
- Both are trained adversarially to improve generation quality.
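
What "trained adversarially" means in terms of losses can be sketched with two small functions. This uses the common binary cross-entropy formulation with the non-saturating generator loss, which is an assumption here because the source does not show a training loop; `d_real` and `d_fake` stand for the discriminator's outputs on a real and a generated sample.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples should score 1, generated samples 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: reward fooling the discriminator."""
    return -math.log(d_fake)

# A discriminator that is currently winning: confident on real, unconvinced by fakes
print(discriminator_loss(d_real=0.9, d_fake=0.1))  # small
print(generator_loss(d_fake=0.1))                  # large

# A generator that has caught up: the discriminator can only guess
print(discriminator_loss(d_real=0.5, d_fake=0.5))  # 2 * ln(2)
print(generator_loss(d_fake=0.5))                  # ln(2)
```

The two losses pull `d_fake` in opposite directions, which is the adversarial part: the discriminator is updated to lower it, the generator to raise it.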

This example shows how to use a pre-trained BERT model for text classification.

from transformers import BertTokenizer, BertForSequenceClassification

from torch.nn.functional import softmax


# Load pre-trained tokenizer and model

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')  # Tokenizer splits text into tokens

model = BertForSequenceClassification.from_pretrained('bert-base-uncased')  # Pre-trained BERT model


# Tokenize input text and convert to tensor format

inputs = tokenizer("I love deep learning!", return_tensors="pt")


# Forward pass through the model

outputs = model(**inputs)

probs = softmax(outputs.logits, dim=1)  # Apply softmax to get probabilities for each class


print(probs)  # Prints class probabilities

Explanation:
- `BertTokenizer` converts raw text into tokens understandable by BERT.
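- `BertForSequenceClassification` is a BERT model with a classification head. The `bert-base-uncased` checkpoint does not include trained head weights, so the head is randomly initialized and the printed probabilities are not meaningful until the model is fine-tuned.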
- The input is tokenized and batched (here batch size = 1).
- Model outputs logits (raw scores), which are converted to probabilities via softmax.
- This can be used for sentiment analysis, spam detection, etc.

This code defines a basic Transformer block used in models like BERT and GPT.

import torch
import torch.nn as nn

class TransformerBlock(nn.Module):

    def __init__(self, embed_size, heads, ff_hidden, dropout):

        super().__init__()

        self.attention = nn.MultiheadAttention(embed_size, heads)  # Multi-head self-attention

        self.norm1 = nn.LayerNorm(embed_size)  # Normalization after attention

        self.norm2 = nn.LayerNorm(embed_size)  # Normalization after feed-forward

        self.feed_forward = nn.Sequential(

            nn.Linear(embed_size, ff_hidden),  # First linear layer in feed-forward

            nn.ReLU(),                         # Activation

            nn.Linear(ff_hidden, embed_size)  # Second linear layer back to embed size

        )

        self.dropout = nn.Dropout(dropout)  # Dropout for regularization


    def forward(self, x):

        attn_output, _ = self.attention(x, x, x)  # Self-attention; x has shape (seq_len, batch, embed_size)

        x = self.norm1(x + self.dropout(attn_output))  # Dropout, then Add & Norm

        ff_output = self.feed_forward(x)  # Feed-forward network

        x = self.norm2(x + self.dropout(ff_output))  # Dropout, then Add & Norm

        return x

Explanation:
- **MultiheadAttention:** Allows the model to jointly attend to information from different representation subspaces.
- **LayerNorm:** Normalizes the inputs to stabilize and accelerate training.
- **Feed-forward:** Two-layer fully connected network applied independently to each position.
- **Residual connections:** Adding input (`x`) to outputs to help gradient flow.
- This block is a fundamental component of Transformer architectures.

This code builds a simple Convolutional Neural Network (CNN) for image classification with 10 classes.

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense



model = Sequential([

    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),  # 32 filters, 3x3 kernel, input shape RGB images 64x64

    MaxPooling2D(pool_size=(2, 2)),  # Downsamples spatial dims by 2



    Conv2D(64, (3, 3), activation='relu'),  # 64 filters, 3x3 kernel

    MaxPooling2D(pool_size=(2, 2)),  # Another downsampling layer



    Flatten(),  # Flattens 2D feature maps into 1D vector

    Dense(128, activation='relu'),  # Fully connected layer with 128 units

    Dense(10, activation='softmax')  # Output layer for 10 classes with probabilities

])

Explanation:
- **Conv2D layers:** Extract spatial features using learnable filters.
- **MaxPooling2D layers:** Reduce spatial dimensions and help in translation invariance.
- **Flatten:** Converts 2D feature maps into a 1D vector for Dense layers.
- **Dense layers:** Perform classification based on extracted features.
- **Softmax activation:** Outputs class probabilities for 10 categories.

This is a common architecture pattern for beginner CNNs on small image datasets.
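
It can help to trace the tensor shapes through this model by hand. The plain-Python sketch below assumes unpadded ("valid") convolutions with stride 1 and pooling windows whose stride equals their size; the helper name `window_out` is hypothetical.

```python
def window_out(size, kernel, stride=1):
    """Output height/width of an unpadded convolution or pooling window."""
    return (size - kernel) // stride + 1

size = 64                                    # input: 64 x 64 x 3
size = window_out(size, kernel=3)            # Conv2D(32, (3, 3))  -> 62 x 62 x 32
size = window_out(size, kernel=2, stride=2)  # MaxPooling2D(2, 2)  -> 31 x 31 x 32
size = window_out(size, kernel=3)            # Conv2D(64, (3, 3))  -> 29 x 29 x 64
size = window_out(size, kernel=2, stride=2)  # MaxPooling2D(2, 2)  -> 14 x 14 x 64
channels = 64
flat = size * size * channels                # Flatten             -> 12544

dense_params = flat * 128 + 128              # weights + biases of Dense(128)
print(size, channels, flat, dense_params)    # 14 64 12544 1605760
```

Most of the parameters sit in the first Dense layer, not in the convolutions, which is typical for small CNNs that flatten before the classifier.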

This is a simple neural network with dropout to reduce overfitting.

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense, Dropout



model = Sequential()

model.add(Dense(128, activation='relu', input_shape=(100,)))  # Input layer with 128 neurons and ReLU activation

model.add(Dropout(0.5))  # Dropout layer randomly disables 50% of neurons during training to prevent overfitting

model.add(Dense(1, activation='sigmoid'))  # Output layer with 1 neuron and sigmoid activation for binary classification

Explanation:
- The **Dense(128)** layer is fully connected and uses ReLU for non-linearity.
- **Dropout(0.5)** randomly turns off half of the neurons during each training step, forcing the model to learn more robust features.
- The last **Dense(1)** layer outputs a probability with sigmoid, suitable for binary classification tasks.

This simple architecture is commonly used in binary classification problems with tabular or vector input.

This code demonstrates transfer learning using the VGG16 model as a fixed feature extractor.

from tensorflow.keras.applications import VGG16

from tensorflow.keras.models import Model

from tensorflow.keras.layers import Dense, Flatten



# Load VGG16 without the fully connected layers on top

base_model = VGG16(include_top=False, input_shape=(224, 224, 3))



# Freeze the convolutional base to prevent training

for layer in base_model.layers:

    layer.trainable = False



# Add new classifier layers on top

x = Flatten()(base_model.output)  # Flatten feature maps to 1D vector

x = Dense(64, activation='relu')(x)  # Fully connected layer with ReLU activation

output = Dense(1, activation='sigmoid')(x)  # Single output neuron with sigmoid for binary classification



# Create the new model

model = Model(inputs=base_model.input, outputs=output)

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Explanation:
- VGG16 is loaded without its original classification head (`include_top=False`); by default Keras loads weights pretrained on ImageNet.
- All layers in the base model are frozen so the pretrained weights are not updated during training.
- A new fully connected head is added to adapt the model for a binary classification task.
- The model is compiled with the Adam optimizer and binary cross-entropy loss.

This approach leverages powerful pretrained features and trains only the new classifier layers.
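
A quick parameter count shows how much of the model is actually trained. The sketch below is plain arithmetic, assuming the standard VGG16 layout (13 convolutional layers with 3x3 kernels and five 2x2 poolings) and the 224x224x3 input used above; the helper names are illustrative.

```python
# Output channels of the 13 convolutional layers in VGG16 (all use 3x3 kernels)
VGG16_CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]


def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return in_units * out_units + out_units


# Frozen convolutional base
frozen = 0
in_channels = 3  # RGB input
for out_channels in VGG16_CONV_CHANNELS:
    frozen += conv_params(in_channels, out_channels)
    in_channels = out_channels

# Five poolings halve 224 five times: 224 / 2**5 = 7, so Flatten sees 7 * 7 * 512 values
flat_units = (224 // 2 ** 5) ** 2 * 512

# New head: Dense(64) followed by Dense(1)
trainable = dense_params(flat_units, 64) + dense_params(64, 1)

print(frozen)      # 14714688
print(flat_units)  # 25088
print(trainable)   # 1605761
```

Under these assumptions only about 1.6 million of roughly 16.3 million parameters receive gradient updates, and most of those sit in the first Dense layer after `Flatten`.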

This code defines and uses the Swish activation function as a custom activation in Keras.

from tensorflow.keras import backend as K

from tensorflow.keras.layers import Dense

from tensorflow.keras.models import Sequential

from tensorflow.keras.utils import get_custom_objects



# Define the Swish activation function

def swish(x):

    return x * K.sigmoid(x)  # Swish = x * sigmoid(x)



# Register the Swish function itself as a custom activation for use in models

get_custom_objects().update({'swish': swish})



# Example: Use Swish activation in a simple model

model = Sequential()

model.add(Dense(32, input_shape=(10,), activation='swish'))  # Dense layer with Swish activation

Explanation:
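- The `swish` function implements the Swish activation, which has been reported to match or outperform ReLU in some deep networks.
- It multiplies the input by its sigmoid, so it is smooth and lets small negative values through instead of clipping them to zero.
- `get_custom_objects().update` registers it globally so you can use `'swish'` as an activation string.
- The example shows how to add a Dense layer using Swish.

This makes custom activations easy to integrate in Keras workflows. Recent TensorFlow releases also ship a built-in `'swish'` activation, so manual registration is mainly useful for activations Keras does not already provide.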
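
To see how Swish behaves numerically, the following sketch evaluates it next to ReLU using only the standard library. It is a scalar illustration of the formula `x * sigmoid(x)`, not the Keras implementation, and the sample inputs are arbitrary.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def swish(x):
    return x * sigmoid(x)  # Swish = x * sigmoid(x)


def relu(x):
    return max(0.0, x)


# Compare the two activations on a few sample inputs
for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  swish={swish(x):.4f}")
```

For large positive inputs Swish is close to the identity, like ReLU. For negative inputs it returns small negative values rather than exactly zero, so its gradient does not vanish there.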

This example builds and trains a simple feedforward neural network in Keras.

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense



# Create a sequential model

model = Sequential()

model.add(Dense(32, input_shape=(10,), activation='relu'))  # First hidden layer: 32 neurons, expects 10 input features

model.add(Dense(16, activation='relu'))                     # Second hidden layer with 16 neurons

model.add(Dense(1, activation='sigmoid'))                   # Output layer for binary classification



# Compile the model with Adam optimizer and binary crossentropy loss

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])



# Generate dummy input data: 100 samples, 10 features each

import numpy as np

X = np.random.rand(100, 10)

# Generate dummy binary labels for classification

y = np.random.randint(2, size=(100, 1))



# Train the model for 5 epochs

model.fit(X, y, epochs=5)

Explanation:
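- The model has 3 Dense layers: two hidden layers (32 and 16 neurons) and an output layer. The 10 input features are declared through `input_shape`.
- `relu` activations in the hidden layers introduce non-linearity.
- Output layer uses `sigmoid` for binary classification (output between 0 and 1).
- `binary_crossentropy` loss is suitable for binary targets.
- Dummy data simulates random inputs and labels for quick testing. Because the labels are random, accuracy is expected to stay near chance level.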
- `model.fit()` trains the model on the data.
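
The sigmoid output and the binary cross-entropy loss can also be worked through by hand. The sketch below uses only the standard library; the clipping constant `eps`, the labels, and the logits are assumptions chosen for this example and do not come from the model above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]            # Made-up binary targets
logits = [2.0, -1.0, 0.0, 3.0]   # Made-up pre-activation outputs
probs = [sigmoid(z) for z in logits]

loss = binary_crossentropy(labels, probs)
print(probs)  # Every value lies strictly between 0 and 1
print(loss)   # The last sample (label 0, high probability) contributes the most
```

A prediction of 0.5 costs `ln 2` (about 0.693) regardless of the label, while confident wrong predictions are penalized far more heavily. This is why the loss pushes the sigmoid output toward the correct side of 0.5.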

Deep Learning Interview Questions

What is deep learning and how does it differ from traditional machine learning?

What is a neural network and what are its main components?

What is backpropagation and why is it important?

What are activation functions and name a few common ones?

What is overfitting in deep learning and how can it be prevented?

What is a convolutional neural network (CNN)?

What is a recurrent neural network (RNN) and what problems does it solve?

What is the vanishing gradient problem?

What is dropout and how does it help neural networks?

What is batch normalization and why is it used?

What is transfer learning in deep learning?

What is the difference between supervised and unsupervised deep learning?

What is an autoencoder and what are its applications?

What is gradient descent and how does it work in deep learning?

What is the role of loss functions in deep learning?

How does early stopping prevent overfitting?

What is the purpose of the optimizer in deep learning?

What is the difference between batch and stochastic gradient descent?

What is a loss landscape in deep learning?

What are generative adversarial networks (GANs)?

What is transfer learning in deep learning?

What is dropout and how does it help?

What is batch normalization?

What is the vanishing gradient problem?

How do convolutional neural networks (CNNs) work?

What are recurrent neural networks (RNNs)?

What are LSTM networks?

What is a Transformer model?

What is self-attention in Transformers?

What is attention mechanism in deep learning?

What is fine-tuning in transfer learning?

What is data augmentation and why is it used?

What is gradient clipping?

What is early stopping in model training?

What is the role of epochs in training?

What is overfitting and how to prevent it?

What is a learning rate and why is it important?

What is regularization in deep learning?

What is the difference between batch size and mini-batch?

What is the role of the optimizer in training?

What is a loss function?

What is the difference between supervised and unsupervised learning?

What is reinforcement learning?

What is the difference between AI, machine learning, and deep learning?

What is the curse of dimensionality?

What is an autoencoder?

What is a GAN (Generative Adversarial Network)?

What is reinforcement learning policy?

What is backpropagation?

What is a confusion matrix?

What is overfitting in machine learning?

How to prevent overfitting?

What is underfitting?

What is regularization in machine learning?

What is cross-validation?

What is bias-variance tradeoff?

What are activation functions in neural networks?

What is the vanishing gradient problem?

What is dropout in neural networks?

What is batch normalization?

What is a convolutional neural network (CNN)?

What is pooling in CNNs?

What is a recurrent neural network (RNN)?

What is the difference between RNN and LSTM?

What is a transformer model?

What is attention in deep learning?

What is transfer learning?

What is fine-tuning in transfer learning?

What is data augmentation?

What is the difference between classification and regression?

What is precision, recall, and F1 score?

What is ROC curve and AUC?

What is gradient descent?

What is stochastic gradient descent (SGD)?

What are optimizers in deep learning?

What is early stopping?