Deep learning models often require days, weeks, or even months to train. They may use large datasets, GPU resources, and multiple tuning cycles before achieving the desired performance. Once a model is trained, the ability to save and reload it becomes crucial for many reasons: preserving progress, sharing results with others, deploying models into real-world applications, and building robust experimentation pipelines.
Keras, the high-level API of TensorFlow, provides powerful and flexible tools to save entire models, model weights, architectures, or the training configuration. Whether you’re training a simple model or a complex deep neural network, understanding how to correctly save and load your work is an essential skill.
This guide covers everything you need to know about saving and loading models in Keras: formats, techniques, best practices, real-world workflows, common errors, solutions, and advanced use cases. By the end of this article, you will be able to confidently preserve and restore any model you build.
1. Introduction: Why Saving Models Matters
In the lifecycle of machine learning development, saving models is not optional. It is mandatory.
When training deep learning models:
- Training may take hours or days.
- You may experiment with multiple architectures.
- You want to avoid losing progress due to system or runtime interruptions.
- You need reusable models for inference in APIs and applications.
- You must keep checkpoints for resuming training later.
- You may need models for A/B testing or production deployment.
Without saving, each training session would begin from scratch. This is impractical and wasteful.
Saving a model allows you to:
- Reuse it later
- Deploy it anywhere
- Share it with teammates
- Continue training from the last checkpoint
- Convert it to other formats (TF Lite, ONNX, Core ML)
Thus, saving and loading models is foundational to every deep learning project.
2. How Keras Handles Model Saving
Keras provides two major saving formats:
- The native Keras format (.keras) – recommended for most users.
- TensorFlow SavedModel format – default format used by TensorFlow for production and serving.
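A full-model save with model.save() stores:
- Model architecture
- Model weights
- Training configuration
- Optimizer state
This means you can continue training exactly where you left off. The exception is a SavedModel written with model.export() in Keras 3, which is an inference-only artifact and does not carry the optimizer state.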
3. The Native Keras Format (.keras)
The .keras format is the default saving format in Keras 3 and the officially recommended format for saving complete Keras models.
Key Features
- Single-file format
- Human-readable metadata
- Stores architecture + weights + optimizer state
- Compact and efficient
- Ideal for most workflows
Saving a Model
model.save("my_model.keras")
Loading a Model
from tensorflow.keras.models import load_model
model = load_model("my_model.keras")
This is the simplest and most reliable method.
4. TensorFlow SavedModel Format
The SavedModel format is TensorFlow’s standard for serialization. It stores models in a directory containing:
- A protocol buffer file
- Variables and metadata files
- Signatures used for serving
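Saving a Model
model.save("saved_model_dir")    # Keras 2 (TensorFlow 2.15 and earlier)
model.export("saved_model_dir")  # Keras 3, where save() accepts only .keras or .h5 paths
Loading a Model
model = load_model("saved_model_dir")              # Keras 2 only
reloaded = tf.saved_model.load("saved_model_dir")  # Keras 3: inference-only artifact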
Use Cases
- TensorFlow Serving
- Distributed deployment
- Cross-language support (Java, C++, Go)
- Export for TF Lite or TensorFlow.js
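It is the format that TensorFlow Serving and the TensorFlow Lite and TensorFlow.js converters consume, which makes it the usual choice for production deployments.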
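As a hedged sketch of a SavedModel export for serving, the snippet below assumes a Keras version that provides Model.export() (Keras 3, or tf.keras in TensorFlow 2.13 and later) running on the TensorFlow backend. The model and directory name are placeholders. The exported artifact is reloaded with tf.saved_model.load() and called through its serve endpoint, which is how an inference-only consumer would use it.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Placeholder model; any built Keras model could be exported the same way.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(1),
])

# Write a SavedModel directory intended for inference and serving.
export_dir = os.path.join(tempfile.mkdtemp(), "saved_model_dir")
model.export(export_dir)

# Reload without Keras and run the forward pass through the serving endpoint.
reloaded = tf.saved_model.load(export_dir)
x = np.random.rand(2, 4).astype("float32")
outputs = reloaded.serve(tf.constant(x))
print(outputs.shape)
```

The reloaded object is not a Keras model: it exposes the forward pass only, so it cannot be compiled or trained further.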
5. Saving Only Weights
Sometimes you want to save just the weights, not the architecture. This is common when:
- The model structure is defined in code
- Only updating weights between versions
- Implementing custom architectures
Saving Weights
model.save_weights("model.weights.h5")  # Keras 3 requires the .weights.h5 suffix
Loading Weights
model.load_weights("model.weights.h5")
Important
You must rebuild the exact same architecture before loading weights.
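One way to guarantee the same architecture on both sides is to build the model from a single function. The sketch below assumes TensorFlow with Keras; build_model is a hypothetical helper introduced only for this illustration.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras


def build_model():
    """Hypothetical builder: the one place where the architecture is defined."""
    return keras.Sequential([
        keras.Input(shape=(4,)),
        keras.layers.Dense(8, activation="relu"),
        keras.layers.Dense(1),
    ])


# Stands in for a trained model; its weights are what we want to keep.
trained = build_model()
weights_path = os.path.join(tempfile.mkdtemp(), "model.weights.h5")
trained.save_weights(weights_path)

# Later, or in another process: rebuild the same structure, then load.
fresh = build_model()
fresh.load_weights(weights_path)

# Every weight array should now be identical between the two models.
weights_match = all(
    np.array_equal(a, b)
    for a, b in zip(trained.get_weights(), fresh.get_weights())
)
print("weights match:", weights_match)
```

Because only weights are stored, the loaded model has no optimizer state and must be compiled again before further training.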
6. Saving Model Architecture Only
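If you only want to store the architecture (not weights or optimizer state), you can serialize it to JSON.
JSON Format
json_string = model.to_json()
Restore via:
from tensorflow.keras.models import model_from_json
model = model_from_json(json_string)
YAML Format
YAML serialization (model.to_yaml()) was removed in TensorFlow 2.6 because loading YAML could execute arbitrary code. Use JSON instead.
This is useful for sharing an architecture. The restored model has freshly initialized weights and must be compiled before training.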
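A small sketch of the JSON round trip, assuming TensorFlow with Keras. The model is a placeholder. The check at the end compares layer types and parameter counts, since the weights themselves are not part of the JSON.

```python
import json

from tensorflow import keras

# Placeholder architecture to serialize.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])

# Architecture only: no weights, no optimizer, no compile information.
json_string = model.to_json()
config = json.loads(json_string)  # plain JSON, so it can be inspected or diffed

# Rebuild a structurally identical model with freshly initialized weights.
clone = keras.models.model_from_json(json_string)

same_structure = (
    [type(layer).__name__ for layer in model.layers]
    == [type(layer).__name__ for layer in clone.layers]
)
print("same layer types:", same_structure)
print("parameter counts:", model.count_params(), clone.count_params())
```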
7. The save() Method in Detail
The model.save() method saves the entire model, including:
- Layers
- Weights
- Loss functions
- Optimizer state
- Configuration of custom objects (their Python code is not stored and must be available when loading)
Syntax
model.save(filepath)
Accepted formats
- "model.keras" → Native Keras format
- "model" folder → SavedModel format (Keras 2 only; in Keras 3, use model.export() instead)
What is included?
- Everything needed to resume training
- Everything needed to perform inference
8. The load_model() Method in Detail
Restoring a model is simple:
from tensorflow.keras.models import load_model
model = load_model("my_model.keras")
What happens during load?
- Architecture is reconstructed
- Weights are reloaded
- Optimizer state is restored
- Custom losses or layers are resolved (if passed through custom_objects or otherwise available to Keras)
You can immediately:
- Evaluate
- Predict
- Continue training
9. Saving Checkpoints During Training
Training large models for long durations increases the risk of interruptions. Keras provides the ModelCheckpoint callback, which saves the model at intervals.
Example
from tensorflow.keras.callbacks import ModelCheckpoint
checkpoint = ModelCheckpoint(
    "checkpoint.keras",
    monitor="val_loss",
    save_best_only=True,
)
# val_loss exists only when validation data is supplied
model.fit(x, y, validation_split=0.2, epochs=20, callbacks=[checkpoint])
What this does
- Saves the model only when validation loss improves
- Creates robust training pipelines
- Prevents losing progress
Checkpoints are essential for:
- Long training jobs
- Experiments with unstable loss curves
- Cloud-based training
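For a version that can be run as-is, the following sketch trains a placeholder model on random data with a best-only checkpoint and then reloads the checkpoint file. It assumes TensorFlow with Keras; the data, model size, and temporary path are illustrative. Validation data is supplied through validation_split so that val_loss is available to the callback.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Placeholder data and model.
x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Overwrite the file only when the monitored validation loss improves.
ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.keras")
checkpoint = keras.callbacks.ModelCheckpoint(
    ckpt_path, monitor="val_loss", save_best_only=True
)

history = model.fit(
    x, y,
    validation_split=0.25,  # required so that val_loss is computed
    epochs=3,
    callbacks=[checkpoint],
    verbose=0,
)

# The file on disk now holds the best model seen during the run.
best_model = keras.models.load_model(ckpt_path)
print("best val_loss:", min(history.history["val_loss"]))
```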
10. Saving After Every Epoch
You can configure checkpoints to save after each epoch:
checkpoint = ModelCheckpoint(
    "model_epoch_{epoch}.keras",
    save_freq="epoch",
)
This creates versions like:
- model_epoch_1.keras
- model_epoch_2.keras
- …
Useful for analyzing training evolution.
11. Saving Weights Only During Checkpoints
To save only weights:
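checkpoint = ModelCheckpoint(
    "weights_epoch_{epoch}.weights.h5",  # Keras 3 requires the .weights.h5 suffix
    save_weights_only=True,
)
This is faster and saves disk space because only the weights are written, not the architecture or optimizer state.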
12. Saving Custom Models and Layers
Custom layers, activations, or loss functions require special handling when loading.
Example
import tensorflow as tf
def custom_activation(x):
    # x * relu(x): x squared for positive inputs, zero otherwise
    return x * tf.nn.relu(x)
Saving
No special action needed.
Loading
model = load_model("model.keras", custom_objects={"custom_activation": custom_activation})
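The same mechanism applies to custom layers. The sketch below assumes TensorFlow with Keras and defines a hypothetical Scale layer purely for illustration. The layer implements get_config() so its constructor argument is written into the saved file, and the class is handed back to Keras through custom_objects at load time.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras


class Scale(keras.layers.Layer):
    """Hypothetical custom layer that multiplies its input by a constant."""

    def __init__(self, factor=2.0, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, inputs):
        return inputs * self.factor

    def get_config(self):
        # Include constructor arguments so the layer can be recreated on load.
        config = super().get_config()
        config.update({"factor": self.factor})
        return config


model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(2),
    Scale(factor=3.0),
])

path = os.path.join(tempfile.mkdtemp(), "custom_model.keras")
model.save(path)

# Map the saved class name back to the Python class when loading.
restored = keras.models.load_model(path, custom_objects={"Scale": Scale})

x = np.random.rand(5, 4).astype("float32")
outputs_match = np.allclose(
    model.predict(x, verbose=0), restored.predict(x, verbose=0), atol=1e-5
)
print("restored factor:", restored.layers[-1].factor)
print("outputs match:", outputs_match)
```

Without get_config(), the factor argument would not be stored, and the reloaded layer would fall back to its default value.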
13. Exporting Models for Production
Keras models can be deployed into:
13.1 TensorFlow Serving
Use SavedModel format.
13.2 TensorFlow Lite
Convert for mobile or embedded devices.
13.3 TensorFlow.js
Convert for web applications.
13.4 ONNX
Export for cross-framework compatibility.
13.5 REST APIs
Using frameworks like:
- Flask
- FastAPI
- Django
Saving models correctly ensures portability across platforms.
14. Saving Training History
To track accuracy and loss over time, save the training history:
import json
history = model.fit(...)
with open("history.json", "w") as f:
    json.dump(history.history, f)
This enables:
- Visualization
- Comparison of experiments
- Debugging training issues
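Once the history is on disk, it can be reloaded without Keras at all. The sketch below uses a made-up dictionary in place of history.history (the numbers are illustrative, not real results). It casts values to float before dumping, as a precaution in case a metric comes back as a NumPy scalar, and then finds the epoch with the lowest validation loss.

```python
import json
import os
import tempfile

# Hypothetical stand-in for history.history from a finished training run.
history_dict = {
    "loss": [0.90, 0.60, 0.50, 0.55],
    "val_loss": [1.00, 0.70, 0.65, 0.80],
}

# Cast to plain floats so json.dump never meets a non-serializable scalar.
serializable = {
    name: [float(value) for value in values]
    for name, values in history_dict.items()
}

path = os.path.join(tempfile.mkdtemp(), "history.json")
with open(path, "w") as f:
    json.dump(serializable, f, indent=2)

# Reload later for plotting or for comparing experiments.
with open(path) as f:
    loaded = json.load(f)

# Epochs are reported 1-based, matching the Keras progress output.
val_loss = loaded["val_loss"]
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print("best epoch:", best_epoch, "val_loss:", val_loss[best_epoch - 1])
```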
15. Versioning Models
For serious ML workflows, versioning is crucial.
Ways to version:
- Timestamp folders
- Git LFS
- MLflow
- DVC
- Weights & Biases
Versioning prevents confusion when comparing multiple model iterations.
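The simplest of these options, timestamp folders, can be sketched with the standard library alone. The helper names new_version_dir and latest_version_dir are hypothetical, and the fixed timestamps exist only to make the example deterministic. In practice you would call new_version_dir(root) with no second argument and save the model inside the returned folder.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def new_version_dir(root, now=None):
    """Create and return a folder named after a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    version_dir = Path(root) / now.strftime("%Y%m%d-%H%M%S")
    version_dir.mkdir(parents=True, exist_ok=False)
    return version_dir


def latest_version_dir(root):
    """Return the newest version folder, or None when there is none yet."""
    candidates = sorted(p for p in Path(root).iterdir() if p.is_dir())
    return candidates[-1] if candidates else None


root = Path(tempfile.mkdtemp())
first = new_version_dir(root, datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc))
second = new_version_dir(root, datetime(2024, 1, 2, 9, 30, 0, tzinfo=timezone.utc))

# A model would then be saved as, for example, second / "model.keras".
print(latest_version_dir(root).name)
```

The year-first timestamp format sorts lexicographically in chronological order, which is why a plain sorted() is enough to find the latest version.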
16. Best Practices for Saving Models
16.1 Always Save Final Model
After training completes:
model.save("final_model.keras")
16.2 Use Checkpoints
Never rely on one save point.
16.3 Save Architecture and Hyperparameters
Keep everything reproducible.
16.4 Use Keras Format for Simplicity
.keras is compact and recommended.
16.5 Use SavedModel for Deployment
Especially for large applications.
16.6 Keep Metadata
Save:
- Learning rate
- Batch size
- Dataset version
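One lightweight way to keep such metadata is a JSON sidecar file next to the model. The sketch below uses only the standard library. The write_metadata helper, the ".meta.json" naming, and the field values are assumptions made for illustration, and no model file is actually written here.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(model_path, **fields):
    """Write the given fields to '<model_path>.meta.json' and return that path."""
    sidecar = Path(str(model_path) + ".meta.json")
    sidecar.write_text(json.dumps(fields, indent=2, sort_keys=True))
    return sidecar


# Path where the model itself would be saved (not created in this sketch).
model_path = Path(tempfile.mkdtemp()) / "final_model.keras"

sidecar = write_metadata(
    model_path,
    learning_rate=1e-3,
    batch_size=32,
    dataset_version="v1",  # hypothetical label for the training data snapshot
)

loaded = json.loads(sidecar.read_text())
print(sidecar.name)
print(loaded)
```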
16.7 Document Everything
Future-proof your work.
17. Common Errors and How to Fix Them
17.1 “Unknown layer” when loading
Solution:
custom_objects={"MyLayer": MyLayer}
17.2 Shape mismatch errors
Cause: Architecture differs when loading weights.
17.3 Missing optimizer state
Occurs when saving weights only.
17.4 Unsupported format
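Use a .keras path with model.save(), or export a SavedModel. In Keras 3, model.save() rejects paths that do not end in .keras or .h5.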
18. Real-World Use Cases
18.1 Training long models overnight
Checkpoints ensure progress isn’t lost.
18.2 Running experiments
Save dozens of model versions to compare.
18.3 Deployment pipelines
SavedModel is used in production APIs.
18.4 Transfer learning
Load pretrained weights:
model.load_weights("imagenet_weights.h5")
18.5 Edge and mobile applications
Convert SavedModel → TFLite.
19. Example: Full Save + Load Workflow
Training and Saving
model.fit(x, y, epochs=10)
model.save("full_model.keras")
Loading
model = load_model("full_model.keras")
result = model.predict(new_data)
20. Example: Checkpoints + Resume Training
During training
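checkpoint = ModelCheckpoint("chk.keras", save_best_only=True)
# save_best_only monitors val_loss by default, so validation data is required
model.fit(x, y, validation_split=0.2, epochs=50, callbacks=[checkpoint])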
Later
model = load_model("chk.keras")
model.fit(x, y, epochs=20)
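When resuming, it can help to keep epoch numbering continuous so that logs and checkpoint file names line up with the first run. A minimal sketch, assuming TensorFlow with Keras and using a placeholder model and random data: fit() takes an initial_epoch argument, and epochs is then the index at which training stops rather than the number of additional epochs.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Placeholder data and model.
x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# First session: train for two epochs, then save the full model.
model.fit(x, y, epochs=2, verbose=0)
path = os.path.join(tempfile.mkdtemp(), "chk.keras")
model.save(path)

# Later session: reload and continue from epoch index 2 up to (not including) 5.
resumed = keras.models.load_model(path)
history = resumed.fit(x, y, initial_epoch=2, epochs=5, verbose=0)

# Zero-based epoch indices covered by the resumed run.
print(history.epoch)
```

Here the resumed run covers three more epochs, and history.epoch records their indices as a continuation of the first session.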