image_reco module
Senju Image Recognition Module
A module providing image description generation capabilities for the Senju haiku application.
This module leverages pre-trained vision-language models (specifically BLIP) to generate textual descriptions of uploaded images. These descriptions can then be used as input for the haiku generation process, enabling image-to-haiku functionality.
Classes
- ImageDescriptionGenerator
The primary class responsible for loading the vision-language model and generating descriptions from image data.
Functions
- gen_response
A helper function that wraps the description generation process for API integration.
Dependencies
torch: Deep learning framework required for model operations
PIL.Image: Image processing capabilities
io: Utilities for working with binary data streams
transformers: Hugging Face’s library providing access to pre-trained models
Implementation Details
The module initializes a BLIP (Bootstrapping Language-Image Pre-training) model, which can interpret visual content and generate natural language descriptions. The implementation handles image loading, preprocessing, model inference, and post-processing, and returns structured description data.
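The four stages named above (image loading, preprocessing, inference, post-processing) might look roughly like the following. This is a minimal sketch, not the module's actual code: the names `caption_image` and `build_response` and the response keys `"description"` and `"confidence"` are hypothetical, and because this page does not say how the confidence score is computed, the sketch leaves it as `None`. The checkpoint name and the default `max_length` are taken from the signatures documented on this page.

```python
import io


def build_response(caption: str) -> dict:
    """Post-process a raw caption into a response dict.

    The key names are hypothetical; this page does not list the real ones.
    The confidence value is left unset because its calculation is not documented.
    """
    return {"description": caption.strip(), "confidence": None}


def caption_image(image_data: bytes,
                  model_name: str = "Salesforce/blip-image-captioning-base",
                  max_length: int = 50) -> dict:
    """Run the full pipeline on raw image bytes (hypothetical helper)."""
    # Third-party imports are kept local so build_response stays usable
    # without torch, Pillow, or transformers installed.
    import torch
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # Choose the computation device: CUDA if available, otherwise CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Load the processor and model. Done per call here only for brevity;
    # a long-lived object would normally load them once.
    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForConditionalGeneration.from_pretrained(model_name).to(device)
    model.eval()

    # 1. Image loading: decode the bytes and normalize to RGB.
    image = Image.open(io.BytesIO(image_data)).convert("RGB")

    # 2. Preprocessing: resize/normalize into model-ready tensors.
    inputs = processor(images=image, return_tensors="pt").to(device)

    # 3. Inference: generate caption token ids without tracking gradients.
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_length=max_length)

    # 4. Post-processing: decode token ids to text and wrap the result.
    caption = processor.decode(output_ids[0], skip_special_tokens=True)
    return build_response(caption)
```

Calling `caption_image` downloads the checkpoint on first use, so it needs network access and the three dependencies listed above.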
- class image_reco.ImageDescriptionGenerator(model_name='Salesforce/blip-image-captioning-base')
Bases: object
A class for generating textual descriptions of images using a vision-language model.
This class handles the loading of a pre-trained BLIP model, image preprocessing, and caption generation. It provides an interface for converting raw image data into natural language descriptions that can be used for haiku inspiration.
- Variables:
processor – The BLIP processor for handling image inputs
model – The BLIP model for conditional text generation
device – The computation device (CUDA or CPU)
- generate_description(image_data, max_length=50)
Generate a descriptive caption for the given image.
This method processes the raw image data, runs inference with the BLIP model, and returns a structured response with the generated description.
- Parameters:
image_data (bytes) – Raw binary image data
max_length (int) – Maximum token length for the generated caption
- Returns:
Dictionary containing the generated description and confidence score
- Return type:
dict
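Since `generate_description` expects raw bytes rather than a file path or a PIL image, a caller typically reads the file first. The helper below is a small sketch of that step; `describe_file` is a hypothetical name, and `generator` is assumed to be any object exposing `generate_description(image_data, max_length=...)` as documented above.

```python
from pathlib import Path


def describe_file(generator, path, max_length: int = 50) -> dict:
    """Read an image file as bytes and pass it to the generator.

    `generator` is assumed to provide generate_description(image_data, max_length),
    as ImageDescriptionGenerator is documented to do.
    """
    image_data = Path(path).read_bytes()
    return generator.generate_description(image_data, max_length=max_length)
```

With the real class this would be used as `describe_file(ImageDescriptionGenerator(), "photo.jpg")`, where `photo.jpg` stands in for any local image file.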
- image_reco.gen_response(image_data) → dict
Generate a description for an image using the global description generator.
This function provides a simplified interface to the image description functionality for use in API endpoints.
- Parameters:
image_data (bytes) – Raw binary image data
- Returns:
Dictionary containing the image description and confidence information
- Return type:
dict
- Raises:
Exception – If image processing or description generation fails
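Because `gen_response` can raise a bare `Exception`, an API endpoint will usually want to catch it and return an error payload instead of letting the request crash. The sketch below shows one way to do that. The function name `handle_image_upload`, the error messages, and the status codes are assumptions for illustration, not part of the Senju API. The description function is passed in as `describe` so the example does not depend on the model being loaded.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger(__name__)


def handle_image_upload(image_data: bytes,
                        describe: Callable[[bytes], dict]) -> Tuple[dict, int]:
    """Return (payload, HTTP-style status code) for an uploaded image.

    `describe` is expected to behave like image_reco.gen_response:
    take raw image bytes, return a dict, and raise on failure.
    """
    # Reject empty uploads before invoking the model.
    if not image_data:
        return {"error": "no image data provided"}, 400

    try:
        result = describe(image_data)
    except Exception:
        # Log the traceback server-side; keep the client-facing message generic.
        logger.exception("image description failed")
        return {"error": "could not describe image"}, 500

    return result, 200
```

In an application this might be called as `handle_image_upload(uploaded_bytes, image_reco.gen_response)`.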