image_reco module

Senju Image Recognition Module

A module that generates textual descriptions of images for the Senju haiku application.

This module leverages pre-trained vision-language models (specifically BLIP) to generate textual descriptions of uploaded images. These descriptions can then be used as input for the haiku generation process, enabling image-to-haiku functionality.

Classes

ImageDescriptionGenerator

The primary class responsible for loading the vision-language model and generating descriptions from image data.

Functions

gen_response

A helper function that wraps the description generation process for API integration.

Dependencies

  • torch: Deep learning framework required for model operations

  • PIL.Image: Image processing capabilities

  • io: Utilities for working with binary data streams

  • transformers: Hugging Face’s library providing access to pre-trained models

Implementation Details

The module initializes a BLIP (Bootstrapping Language-Image Pre-training) model, which can interpret visual content and generate natural language descriptions. The implementation handles image loading, preprocessing, model inference, and post-processing to return structured description data.
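The four stages named above (loading, preprocessing, inference, post-processing) could be sketched as follows. This is a minimal illustration, not the module's actual code. The function names `caption_image` and `normalize_caption` are hypothetical, the whitespace cleanup is only an assumed form of post-processing, and `processor`, `model`, and `device` are assumed to be an already loaded BLIP processor, BLIP model, and device string.

```python
import io


def normalize_caption(text):
    """Assumed post-processing step: collapse runs of whitespace and trim."""
    return " ".join(text.split())


def caption_image(image_data, processor, model, device, max_length=50):
    """Hypothetical helper: turn raw image bytes into a caption string."""
    # Heavy dependencies are imported lazily so the sketch itself imports cheaply.
    import torch
    from PIL import Image

    # 1. Image loading: decode the bytes and force a three-channel image.
    image = Image.open(io.BytesIO(image_data)).convert("RGB")

    # 2. Preprocessing: the processor resizes and normalizes, returning tensors.
    inputs = processor(images=image, return_tensors="pt").to(device)

    # 3. Inference: generate caption token IDs without tracking gradients.
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_length=max_length)

    # 4. Post-processing: decode the IDs to text and tidy the result.
    text = processor.decode(output_ids[0], skip_special_tokens=True)
    return normalize_caption(text)
```

Passing the processor and model in as arguments keeps the expensive model load out of the per-request path.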

class image_reco.ImageDescriptionGenerator(model_name='Salesforce/blip-image-captioning-base')

Bases: object

A class for generating textual descriptions of images using a vision-language model.

This class handles the loading of a pre-trained BLIP model, image preprocessing, and caption generation. It provides an interface for converting raw image data into natural language descriptions that can be used for haiku inspiration.

Variables:
  • processor – The BLIP processor for handling image inputs

  • model – The BLIP model for conditional text generation

  • device – The computation device (CUDA or CPU)
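A constructor that populates these three attributes might look like the sketch below. The class name `DescriptionGeneratorSketch` and the helper `pick_device` are hypothetical, and the real class may load or place the model differently. Only the default model name is taken from the signature above.

```python
def pick_device(cuda_available):
    """Hypothetical helper: prefer CUDA when available, otherwise use the CPU."""
    return "cuda" if cuda_available else "cpu"


class DescriptionGeneratorSketch:
    """Illustrative stand-in for ImageDescriptionGenerator (not the real class)."""

    def __init__(self, model_name="Salesforce/blip-image-captioning-base"):
        # Deferred imports: torch and transformers are only needed on construction.
        import torch
        from transformers import BlipForConditionalGeneration, BlipProcessor

        # device: the computation device (CUDA or CPU).
        self.device = pick_device(torch.cuda.is_available())
        # processor: prepares image inputs and decodes generated token IDs.
        self.processor = BlipProcessor.from_pretrained(model_name)
        # model: the BLIP model for conditional text generation.
        self.model = BlipForConditionalGeneration.from_pretrained(model_name)
        self.model.to(self.device)
        self.model.eval()  # inference only, so disable training-time behavior
```

Constructing the object downloads the model weights on first use, so an application would typically create it once and reuse it.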

generate_description(image_data, max_length=50)

Generate a descriptive caption for the given image.

This method processes the raw image data, runs inference with the BLIP model, and returns a structured response with the generated description.

Parameters:
  • image_data (bytes) – Raw binary image data

  • max_length (int) – Maximum token length for the generated caption

Returns:

Dictionary containing the generated description and confidence score

Return type:

dict
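The documentation does not specify the dictionary's keys or how the confidence score is derived. Purely as an assumption, one common approach is to take the geometric mean of the per-token probabilities of the generated caption, as sketched below. The key names `description` and `confidence`, and both function names, are hypothetical.

```python
import math


def confidence_from_logprobs(token_logprobs):
    """Assumed scoring rule: geometric mean of the per-token probabilities."""
    if not token_logprobs:
        return 0.0  # no tokens generated, so report no confidence
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)


def build_result(description, token_logprobs):
    """Package a caption and its score; the key names are assumptions."""
    return {
        "description": description.strip(),
        "confidence": confidence_from_logprobs(token_logprobs),
    }
```

Averaging in log space keeps the score from shrinking simply because a caption is longer, which a plain product of probabilities would do.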

image_reco.gen_response(image_data) → dict

Generate a description for an image using the global description generator.

This function provides a simplified interface to the image description functionality for use in API endpoints.

Parameters:

image_data (bytes) – Raw binary image data

Returns:

Dictionary containing the image description and confidence information

Return type:

dict

Raises:

Exception – If image processing or description generation fails
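Because gen_response raises on failure, a calling endpoint needs to catch the exception and translate it into an error response. The sketch below shows one way a caller might do that. The function name `describe_upload`, the status codes, and the `error` key are all hypothetical, and the description function is passed in as an argument so the sketch does not depend on how the application wires up its routes.

```python
def describe_upload(image_data, describe):
    """Hypothetical endpoint helper returning (status_code, payload).

    `describe` is any callable with the shape of gen_response: it takes
    raw image bytes and returns a dict, or raises on failure.
    """
    if not image_data:
        # Reject empty uploads before invoking the model.
        return 400, {"error": "no image data received"}
    try:
        # Pass the result dict through unchanged.
        return 200, describe(image_data)
    except Exception as exc:
        # gen_response is documented to raise Exception on failure.
        return 500, {"error": f"image description failed: {exc}"}


# In the application this would presumably be called as:
#   from image_reco import gen_response
#   status, payload = describe_upload(raw_bytes, gen_response)
```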