A look behind the scenes
We affectionately call it nerd content - the exciting, complex topics our developers are passionate about - features we've developed, insights we've gained, or special features we've discovered. Enjoy reading!
Our expertise lies not only in implementing Magnolia, but also in adapting it to the specific needs of our customers and continuously developing it further. This flexibility enables us to create innovative and efficient applications that offer our customers long-term added value.
Markus Hesper, Partner dev5310
GPT4 Image Recognizer within Magnolia DXP
Enhancing Magnolia's image recognition with AI
The DEV5310 GPT-4 Vision Image Recognition module builds on the Magnolia CMS Image Recognition module and extends its functionality through integration with OpenAI's GPT-4 Vision model. The GPT-4 Vision model offers improved accuracy and an expanded knowledge base that enables comprehensive image recognition.
Setup
Before you get started with the dev5310-gpt4-vision-image-recognition module, you need to make sure that the Magnolia CMS Image Recognition module has been installed. This serves as the basis on which the new module works.
Once this is done, as with any typical module setup, you need to add the dev5310-gpt4-vision-image-recognition dependency to the Maven pom.xml file. Magnolia CMS is then configured to recognize the new module as the central service for all image recognition requirements.
Using OpenAI's Vision model
One of the highlights of the dev5310 GPT-4 Vision Image Recognition module is the implementation of OpenAI's GPT-4 Vision model. The module communicates seamlessly with the AI model to tag the input images with relevant labels. However, an API key is required for this interaction with OpenAI, and the flexibility offered here is commendable. Whether you want to store the API key in an environment variable, directly in the module configuration, or via the Magnolia Passwords app, the module is designed to support all of these methods. It ensures a streamlined setup process, regardless of the method you choose.
Conclusion
In summary, the dev5310 GPT-4 Vision Image Recognition module is a significant extension of the functionality built into Magnolia CMS. By integrating with the power of OpenAI's GPT-4, the module brings automated and accurate image recognition to your CMS platform. It is easy to set up, easy to configure, and effectively extends your image recognition capabilities. Get ready to discover the power of AI-powered image recognition like never before!
The dev5310Gpt4ImageRecogniser class
In the world of artificial intelligence, one of the trending topics is image recognition. More specifically, the model that OpenAI has made available for public use under the name GPT-4 Vision. This article introduces the Dev5310Gpt4ImageRecogniser class, which uses this model for image recognition.
How does the dev5310Gpt4ImageRecogniser work?
The dev5310Gpt4ImageRecogniser implements the ImageRecogniser interface. As the name suggests, an ImageRecogniser is responsible for interpreting images and returning their tags. In this particular implementation, the GPT-4 Vision model performs this task.
java public class Dev5310Gpt4ImageRecogniser implements ImageRecogniser { ... }
The class uses a JSON string API_MESSAGE_BODY_TEMPLATE to define the payload for the OpenAI API. The payload contains instructions for the assistant and an image to be tagged. The image is encoded in Base64 and sent to the OpenAI image recognition assistant along with a request to tag the image.
An instance of Dev5310OpenAICredentialsProvider is set up in the class, which is responsible for retrieving the necessary credentials for interacting with the OpenAI service.
Image recognition method
The core functionality of Dev5310Gpt4ImageRecogniser is encapsulated in its recognition method:
java @Override public Collection recognise(final byte[] imageBytes) {...}
This function takes a byte array representing an image as input, calls the OpenAI API, and returns a collection of ImageLabel objects representing the tags for the recognized image. ImageLabel objects are essentially representations of the recognized labels of the image.
If the OpenAI service responds with an HTTP status code other than 200, recognition is skipped and the error is logged. If the response is successful, a GptResponse object is created, the response is parsed, converted to tags, and returned as a collection of ImageLabel objects.
Additional functionality
In addition to the recognition method, the class has several helper methods that are all used to process and prepare the image data:
_
getBodyImageInfo: Performs preliminary image checks—whether the image is recognized, its type, size, MIME type—and generates the message text for the OpenAI API.
_
getBodyImageIO: Generates a base64 representation of the specified format and dimensions of the image.
_
scaleImageToFit: This function is used to resize the image while maintaining the original aspect ratio.
This class provides an example of interacting with image recognition models such as GPT-4 Vision, provided by OpenAI. It is straightforward and provides an elegant way to communicate with an image processing API, process images, and return their labels.