AI models

Not long ago, neural models in most domestic companies were trained in-house. Back in the old days of 2018, I did it too. Today, models are built by large teams at large companies whose resources cannot be matched: just training a new ChatGPT costs millions of dollars. The situation has changed. But which model should you choose?

Language models

The world has been getting used to the GPT model from OpenAI for some time now. However, it is not the only neural network used for language work. For more specific language tasks, such as named-entity recognition or sentiment detection, BERT may be a better fit.
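As a minimal sketch of what those specific tasks look like in practice, the snippet below uses the Hugging Face `transformers` library with its default BERT-family checkpoints (the library, a backend such as PyTorch, and a first-run model download are all assumed; the helper `top_label` is my own illustrative addition):

```python
def top_label(results):
    """Pick the highest-scoring label from a list of {label, score} dicts,
    the shape returned by a transformers classification pipeline."""
    return max(results, key=lambda r: r["score"])["label"]

def classify_sentiment(text: str) -> str:
    """Sentiment detection with a default BERT-family model.
    Imported lazily so the pure helper above works without the library."""
    from transformers import pipeline
    classifier = pipeline("sentiment-analysis")  # downloads a checkpoint on first use
    return top_label(classifier(text))

def extract_entities(text: str):
    """Named-entity recognition with a BERT model fine-tuned for NER."""
    from transformers import pipeline
    ner = pipeline("ner", aggregation_strategy="simple")
    return [(e["entity_group"], e["word"]) for e in ner(text)]
```

A call such as `classify_sentiment("The new release is fantastic.")` would return a label like `"POSITIVE"`, though the exact output depends on the checkpoint the library ships as its default.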

Image models

If we want to recognize objects in images, which are plentiful on the internet, we can use one of Google's models that have been incrementally developed since the last deep learning revolution. For example, EfficientNet, which scales accuracy against computational cost.
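A hedged sketch of image classification with a pretrained EfficientNet, here via `torchvision` (the library, the `EfficientNet_B0_Weights` checkpoint, and the `top_predictions` helper are assumptions of this example, not something prescribed by the model itself):

```python
def top_predictions(probs, categories, k=3):
    """Pair class probabilities with their labels and return the k best."""
    ranked = sorted(zip(categories, probs), key=lambda x: x[1], reverse=True)
    return ranked[:k]

def classify_image(path: str, k: int = 3):
    """Classify one image with a pretrained EfficientNet-B0 from torchvision.
    Imports are lazy so the helper above stays usable without the libraries;
    weights download on first use."""
    import torch
    from PIL import Image
    from torchvision import models

    weights = models.EfficientNet_B0_Weights.DEFAULT
    model = models.efficientnet_b0(weights=weights).eval()
    batch = weights.transforms()(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = model(batch).softmax(dim=1)[0].tolist()
    return top_predictions(probs, weights.meta["categories"], k)
```

Swapping `efficientnet_b0` for `efficientnet_b4` or higher is how the accuracy/compute trade-off is dialed up in this family.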

For object detection in video, the YOLO project still serves well. A few years ago, my team used it at a dev hackathon to build an "alcoholic drone" that flew around the room searching for beer bottles. The only limit here is human imagination.
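The bottle hunt above can be sketched with the `ultralytics` package, one common way to run YOLO today (the package, the `yolov8n.pt` checkpoint, and the `find_targets` helper are assumptions of this sketch, not the hackathon's actual code):

```python
def find_targets(detections, target: str, min_conf: float = 0.5):
    """Keep (class_name, confidence) detections that match a target class
    above a confidence threshold."""
    return [(name, conf) for name, conf in detections
            if name == target and conf >= min_conf]

def detect_bottles(image_path: str):
    """Run a small pretrained YOLOv8 model (COCO classes) on one frame
    and return confident 'bottle' detections. Imported lazily; the
    checkpoint downloads on first use."""
    from ultralytics import YOLO
    model = YOLO("yolov8n.pt")
    result = model(image_path)[0]
    detections = [(model.names[int(box.cls)], float(box.conf))
                  for box in result.boxes]
    return find_targets(detections, "bottle")
```

On a drone, the same `detect_bottles` call would simply be run per frame of the video stream.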

Generative models

These do not primarily recognize content but generate their own based on an input. ChatGPT is itself a generative model, but LLMs deserve their own category given their popularity, so here I will cover images, sound, and entire videos. In the first of these fields, Midjourney and DALL-E lead today. For video, Sora is currently in beta testing.

Speech models

OpenAI did not only shine with GPT; in the IT community it is also known for Whisper, which converts speech to text with relatively high accuracy. It has stayed in GPT's shadow mainly because OpenAI does not provide it as an API, so companies have to deploy the model themselves. Yet its accuracy often surpasses even human listeners.
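Self-hosting Whisper is a few lines with the open-source `openai-whisper` package; a minimal sketch, assuming that package is installed and that the `format_segments` helper (my addition) matches the segment dicts Whisper returns:

```python
def format_segments(segments):
    """Render Whisper-style segments ({'start': seconds, 'text': ...})
    as 'MM:SS text' lines for a simple transcript."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(f"{minutes:02d}:{seconds:02d} {seg['text'].strip()}")
    return lines

def transcribe(audio_path: str):
    """Speech-to-text with a locally deployed Whisper model.
    Imported lazily; the 'base' weights download on first use."""
    import whisper
    model = whisper.load_model("base")  # larger variants trade speed for accuracy
    result = model.transcribe(audio_path)
    return result["text"], format_segments(result["segments"])
```

The model size ("tiny" through "large") is the main deployment knob: the larger variants are the ones that approach human-level accuracy, at a real GPU cost.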

Going the other way, from text to speech, is comparatively simpler. There are many tools and APIs, such as ElevenLabs. Recently, voice cloning has also shown promise, although it has attracted its share of controversy.

The Model Is Not Everything

Even with a tested and functioning model, the correctness of the input data is critical, and that is where mistakes are often made. I once experienced the following situation. One of the engineers was feeding input data to a well-functioning model after transforming it with a base-2 logarithm, although the input description clearly stated the need for a base-10 logarithm. The engineer had read it but simply did not know the command for it, so he chose the wrong one. The service did not work well, and it took some time to figure out why. The culprit was a single, easily overlooked line of code.
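The bug is easy to reproduce; the illustration below (with made-up sample values, not the original service's code) shows how quietly the two transforms diverge:

```python
import math

values = [1.0, 10.0, 100.0]

# Wrong: base-2 logarithm, as the engineer wrote it.
wrong = [math.log2(v) for v in values]    # [0.0, ~3.32, ~6.64]

# Right: the input specification asked for base-10.
right = [math.log10(v) for v in values]   # [0.0, 1.0, 2.0]
```

Both versions run without errors and produce monotonically increasing numbers, which is exactly why the mistake survived until someone compared the outputs against the specification.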

Conclusion

Have you found an interesting use-case for your company? Even with third-party models, it is necessary to follow a clear process, provide documentation, and ideally prepare some usage examples. Use a model within its intended domain: a smoke-detection problem will probably not be solved by a network built for automated farming. If we agree, I can ensure these requirements are met, helping the organization avoid the unnecessary troubles that come with the incorrect use of models.