Since ChatGPT arrived in late 2022, large language models (LLMs) have continued to raise the bar for what generative AI systems can accomplish. For example, GPT-3.5, which powered ChatGPT, had an accuracy of 85.5% on common sense reasoning data sets, while GPT-4 in 2023 achieved around 95% accuracy on the same data sets. Whereas GPT-3.5 and GPT-4 primarily focused on text processing, GPT-4o, launched in May 2024, is multimodal, allowing it to handle text, images, audio and video.
Despite the impressive advancements by the GPT family of models and other open source large language models, Gartner, in its 2024 hype cycle for artificial intelligence, notes that “generative AI has passed the peak of inflated expectations, although hype about it continues.” Some reasons for disillusionment include the high costs associated with the GPT family of models, privacy and security concerns regarding data, and issues with model transparency. Small language models, with fewer parameters than these LLMs, are one potential solution to these challenges.
Smaller language models are easier and more cost-effective to train. Additionally, smaller models can be hosted on-premises, providing greater control over the data shared with these language models. One challenge with smaller models is that they tend to be less accurate than their larger counterparts. To harness the strengths of smaller models while mitigating their weaknesses, enterprises are turning to domain-specific small models, which need to be accurate only in the specialization and use cases they support. This domain specialization can be enabled by taking a pre-trained small language model and fine-tuning it with domain-specific data, or by using prompt engineering for additional performance gains.
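To make the fine-tuning path concrete, here is a minimal sketch using the Hugging Face Transformers and Datasets libraries. The model checkpoint (microsoft/phi-2), the corpus file (domain_corpus.jsonl), and the training hyperparameters are illustrative assumptions, not choices from the article; any small causal language model and in-domain text corpus could be substituted.

```python
"""Minimal sketch: fine-tuning a small pre-trained language model
on domain-specific text with Hugging Face Transformers."""
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hypothetical choices: a small causal LM checkpoint and a JSONL file
# of in-domain text records, one {"text": "..."} object per line.
model_name = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Load the domain corpus and tokenize it.
dataset = load_dataset("json", data_files="domain_corpus.jsonl", split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="slm-domain-tuned",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-5,
    ),
    train_dataset=dataset,
    # mlm=False selects the causal (next-token) objective: the collator
    # copies input ids into labels so the model learns the domain text.
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("slm-domain-tuned")
```

For teams without a training corpus, the prompt engineering route mentioned above approximates the same specialization at inference time, for example by prepending domain context and a few worked examples to each query rather than updating the model's weights.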