The Malaysian startup is laying the groundwork for a level of machine translation accuracy that LLMs from Western countries cannot deliver
When large language models (LLMs) pre-trained on English-language, Western-centric data are given non-English queries, such as those in South-east Asian languages, they can produce inaccuracies and misinterpretations.
On 4 Dec 2024, Malaysian startup Mesolitica announced LLMs trained on culturally diverse data to boost translation accuracy.
Trained on 197 datasets totaling close to 200bn tokens of publicly available Malay-specific content, Mesolitica’s LLM can understand Bahasa Melayu and 16 other regional languages, along with local nuances such as slang (colloquialisms that merge different dialects), for use in AI assistants across industries. This can provide culturally relevant AI support for customer service, content generation, and data analysis in localized South-east Asian (SEA) languages.
The startup had earlier completed a four-week accelerator program run by Amazon Web Services (AWS) for early-stage startups developing generative AI (GenAI) applications. With this support, Mesolitica was able to develop its compute-intensive LLM economically in the cloud. Compute cost savings are estimated at 87%, and throughput has increased 5.5-fold.
According to the firm’s co-founder and CEO, Khalil Nooh, “we can deploy proofs-of-concept much faster, with the right cost-effective AI compute resources and machine learning capabilities. This allows our customers to focus only on the ongoing operational costs, rather than upfront capital expenses, for their AI experiments. This is also in line with Malaysia’s national priority to develop citizen-centric applications.”
The SEA-localized LLM could be used to improve operations in many areas, including:
- underserved groups such as farmers in rural areas: to make data-driven decisions using real-time weather forecasts, soil health analysis, and crop viability assessments
- the education sector: to enhance the understanding of local languages and dialects
- Malaysia’s public sector: to provide better communication and quick, accurate responses to citizens’ inquiries in multiple languages, including dialects from different Malaysian states
Said Pete Murray, Country Manager, AWS Malaysia: “For GenAI to be relevant, it must be accessible and culturally integrated. Mesolitica is creating Malaysia’s first LLM that’s tailored to the country’s diverse population. This has potential to support various sectors, from improving government services to financial inclusion.”