The surge of multilingual audio datasets has changed the way AI training is done, language is learned, and indeed, data is used in Science. Be it training AI models or communicating seamlessly with language speakers across language barriers, these datasets are among the core assets of the technological system. But what are multilingual datasets exactly? Why do they have such significance? In what way can organizations such as Macgence be instrumental in the development and application of such datasets?