Student Getting Research Boost Through Google Ph.D. Fellowship
A Georgia Tech Ph.D. candidate is getting a boost to his research into developing more efficient multi-tasking artificial intelligence (AI) models without fine-tuning.
George Stoica is one of 38 Ph.D. students worldwide researching machine learning who were named Google Ph.D. Fellows.
Stoica is designing AI training methods that bypass fine-tuning, which is the process of adapting a large pre-trained model to perform new tasks. Fine-tuning is one of the most common ways engineers update large-language models like ChatGPT, Gemini, and Claude to add new capabilities.
If an AI company wants to give a model a new capability, it could create a new model from scratch for that specific purpose. However, if the model already has relevant training and knowledge of the new task, fine-tuning is cheaper.
Stoica argues that fine-tuning still uses large amounts of data, and that other methods can help models learn more effectively and efficiently.
“Full fine-tuning yields strong performance, but it can be costly, and it risks catastrophic forgetting,” Stoica said. “My research asks whether we can extend a model’s capabilities by imbuing it with the expertise of others, without fine-tuning.
“Reducing cost and improving efficiency is more important than ever. We have so many publicly available models that have been trained to solve a variety of tasks. It’s redundant to train a new model from scratch. It’s much more efficient to leverage the information that already exists to get a model up to speed.”
Stoica said the solution is a cost-effective method called model merging. This method combines two or more AI models into a single model, improving performance without fine-tuning.
On a basic level, Stoica said an example would be combining a model that is efficient at classifying cats with one that works well at classifying dogs.
“Merging is cheap because you just take the parameters, the weights of your existing models, and combine them,” he said. “You could take the average of the weights to create a new model, but that sometimes doesn’t work. My work has aimed to rearrange the weights so they can communicate easily with each other.”
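The simplest baseline Stoica describes, averaging the weights of two models, can be sketched in a few lines. This is a minimal illustration using plain Python lists in place of real parameter tensors; the model names and structure are hypothetical, and it shows only naive averaging, not the weight-rearrangement methods his research develops.

```python
def average_merge(model_a, model_b):
    """Merge two same-architecture models by element-wise weight averaging."""
    # Both models must have identical parameter names and shapes.
    assert model_a.keys() == model_b.keys(), "architectures must match"
    merged = {}
    for name in model_a:
        wa, wb = model_a[name], model_b[name]
        merged[name] = [(a + b) / 2 for a, b in zip(wa, wb)]
    return merged

# Toy "models": parameter name -> flat list of weights (illustrative only)
cat_model = {"layer1.weight": [0.2, 0.4], "layer1.bias": [0.0, 1.0]}
dog_model = {"layer1.weight": [0.6, 0.0], "layer1.bias": [0.4, 0.6]}

merged = average_merge(cat_model, dog_model)
print(merged["layer1.weight"])  # [0.4, 0.2]
```

As the quote notes, plain averaging sometimes fails because corresponding weights in independently trained models need not line up; aligning or rearranging them first is what makes merging work well.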
Through his Google fellowship, Stoica seeks to apply model merging to create a cutting-edge vision encoder. A vision encoder converts image or video data into numerical representations that computers can understand. This enables tasks such as image or facial recognition and generative image captioning.
“I want to be at the frontier of the field, and Google is clearly part of that,” Stoica said. “The vision encoder is very large-scale, and Google has the infrastructure to accommodate it.”