Large Language Model Fine-tuning Method Based on Adaptive Quantization

Authors

  • Lijuan Feng, Zhengzhou University of Science and Technology
  • Jiaxiang Wang, Zhengzhou University of Science and Technology
  • Jiangjiang Li, Zhengzhou University of Science and Technology
  • Yachao Zhang, Zhengzhou University of Science and Technology

Keywords:

Large language model, AI, ADAQ-LoRA

Abstract

In recent years, large language models (LLMs) have excelled in comprehensive AI tasks such as text generation, mathematics, abstraction, and code, offering an early glimpse of general artificial intelligence. However, fine-tuning these models consumes large amounts of GPU memory and computing resources, far beyond what ordinary consumer-grade graphics cards can provide. Therefore, an adaptive quantization low-rank adaptation (ADAQ-LoRA) fine-tuning algorithm is proposed to address GPU memory consumption during fine-tuning of large language models. The method combines quantization and pruning to dramatically reduce GPU memory usage without losing accuracy. ADAQ-LoRA is applied to the ChatGLM2-6B model, and its effectiveness is verified on different fine-tuning datasets and downstream scenarios. Compared with existing large language model fine-tuning methods, ADAQ-LoRA shows better performance and lower memory usage.
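For orientation, the sketch below shows a generic quantization-plus-LoRA fine-tuning setup (QLoRA-style) for ChatGLM2-6B using the Hugging Face transformers, peft, and bitsandbytes libraries. This is only an illustration of the general memory-saving approach the abstract describes; it is not the authors' ADAQ-LoRA implementation, which additionally applies adaptive quantization and pruning, and the hyperparameters shown are assumptions.

```python
# Minimal sketch of quantized LoRA fine-tuning (QLoRA-style), NOT the paper's
# ADAQ-LoRA algorithm; model name is the base model cited in the abstract,
# all hyperparameters here are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "THUDM/chatglm2-6b"  # base model used in the paper's experiments

# Load the base model with 4-bit NF4 quantization to cut GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, trust_remote_code=True
)
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters; only these small matrices are trained,
# while the quantized base weights stay frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # ChatGLM2 attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # confirms only a small fraction is trainable
```

In this kind of setup, the frozen base weights are stored in 4-bit precision while the LoRA adapters remain in higher precision, which is what keeps fine-tuning within consumer-grade GPU memory budgets.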

Published

2024-12-24

How to Cite

Feng, L., Wang, J., Li, J., & Zhang, Y. (2024). Large Language Model Fine-tuning Method Based on Adaptive Quantization. IJLAI Transactions on Science and Engineering, 2(4), 65–71. Retrieved from http://ijlaitse.com/index.php/site/article/view/58
