Large Language Model Fine-tuning Method Based on Adaptive Quantization
Keywords:
Large language model, AI, ADAQ-LoRA

Abstract
In recent years, large language models (LLMs) have excelled across a broad range of AI tasks such as text generation, mathematics, abstraction, and code, offering an early glimpse of general artificial intelligence. However, fine-tuning these models consumes a large amount of GPU memory and demands computing resources far beyond what consumer-grade graphics cards can provide. To address the memory consumption problem in fine-tuning large language models, an adaptive-quantization low-rank adaptation (ADAQ-LoRA) fine-tuning algorithm is proposed. It combines quantization and pruning to dramatically reduce GPU memory usage without losing accuracy. ADAQ-LoRA is applied to the ChatGLM2-6B model, and its effectiveness is verified on different fine-tuning datasets and downstream scenarios. Compared with existing fine-tuning methods for large language models, ADAQ-LoRA achieves better performance with lower memory usage.
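For a concrete picture of the setting, the sketch below shows the general quantized low-rank fine-tuning pattern that ADAQ-LoRA builds on (a QLoRA-style setup, not the paper's algorithm itself). It assumes the Hugging Face transformers, peft, and bitsandbytes libraries; the rank, alpha, and dropout values are illustrative only.

# Sketch: quantized low-rank fine-tuning of ChatGLM2-6B (QLoRA-style).
# This illustrates the general quantization + LoRA pattern, not ADAQ-LoRA itself;
# all hyperparameters below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model with 4-bit weight quantization to cut GPU memory use.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/chatglm2-6b",
    quantization_config=bnb_config,
    trust_remote_code=True,  # ChatGLM2 ships custom modeling code
)

# Attach small trainable low-rank adapters; the quantized base stays frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # ChatGLM2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train

Under this pattern, only the low-rank adapter weights are updated while the 4-bit base model stays frozen, which is why memory usage drops so sharply; ADAQ-LoRA additionally adapts the quantization and applies pruning, as described in the body of the paper.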