Computer Science Thesis Defense - Zhenhan Huang

PhD Candidate: Zhenhan Huang 

Advisor: Prof. Jianxi Gao 

Meta Learning of Deep Learning Models Through Architecture Design and Optimization

 

Abstract:

Deep learning models, such as large language models and image generative models, have transformed modern society. Yet, as scaling laws show, model performance cannot be boosted indefinitely by increasing model complexity alone. We focus on two aspects of deep learning that are fundamental to performance: neural architecture design and the training scheme. The former determines the expressivity of the model, while the latter exploits the model's potential through optimization. Rather than delving directly into neural network interpretability, which is challenging for complex deep learning models due to their inherent non-linearity, we analyze the dynamics and topology of neural networks through the lens of graphs. Introducing graphs offers two advantages. First, well-established graph theory provides a theoretically grounded understanding of neural architectures. Second, the computational cost of graph measures is negligible compared to that of forward and backward propagation in deep learning models, so even large-scale models can be analyzed efficiently and with ease. We evaluate the proposed methods on classic deep learning models and standard neural architecture benchmarks. A comprehensive evaluation reveals the advantages of the proposed mapping strategies, which bridge the neural architecture space and the graph space.

Once the optimal neural architecture is determined, optimization plays a crucial role. Optimization is non-trivial due to the non-convex nature and high dimensionality of the loss landscape, in which local minima commonly prevent a model from achieving optimal performance. We explore parameter-efficient fine-tuning approaches to effectively and efficiently improve the performance of foundation models on downstream tasks. Specifically, we propose heterogeneous prompting strategies that automatically determine the soft-prompt configuration. The proposed methods are evaluated on foundation vision-language models, and their performance on image classification demonstrates their advantage.

Beyond improving the performance of foundation models, we extend their applications to new domains. Specifically, we propose a customized embedding strategy to apply large language models to tabular data, and we utilize the intermediate representations of foundation models to detect AI-generated images. The results reveal the under-explored potential of foundation models.

Date
Location
https://rensselaer.webex.com/rensselaer/j.php?MTID=mc82e5d5e37b65cac02faec4acfe6f507