This deliverable details the experimentation with different parameter-efficient fine-tuning (PEFT) methods applied to DistilBERT for legal case classification. The focus was on comparing fine-tuning techniques like LoRA, DoRA, and QLoRA, exploring various PEFT classes, and implementing explainable AI tools to enhance model interpretability.
Below is an image showing the differences between these fine-tuning techniques:
Using the PEFT type classes defined above, LoRA, DoRA, and QLoRA were applied across different model components. This experimentation allowed us to observe how each method affected performance, generalization, and computational efficiency.
Below is the image showing the accuracy results over 10 epochs using LoRA:
Below is the image showing the accuracy results over 10 epochs using DoRA:
Below is the image showing the accuracy results over 10 epochs using QLoRA:
To evaluate model performance comprehensively, the following metrics were used: accuracy, precision, recall, and F1-score.
Together, these metrics allowed for a more nuanced evaluation of LoRA, DoRA, and QLoRA, highlighting not only overall accuracy but also the reliability of each model's predictions in terms of specificity and coverage.
Below is the image showing the results of 10 epochs using LoRA with the updated evaluation metrics:
Below is the image showing the results of 10 epochs using DoRA with the updated evaluation metrics:
Below is the image showing the results of 10 epochs using QLoRA with the updated evaluation metrics:
Through detailed experimentation, LoRA emerged as the most effective PEFT technique for legal case prediction based on the following factors:
Based on these advantages, LoRA was selected as the primary technique for further fine-tuning and evaluation in legal case classification tasks.
In this step, I selected specific layers (0, 2, 4, and 6) in DistilBERT to apply LoRA transformations using the layers_to_transform parameter. By focusing only on these layers, the model achieves a balance between targeted domain adaptation and computational efficiency, since not all layers undergo fine-tuning.
Restricting LoRA transformations to specific layers reduces the number of trainable parameters, improving computational efficiency. In contrast, the default LoRA setting (without specifying layers_to_transform) tends to produce higher-quality embeddings, since all layers undergo adaptation, resulting in a more robust model output, especially for complex tasks like legal case prediction.
Below is the image showing the results of 10 epochs using LoRA with no extra embedding layers:
Below is the image showing the results of 10 epochs using LoRA with extra embedding layers:
Parameter-efficient fine-tuning (PEFT) optimizes large language models (LLMs) for a variety of specific tasks. The peft library supports several task types, including sequence classification, token classification, causal language modeling, sequence-to-sequence language modeling, question answering, and feature extraction.
After researching all these task types, I found that sequence classification is the best suited for our use case, as it aligns directly with our goal of accurately categorizing legal cases into predefined classes.
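As a minimal sketch (assuming the Hugging Face peft library, which exposes these task types through its TaskType enum), selecting sequence classification looks like the following; the r and lora_alpha values here are placeholders, not the project's tuned settings:

from peft import LoraConfig, TaskType

# task_type=TaskType.SEQ_CLS tells peft to wrap the base model
# for sequence (here, legal case) classification.
config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16)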
Why Explainable AI (XAI)? To interpret the model's decision-making process, I integrated XAI tools such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations). These tools helped highlight which case details most influenced the model's verdict predictions, enhancing transparency and trustworthiness in legal contexts.
pip install shap

import shap
from transformers import pipeline

# Wrap the fine-tuned model in a text-classification pipeline so SHAP
# receives a callable mapping raw case text to class scores.
pred = pipeline("text-classification", model=model, tokenizer=tokenizer, top_k=None)
explainer = shap.Explainer(pred)
shap_values = explainer(test_data)   # test_data: a list of case-text strings
shap.plots.text(shap_values)         # token-level contribution plot
Below is the image showing the result of integrating explainable AI (XAI) with the LoRA fine-tuned model:
1. Code snippet for using multi-faceted evaluation metrics:
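For reference, a minimal sketch of such a compute_metrics function, assuming the Hugging Face Trainer API and scikit-learn (the notebook's exact code may differ):

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # eval_pred is the (logits, labels) pair the Trainer passes in.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0)
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}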
2. Code snippet for fine-tuning with LoRA:
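A minimal sketch of LoRA fine-tuning with the peft library; the hyperparameters and class count are illustrative placeholders, and q_lin and v_lin are DistilBERT's attention projection layers:

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=6)   # 6 is a placeholder class count
lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16,
                         lora_dropout=0.1, target_modules=["q_lin", "v_lin"])
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()   # confirms only adapter weights are trainable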
3. Code snippet for fine-tuning with DoRA:
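DoRA reuses the same config class; assuming peft >= 0.9, the only change from the LoRA sketch above is use_dora=True, which adds a magnitude/direction weight decomposition on top of the low-rank update:

from peft import LoraConfig, TaskType, get_peft_model

dora_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16,
                         target_modules=["q_lin", "v_lin"],
                         use_dora=True)   # the one flag that switches LoRA to DoRA
model = get_peft_model(base, dora_config)  # base: the DistilBERT classifier above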
4. Code snippet for fine-tuning with QLoRA:
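QLoRA combines 4-bit quantization of the frozen base model with LoRA adapters on top. A minimal sketch assuming the bitsandbytes integration in transformers (all values illustrative):

import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model in 4-bit NF4 precision.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                                bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=6, quantization_config=bnb_config)
base = prepare_model_for_kbit_training(base)   # casts norms for stable k-bit training
model = get_peft_model(base, LoraConfig(task_type=TaskType.SEQ_CLS, r=8,
                                        target_modules=["q_lin", "v_lin"]))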
5. Code snippet for applying LoRA embeddings to multiple layers:
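A minimal sketch of restricting LoRA to layers 0, 2, 4, and 6 via layers_to_transform (the other hyperparameters are placeholders):

from peft import LoraConfig, TaskType, get_peft_model

# Only transformer blocks 0, 2, 4, and 6 of DistilBERT receive LoRA adapters;
# the remaining layers stay frozen at their pretrained weights.
layered_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16,
                            target_modules=["q_lin", "v_lin"],
                            layers_to_transform=[0, 2, 4, 6])
model = get_peft_model(base, layered_config)  # base: the DistilBERT classifier above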
6. Code snippet for explainable AI integration:
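The SHAP snippet appears in the XAI section above; as a complementary sketch, LIME can be applied through its LimeTextExplainer. Here class_names, sample_text, and predict_proba are hypothetical placeholders for a wrapper mapping raw case text to class probabilities:

from lime.lime_text import LimeTextExplainer

# predict_proba: hypothetical callable taking a list of strings and returning
# an array of class probabilities from the fine-tuned model.
explainer = LimeTextExplainer(class_names=class_names)
explanation = explainer.explain_instance(sample_text, predict_proba, num_features=10)
explanation.show_in_notebook(text=True)   # highlights the most influential words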
After experimenting with each PEFT technique, LoRA achieved the highest accuracy, peaking at 68.4%, while DoRA and QLoRA provided computational efficiency gains without significant loss in accuracy. The explainable AI integration revealed that factual accuracy was a primary driver in the model's verdict predictions.
This experimentation reinforced the value of PEFT techniques in optimizing model performance for domain-specific tasks. Explainable AI provided essential insights into the model's reasoning process, proving invaluable for ensuring fair and transparent legal predictions.
Future work includes experimenting with other models to achieve better accuracy.
Tutorial Reference: You can find the tutorials I followed for understanding the different types of fine-tuning techniques:
Comparing Fine-Tuning Optimization Techniques: LoRA, QLoRA, DoRA, and QDoRA
What is LoRA? Low-Rank Adaptation for finetuning LLMs EXPLAINED
DoRA LLM Fine-tuning Explained: Improved LoRA
LoRA & QLoRA Fine-tuning Explained In-Depth
You can download the complete Jupyter notebook by clicking the link below:
LoRA vs. DoRA vs. QLoRA with accuracy measures:
CS_297_AI_Powered_Legal_Decision_Support_System_Fine_tuning_with_LoRA.ipynb
CS_297_AI_Powered_Legal_Decision_Support_System_Fine_tuning_with_DoRA.ipynb
CS_297_AI_Powered_Legal_Decision_Support_System_Fine_tuning_with_QLoRA.ipynb
LoRA vs. DoRA vs. QLoRA with multiple evaluation metrics:
LoRA with multi-layer embedding:
CS_297_AI_Powered_Legal_Decision_Support_System_With_LoRA__More_embeddings.ipynb
LoRA with explainable AI:
CS_297_AI_Powered_Legal_Decision_Support_System_With_LoRA_Explainable_AI.ipynb
Presentation PDF available:
Different Fine-Tuning Models - PDF