Rohan

    [Bio]

    [Blog]

    [CS297 Proposal]

    [LoRA: Low-Rank Adaptation of Large Language Models (.pdf)]

    [DoRA: Weight-Decomposed Low-Rank Adaptation (.pdf)]

    [Deliverable 1 - MATH Dataset]

    [Deliverable 1 - GSM8k Dataset]

    [Deliverable 2 - Integrate Mathics tool]

    [Deliverable 3 - Prove the infinitude of primes theorem using LEAN]

    [Chain-of-Thought Prompting in LLMs (.pdf)]

    [LeanDojo - Theorem Proving with RAG (.pdf)]

    [Deliverable 4 - Solving word problems using LEAN and Mathics]

    [CS297 Report (.pdf)]

    [CS298 Proposal]

Description:

LLMs struggle with math because they are trained primarily on natural-language data and lack the formal reasoning and symbolic-manipulation skills needed for precise calculation. They tend to prioritize linguistic patterns over strict correctness, which leads to plausible but inaccurate answers, and their limited context window and weak multi-step reasoning further hinder accuracy. This project aims to enhance LLMs' ability to handle mathematical problems and logical reasoning tasks by using curated datasets, hybrid models that incorporate symbolic reasoning components, and process supervision. The goal is to improve the model's performance on complex math and logic problems.
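
As a concrete starting point for the fine-tuning in deliverable 1, the sketch below uses LoRA (the paper linked above), so only small adapter matrices are trained. This is a minimal sketch rather than the project's final pipeline: it assumes the Hugging Face transformers, peft, and datasets libraries, and the GPT-2 base model and all hyperparameters are placeholders for whichever LLM is chosen in week 2.

    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              Trainer, TrainingArguments)

    model_name = "gpt2"  # placeholder; swap in the model selected in week 2
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Wrap the base model with low-rank adapters so only a small fraction
    # of the parameters are updated during fine-tuning.
    lora = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"],
                      task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)

    # GSM8k pairs a word problem ("question") with a worked solution ("answer").
    train = load_dataset("gsm8k", "main", split="train")

    def tokenize(example):
        text = example["question"] + "\n" + example["answer"]
        enc = tokenizer(text, truncation=True, max_length=512,
                        padding="max_length")
        enc["labels"] = enc["input_ids"].copy()  # causal LM objective
        return enc

    train = train.map(tokenize, remove_columns=train.column_names)

    trainer = Trainer(model=model,
                      args=TrainingArguments(output_dir="gsm8k-lora",
                                             per_device_train_batch_size=2,
                                             num_train_epochs=1),
                      train_dataset=train)
    trainer.train()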

Schedule:

Week Activities
Week 1: September 2, 2024 Review current limitations of LLMs in math and logic
Week 2: September 9, 2024 Identify a set of LLMs. Read about how to train and deploy them
Week 3: September 16, 2024 Start fine-tuning pre-trained models on math and logic datasets (MATH and GSM8k). Read [Hendrycks2021]
Week 4: September 23, 2024 Prepare evaluation results on initial fine-tuned models
Week 5: September 30, 2024 Start deliverable 2: Understand how Mathics works
Week 6: October 7, 2024 Continue deliverable 2: Read [Gao2022]. Invoke Mathics from the LLM's output (see the first sketch after the schedule)
Week 7: October 14, 2024 Complete deliverable 2: Document results of basic math calculations
Week 8: October 21, 2024 Start deliverable 3: Read the Mathics documentation. Read [deMoura2015], [Polu2020]
Week 9: October 28, 2024 Continue on deliverable 3: Integrate LLM with LEAN
Week 10: November 4, 2024 Complete deliverable 3: Prove basic theorems using LLM+LEAN (see the Lean sketch after the schedule)
Week 11: November 11, 2024 Start deliverable 4: Read about process supervision and CoT prompting
Week 12: November 18, 2024 Continue deliverable 4: Find word-problem datasets or create a synthetic one
Week 13: November 25, 2024 Continue deliverable 4: Read [Wei2022]. Continue evaluating our model on the dataset
Week 14: December 2, 2024 Complete deliverable 4: Record accuracy metrics
Week 15: December 9, 2024 Start deliverable 5: Compile all the deliverables
Week 16: December 16, 2024 Complete CS297 Report
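
For the Mathics work in weeks 5-7, one plausible wiring is to prompt the LLM to wrap any computation it needs in explicit tags and to hand the tagged expressions to Mathics, rather than letting the model do the arithmetic itself. A minimal sketch, assuming the mathics-core package and its MathicsSession helper; the tag convention and the hard-coded llm_output string are illustrative assumptions, not a fixed API.

    import re
    from mathics.session import MathicsSession

    session = MathicsSession()

    # Suppose the LLM is instructed to wrap computations in
    # <<mathics>> ... <</mathics>> tags instead of computing them itself.
    llm_output = ("The antiderivative is "
                  "<<mathics>>Integrate[Sin[x], x]<</mathics>>.")

    for expr in re.findall(r"<<mathics>>(.*?)<</mathics>>", llm_output):
        result = session.evaluate(expr)  # exact symbolic evaluation
        print(expr, "=>", result)        # e.g. Integrate[Sin[x], x] => -Cos[x]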
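
The week 10 target, deliverable 3's infinitude-of-primes theorem, already exists in Lean's Mathlib library as Nat.exists_infinite_primes, so restating it makes a good end-to-end smoke test for the LLM+LEAN loop. A minimal Lean 4 sketch (the theorem name bigger_prime is our own), assuming a project with Mathlib available:

    import Mathlib

    -- For every natural number n there is a prime p with n ≤ p,
    -- i.e. there is no largest prime. Mathlib's Nat.exists_infinite_primes
    -- proves exactly this statement.
    theorem bigger_prime (n : ℕ) : ∃ p, n ≤ p ∧ p.Prime :=
      Nat.exists_infinite_primes n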

Deliverables:

  1. Fine-tune LLMs on the MATH and GSM8k datasets
  2. Integrate ChatGPT with Mathics to compute the integral of sin(x)
  3. Look into LEAN and prove that for every prime there exists a larger prime.
  4. Find a word-problems dataset and fine-tune an LLM to answer math word problems (a chain-of-thought prompting sketch follows this list)
  5. Submit CS297 Report
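
Deliverable 4 relies on chain-of-thought prompting [Wei2022]: prepend a worked example so the model produces step-by-step reasoning before its final answer. Below is a minimal sketch; the one-shot exemplar, the "####" answer marker (borrowed from GSM8k's answer format), and the helper names are assumptions, not the project's settled interface.

    import re

    EXEMPLAR = ("Q: Ravi has 3 boxes with 4 pens each. He gives away 5 pens. "
                "How many pens are left?\n"
                "A: Ravi starts with 3 * 4 = 12 pens. After giving away 5, "
                "he has 12 - 5 = 7 pens. #### 7\n\n")

    def cot_prompt(question: str) -> str:
        # Prepend a worked example so the model imitates step-by-step reasoning.
        return EXEMPLAR + "Q: " + question + "\nA:"

    def extract_answer(completion: str):
        # GSM8k-style solutions end with '#### <number>'.
        m = re.search(r"####\s*(-?\d+)", completion)
        return int(m.group(1)) if m else None

Scoring extract_answer against the dataset's gold answers would give the accuracy metric recorded in week 14.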

References:

  • [Hendrycks2021] Measuring Mathematical Problem Solving with the MATH Dataset. Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt. arXiv preprint arXiv:2103.03874. 2021.
  • [Gao2022] PAL: Program-Aided Language Models. Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig. arXiv preprint arXiv:2211.10435. 2022.
  • [deMoura2015] The Lean Theorem Prover (System Description). Leonardo de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, Jakob von Raumer. In International Conference on Automated Deduction (CADE), 2015.
  • [Polu2020] Generative Language Modeling for Automated Theorem Proving. Stanislas Polu, Ilya Sutskever. arXiv preprint arXiv:2009.03393. 2020.
  • [Wei2022] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed H. Chi, Quoc V. Le, Denny Zhou. arXiv preprint arXiv:2201.11903. 2022.