[Bio]
[Blog]
[C297 Proposal]
[LoRA: Low-Rank Adaptation of Large Language Models (.pdf)]
[DoRA: Weight-Decomposed Low-Rank Adaptation (.pdf)]
[Deliverable 1 - MATH Dataset]
[Deliverable 1 - GSM8k Dataset]
[Deliverable 2 - Integrate Mathics tool]
[Deliverable 3 - Prove infinitude of primes theorem using LEAN]
[Chain-of-Thought Prompting in LLMs (.pdf)]
[LeanDojo - Theorem Proving with RAG (.pdf)]
[Deliverable 4 - Solving word problems using LEAN and Mathics]
[CS297 Report (.pdf)]
[C298 Proposal]
Description:
LLMs struggle with math because they are trained primarily on natural language data and lack the formal reasoning and symbolic manipulation needed for precise calculation. They tend to favor linguistic patterns over strict correctness, producing plausible but inaccurate answers, and their limited context windows and weak multi-step reasoning further hurt accuracy. This project aims to improve LLMs' performance on complex mathematical problems and logical reasoning tasks through curated datasets, hybrid models that incorporate symbolic reasoning tools, and process supervision.
Schedule:
Week 1 (September 2, 2024): Review current limitations of LLMs in math and logic.
Week 2 (September 9, 2024): Identify a set of candidate LLMs; read about how to train and deploy them.
Week 3 (September 16, 2024): Start fine-tuning pre-trained models on the MATH and GSM8k datasets. Read [Hendrycks2021]. (A fine-tuning sketch follows this schedule.)
Week 4 (September 23, 2024): Prepare evaluation results for the initial fine-tuned models.
Week 5 (September 30, 2024): Start deliverable 2: understand how Mathics works.
Week 6 (October 7, 2024): Continue deliverable 2: read [Gao2022]; invoke Mathics from the LLM's output.
Week 7 (October 14, 2024): Complete deliverable 2: document results on basic math calculations.
Week 8 (October 21, 2024): Start deliverable 3: read the LEAN documentation; read [Leonardo2015] and [Polu2020].
Week 9 (October 28, 2024): Continue deliverable 3: integrate the LLM with LEAN.
Week 10 (November 4, 2024): Complete deliverable 3: prove basic theorems using LLM+LEAN.
Week 11 (November 11, 2024): Start deliverable 4: read about process supervision and CoT prompting.
Week 12 (November 18, 2024): Continue deliverable 4: find a word-problem dataset or create a synthetic one.
Week 13 (November 25, 2024): Continue deliverable 4: read [Wei2022]; continue evaluating the model on the dataset.
Week 14 (December 2, 2024): Complete deliverable 4: record accuracy metrics.
Week 15 (December 9, 2024): Start deliverable 5: compile all the deliverables.
Week 16 (December 16, 2024): Complete the CS297 Report.
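The Week 3 fine-tuning step could look like the following minimal sketch, assuming the HuggingFace transformers, datasets, and peft packages, with LoRA adapters as in the LoRA paper linked above. The base model name, hyperparameters, and preprocessing here are illustrative placeholders, not the project's final configuration.

    # Minimal LoRA fine-tuning sketch on GSM8k (placeholders throughout).
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              Trainer, TrainingArguments)

    model_name = "gpt2"  # placeholder; swap in the LLM under study
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Wrap the base model with rank-8 low-rank adapters (LoRA).
    model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                             task_type="CAUSAL_LM"))

    # GSM8k ships as question/answer pairs; concatenate each pair into
    # one training string and use the input ids as labels (causal LM).
    train = load_dataset("gsm8k", "main", split="train")

    def tokenize(example):
        text = example["question"] + "\n" + example["answer"]
        enc = tokenizer(text, truncation=True, max_length=512,
                        padding="max_length")
        enc["labels"] = enc["input_ids"].copy()
        return enc

    train = train.map(tokenize, remove_columns=train.column_names)

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="gsm8k-lora",
                               per_device_train_batch_size=4,
                               num_train_epochs=1),
        train_dataset=train,
    ).train()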
Deliverables:
- Fine-tune LLMs on the MATH and GSM8k datasets
- Integrate ChatGPT with Mathics to compute the integral of sin(x) (a sketch follows this list)
- Learn LEAN and prove that for every prime there exists a larger prime (a Lean sketch follows this list)
- Find a word-problem dataset XYZ and fine-tune an LLM to answer math word problems (a prompting sketch follows this list)
- Submit the CS297 Report
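A minimal sketch of deliverable 2's Mathics integration, assuming the Mathics3 Python package (which provides mathics.session.MathicsSession). The extract_expression helper and the "CALL:" output convention are hypothetical stand-ins for however the LLM is prompted to emit a tool call.

    # Route a Wolfram-language expression from the LLM's reply to Mathics.
    from mathics.session import MathicsSession

    session = MathicsSession()

    def extract_expression(llm_output: str) -> str:
        # Hypothetical convention: the model is prompted to reply with a
        # single line of the form "CALL: <expression>".
        return llm_output.split("CALL:", 1)[1].strip()

    reply = "CALL: Integrate[Sin[x], x]"  # e.g., what ChatGPT returns
    result = session.evaluate(extract_expression(reply))
    print(result)  # expect the antiderivative, -Cos[x]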
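For deliverable 3, the target statement can be written in Lean 4 as below. This is a sketch assuming mathlib, whose library lemma Nat.exists_infinite_primes already gives the result; the import path may differ between mathlib versions, and the project would instead have the LLM search for such a proof.

    import Mathlib.Data.Nat.Prime.Basic

    -- Deliverable 3's target: for every natural number n there is a
    -- prime p with n ≤ p, i.e. there are arbitrarily large primes.
    theorem bigger_prime (n : ℕ) : ∃ p, n ≤ p ∧ Nat.Prime p :=
      Nat.exists_infinite_primes n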
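Deliverable 4 relies on chain-of-thought prompting [Wei2022], where each few-shot exemplar shows its intermediate reasoning before the final answer and the model is asked to continue the pattern. A minimal prompt-construction sketch, with an illustrative exemplar rather than one drawn from the project's dataset:

    # Build a few-shot chain-of-thought prompt in the style of [Wei2022].
    EXEMPLARS = [
        ("Ben has 3 boxes of 12 pencils. How many pencils does he have?",
         "Each box has 12 pencils and there are 3 boxes. 3 * 12 = 36. "
         "The answer is 36."),
    ]

    def build_cot_prompt(question: str) -> str:
        shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in EXEMPLARS)
        return f"{shots}\n\nQ: {question}\nA:"

    print(build_cot_prompt(
        "A train travels 60 miles per hour for 2.5 hours. How far does it go?"))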
References:
- [Hendrycks2021] Measuring Mathematical Problem Solving with the MATH Dataset. Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt. arXiv preprint arXiv:2103.03874. 2021.
- [Gao2022] PAL: Program-aided Language Models. Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig. arXiv preprint arXiv:2211.10435. 2022.
- [Leonardo2015] The Lean Theorem Prover (System Description). Leonardo de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, Jakob von Raumer. In International Conference on Automated Deduction (CADE). 2015.
- [Polu2020] Generative Language Modeling for Automated Theorem Proving. Stanislas Polu, Ilya Sutskever. arXiv preprint arXiv:2009.03393. 2020.
- [Wei2022] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed H. Chi, Quoc V. Le, Denny Zhou. arXiv preprint arXiv:2201.11903. 2022.