## CS297 Proposal

## Improving Mathematical and Logical Capabilities of LLMs

Rohan Kumar (nagarohankumar.bayya@sjsu.edu)
LLMs struggle with mathematics because they are trained primarily on natural-language data and lack the formal reasoning and symbolic manipulation needed for precise calculation. They tend to prioritize linguistic patterns over strict correctness, producing plausible but inaccurate answers, and their limited context windows and weak multi-step reasoning further hurt accuracy. This project aims to improve LLMs' ability to handle mathematical problems and logical reasoning tasks by using curated datasets, hybrid models that incorporate symbolic reasoning components, and process supervision. The goal is to improve model performance on complex math and logic problems.
The full project will be completed in CS298. The following will be done by the end of CS297:

1. Begin fine-tuning LLMs on the MATH and GSM8k datasets.
2. Integrate ChatGPT with Mathics to compute the integral of sin(x).
3. Study the Lean theorem prover and prove that for every prime there exists a larger prime.
4. Find a word-problems dataset XYZ and fine-tune an LLM to answer math word problems.
5. Submit the CS297 report.
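For deliverable 3, the target statement can be written in Lean 4. This is a sketch assuming Mathlib and its lemma `Nat.exists_infinite_primes : ∀ n, ∃ p, n ≤ p ∧ p.Prime`; note the primality hypothesis on `p` is not actually needed, since every natural number is exceeded by some prime.

```lean
import Mathlib

-- For every prime p there exists a larger prime q.
theorem exists_bigger_prime (p : ℕ) (hp : p.Prime) :
    ∃ q, p < q ∧ q.Prime := by
  -- Mathlib gives a prime q with p + 1 ≤ q; then p < q.
  obtain ⟨q, hq, hq_prime⟩ := Nat.exists_infinite_primes (p + 1)
  exact ⟨q, Nat.lt_of_succ_le hq, hq_prime⟩
```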
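Deliverable 2 can be sketched as a tool-use loop: the model emits a tool call in its output, a harness evaluates the expression with a computer algebra system, and the result is spliced back into the text. The `<<...>>` call syntax and the lookup-table "CAS" below are placeholders of my own; in the actual deliverable, Mathics would evaluate the Wolfram-language expression instead.

```python
import re

# Hypothetical tool-call syntax: the LLM wraps CAS requests in << ... >>.
TOOL_CALL = re.compile(r"<<(.+?)>>")

# Placeholder CAS: a lookup table standing in for Mathics. The real
# integration would evaluate the expression with Mathics instead.
FAKE_CAS = {
    "Integrate[Sin[x], x]": "-Cos[x]",
}

def evaluate_with_tools(model_output: str) -> str:
    """Replace each <<expr>> tool call in the model output with the CAS result."""
    def run(match: re.Match) -> str:
        expr = match.group(1)
        return FAKE_CAS.get(expr, f"<unhandled: {expr}>")
    return TOOL_CALL.sub(run, model_output)

print(evaluate_with_tools("The integral of sin(x) is <<Integrate[Sin[x], x]>> + C."))
# -> The integral of sin(x) is -Cos[x] + C.
```

The same loop generalizes: any expression the model emits inside the tool-call markers is delegated to the symbolic engine, so the LLM never has to do the calculation itself.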
## References

[Hendrycks2021] Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt. Measuring Mathematical Problem Solving with the MATH Dataset. arXiv preprint arXiv:2103.03874, 2021.

[Gao2022] Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig. PAL: Program-Aided Language Models. arXiv preprint arXiv:2211.10435, 2022.

[Moura2015] Leonardo de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, Jakob von Raumer. The Lean Theorem Prover (System Description). In International Conference on Automated Deduction (CADE), 2015.

[Polu2020] Stanislas Polu, Ilya Sutskever. Generative Language Modeling for Automated Theorem Proving. arXiv preprint arXiv:2009.03393, 2020.

[Wei2022] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed H. Chi, Quoc V. Le, Denny Zhou. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv preprint arXiv:2201.11903, 2022.