SynPAT: A System for Generating Synthetic Physical Theories with Data

Abstract

Machine-assisted methods for discovering new physical laws of nature, starting from a given background theory and data, have recently emerged, and seem to hold the promise of someday advancing our understanding of the physical world. To help evaluate such methods, we have developed SynPAT, a system for generating synthetic physical theories comprising (i) a set of consistent axioms, (ii) a symbolic expression that is a consequence of the axioms and is the target to be discovered, and (iii) noisy data that approximately match the consequence. We also generate theories that do not correctly predict the consequence. We give a detailed description of the inner workings of SynPAT and its various capabilities. We also report on our benchmarking of several open-source symbolic regression systems using our generated theories and data.

Publication
Preprint

There have been many works on symbolic regression (finding a best-fit formula for a dataset without a predetermined form, unlike linear regression, logistic regression, etc.) in the context of discovering new physical laws (1, 2, 3, 4, 5). While these systems all generate formulae that fit data in various contexts, recent breakthroughs have shown that building into the model a framework for exploiting known background theory (encoded as physical axioms) can greatly improve machine-assisted discovery in the scientific context (1, 2). These systems demonstrate an important new direction for machine-assisted discovery in science: accounting for background theory more directly in the search, in addition to building heuristics into a model. This direction brings a need for benchmark datasets for understanding the performance of machine-assisted discovery models. In this work, we present SynPAT, a method for generating dimensionally consistent synthetic physical theories, each of which contains (i) a list of axioms (encoded as polynomials and ordinary differential equations), (ii) a consequence polynomial or ODE of the axioms that is to be discovered, (iii) numeric datasets for both the axiom systems and the consequence phenomena, and (iv) alternate incorrect axiom systems for testing these methods in situations where the theory is incomplete or incorrect.
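To make the ingredients of a synthetic theory concrete, here is a minimal sketch of a hand-written toy theory: two polynomial axioms, a consequence obtained by eliminating a variable, a dimensional-consistency check, and noisy data for the consequence. It is an illustration only, not SynPAT's actual code or output format. The example theory, the `UNITS` table, and the helper names `monomial_units`, `is_dimensionally_consistent`, and `sample_noisy_data` are all assumptions introduced here.

```python
import random

# Hypothetical toy theory (not taken from the SynPAT dataset).
# Axioms, written as polynomials that equal zero:
#   A1: F - m*a = 0       (Newton's second law)
#   A2: a*r - v**2 = 0    (centripetal acceleration)
# Consequence obtained by eliminating a:
#   C:  F*r - m*v**2 = 0

# Units of each variable as exponent vectors over (kg, m, s).
UNITS = {
    "F": (1, 1, -2),
    "m": (1, 0, 0),
    "a": (0, 1, -2),
    "v": (0, 1, -1),
    "r": (0, 1, 0),
}


def monomial_units(monomial):
    """Return the units of a monomial given as {variable: power}."""
    total = [0, 0, 0]
    for var, power in monomial.items():
        for i, exponent in enumerate(UNITS[var]):
            total[i] += power * exponent
    return tuple(total)


def is_dimensionally_consistent(polynomial):
    """A polynomial (list of monomials) is consistent if all terms share units."""
    return len({monomial_units(term) for term in polynomial}) == 1


# Only the monomials are stored; coefficients and signs do not affect units.
AXIOMS = [
    [{"F": 1}, {"m": 1, "a": 1}],
    [{"a": 1, "r": 1}, {"v": 2}],
]
CONSEQUENCE = [{"F": 1, "r": 1}, {"m": 1, "v": 2}]


def sample_noisy_data(n, noise=0.01, seed=0):
    """Sample (m, v, r, F) rows that satisfy the consequence, then perturb F."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        m = rng.uniform(1.0, 10.0)
        v = rng.uniform(1.0, 10.0)
        r = rng.uniform(1.0, 10.0)
        force = m * v**2 / r  # exact value implied by the consequence
        noisy_force = force * (1.0 + rng.gauss(0.0, noise))  # relative Gaussian noise
        rows.append((m, v, r, noisy_force))
    return rows
```

A symbolic regression system under test would receive only rows such as those returned by `sample_noisy_data(100)`, optionally together with the axioms, and would be asked to recover the consequence.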

You can find the relevant code and dataset here. You can also find the dataset on Hugging Face.

Karan Srivastava
PhD Student, Mathematics

My research interests include machine learning, reinforcement learning, combinatorics, and algebraic geometry.
