Loughborough University
Browse
main.pdf (611.49 kB)

Towards more efficient methods for solving regular-expression heavy string constraints

Download (611.49 kB)
journal contribution
posted on 2023-01-13, 09:15 authored by Murphy Berzish, Joel DayJoel Day, Vijay Ganesh, Mitja Kulczynski, Florin Manea, Federico Mora, Dirk Nowotka
Widespread use of string solvers in the formal analysis of string-heavy programs has led to a growing demand for more efficient and reliable techniques which can be applied in this context. Designing practical algorithms for the (generally undecidable) satisfiability problem for systems of string constraints requires a deep understanding of the structure of constraints present in the targeted cases. In this paper, we first investigate relevant benchmarks containing regular expression membership predicates, extract a series of first order logic theories, and prove the decidability, respectively undecidability, of their satisfiability problem. Further, building on the theoretical results, we present a novel length-aware solving algorithm for the quantifier-free first-order theory over regular expression membership predicates and linear arithmetic over string length. We implement and evaluate this algorithm and related heuristics in the Z3 theorem prover. Besides the theoretical aspects leading to this algorithm, a crucial insight that underpins it is that real-world regex and string formulas contain a wealth of information about upper and lower bounds on lengths of strings, and such information can be used very effectively to simplify operations on automata representing regular expressions. Additionally, we present several novel general heuristics, such as the prefix/suffix method, that can be used to make a variety of regex solving algorithms more efficient in practice. We showcase the power of our algorithm and heuristics via an extensive empirical evaluation over a large and diverse benchmark of 57256 regex-heavy instances, almost 75% of which are derived from industrial applications or contributed by other solver developers. Our solver outperforms five other state-of-the-art string solvers over this benchmark.

History

School

  • Science

Department

  • Computer Science

Published in

Theoretical Computer Science

Volume

943

Pages

50 - 72

Publisher

Elsevier

Version

  • AM (Accepted Manuscript)

Rights holder

© Elsevier

Publisher statement

This paper was accepted for publication in the journal Theoretical Computer Science and the definitive published version is available at https://doi.org/10.1016/j.tcs.2022.12.009

Acceptance date

2022-12-04

Publication date

2022-12-09

Copyright date

2022

ISSN

0304-3975

Language

  • en

Depositor

Dr Joel Day. Deposit date: 13 January 2023

Usage metrics

    Loughborough Publications

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC