Semantics and provenance of configuration programming language μPuppet
Abstract
Nowadays computing infrastructures have grown bigger in scale and more complex.
Automated configuration management tools have taken the place of traditional
approaches of configuration tasks, such as manual configuration and writing
simple scripts, which become tedious, inefficient and error-prone to human mistakes.
These tools typically contain their own specific configuration languages. In
general, users write configuration specifications of the system and the tools compile
them to configuration files that are deployed on each machine to change its
configuration. Configuration specifications tend to be large and involve thousands
of parameters and the relations between them. However automating configuration
management does not prevent configuration errors that might cause system
failures that are costly and time-consuming to remedy. Configuration languages
normally support separation of concerns of different users, but there is limited
work on access control and work flow to guarantee the correctness of the configurations
when there are changes in the specifications, or the analysis tools for root
causes for configuration errors. Configuration errors remains one of the dominant
reasons for system failures (Oppenheimer et al., 2003).
In this thesis, we argue that provenance techniques, a dynamic technique of
tracking data history developed in the database field, is a natural solution to providing
configuration management tools with the ability of analysing configuration
errors and identifying their root causes in the configuration specifications. For
this purpose, we first choose a popular configuration management tool Puppet
and present μPuppet which models the operational semantics of a core subset
of Puppet language. In addition, we prototype an interpreter and a parser for
μPuppet and compare their validation results by using more 50 Puppet manifests
against the real Puppet language. Based on the formal semantics of μPuppet, we
formalise where-, expression- and dependency-provenance which track the original
inputs of an output data value, the primitive operation procedure that derives
an output data value and all the inputs on which a generated output depended
respectively in the process of compiling the configuration specifications. Further
more, we establish the static correctness properties of three forms of provenance.
We prove that where- and expression-provenance satisfy these properties. We
find the limitation of dependency-provenance we define that does not record the
full dependency information for some unusual value-overriding semantics, which
reveals the complexity of the Puppet language. We prove the partial correctness
of dependency-provenance confined to a subset of μPuppet that excludes the
unusual cases and propose possible solutions to deal with these unusual cases.