HWRF System Overview

Introduction

This page gives a high-level overview of the design of the Python-based HWRF scripting system, known as pyHWRF. We discuss the overall structure of the system and provide links to other parts of this documentation for more details on each individual layer.

Purpose

Prior to this rewrite, there were many different scripting systems to run the HWRF model, which led to wasted manpower maintaining parallel versions and tracking down the reasons for unexpected differences in forecast results. Part of the reason was that all of the scripting systems were monolithic, with designs that made it difficult to adapt to new batch systems, workflow management systems, operating systems, or HWRF configurations. Hence, every organization needed its own scripts to run the HWRF, and some organizations needed several scripting systems. The pyHWRF project aimed to solve this problem by creating an object-oriented, layered scripting system that minimizes the amount of work needed to modify it.

Overall System Design

The pyHWRF system is divided into several layers, and no layer ever interacts with the layers above it. This is done to simplify both adding new modeling or post-processing functionality and porting the system to other platforms. By separating portability concerns, path configuration, and modeling aspects of the implementation into independent layers, we avoid having to modify every part of the system when changing only one. A diagram of this "division of labor through layering" is below:

Todo:
insert layer diagram here

The remainder of this page documents these layers at an abstract level, and provides links to additional information.

High-Level Layers

These high-level layers handle various levels of automation, which may or may not be needed depending on the application. If a user is running only one cycle of one storm, or is simply debugging a single job, automation is not strictly necessary. However, when running multiple cycles or multiple storms, automation is critical.

Intercycle Layer

The HWRF system, like most forecasting systems, takes input for a particular analysis time and then provides a forecast of what the atmosphere and ocean will do for some amount of time afterwards. The set of jobs run for one analysis time is referred to as a cycle. The job of the Intercycle layer is to handle interactions between cycles, which generally involve complex dependencies: files transferred from one cycle to the next, archiving to tape, scrubbing output so that disk space is always available, and so on. In the case of NCEP Central Operations (NCO), which runs the operational HWRF, the Intercycle layer is a human operator overseeing some automation scripts. For the EMC parallels, it is the HHS automation system. For DTC and EMC, it is the Rocoto automation system. For a case study, or when simply debugging, this layer is not needed at all, though it can sometimes be convenient. For any large retrospective test of many storms, or for an automated real-time forecast over many cycles, this layer is critical.
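
For example, a long retrospective test steps through many analysis times. The sketch below, which is purely illustrative and not pyHWRF code, generates the 6-hourly list of cycles that such an automation layer would iterate over:

    from datetime import datetime, timedelta

    def cycles(start, end, step_hours=6):
        """Yield analysis times from start to end at a fixed interval.
        Illustrative only: the real automation layers also track file
        delivery, archiving, and scrubbing for each cycle."""
        t = start
        while t <= end:
            yield t
            t += timedelta(hours=step_hours)

    for cycle in cycles(datetime(2012, 10, 26, 0), datetime(2012, 10, 27, 18)):
        print(cycle.strftime('%Y%m%d%H'))   # 2012102600, 2012102606, ...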

Workflow Layer

When running the HWRF in an automated manner, one must split the work into multiple batch jobs, each of which runs part of the HWRF system on a supercomputer's compute nodes. Certain pieces of the HWRF system cannot start until others finish; for example, the forecast job requires the initial and boundary conditions. The Workflow layer handles such dependencies automatically. Strictly speaking, this layer is not needed: one can run the pyHWRF in an interactive batch job, and the developers of this system test it in that manner. However, it is generally easier and less error-prone to use a workflow layer of some sort.
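
A workflow system expresses these dependencies explicitly. The toy sketch below (not pyHWRF or Rocoto code, and with made-up job names) shows the idea: each job lists its prerequisites, and a job may be submitted only once all of them have completed:

    # Toy dependency table: job name -> jobs that must finish first.
    deps = {
        'launch':   [],
        'init':     ['launch'],
        'forecast': ['init'],
        'post':     ['forecast'],
    }

    def ready(job, completed):
        """True if every prerequisite of this job has already completed."""
        return all(d in completed for d in deps[job])

    completed = {'launch', 'init'}
    print([j for j in deps if j not in completed and ready(j, completed)])
    # ['forecast']  -- the post job must still wait for the forecast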

Intercycle and Workflow Layer Documentation

Scripting Layer

To run any scripts on a supercomputer, one generally has to load certain programs or libraries into one's environment and ensure that the filesystems required by the scripts are available on the compute node in use. In addition, when connecting HWRF to a Workflow layer (see above), some additional work is needed to pass information, such as file and executable locations, down to the next layer, the Experiment Layer (see below). This is the job of the Scripting Layer. This layer is optional: its work can be done manually by the user, and the system has been tested in that way, but doing so is laborious and only useful for debugging.
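
As a rough sketch, a scripting-layer job script does little more than prepare the environment, verify that the node can see what it needs, and hand control to the Experiment Layer. The environment variable names below are hypothetical, chosen only for illustration:

    #! /usr/bin/env python
    # Illustrative scripting-layer job script; variable names are assumptions.
    import os, sys
    import produtil.setup

    # Initialize logging, signal handling, umask, etc. for this batch job.
    produtil.setup.setup()

    # Confirm the filesystems the lower layers need are visible on this node.
    for var in ('HWRF_WORK_DIR', 'HWRF_FIX_DIR'):        # hypothetical names
        path = os.environ.get(var)
        if path is None or not os.path.isdir(path):
            sys.exit('%s is unset or points to a missing directory' % var)

    # File and executable locations are then passed down to the Experiment
    # Layer (see below), typically through the environment or a config file.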

Experiment Layer

The HWRF Experiment Layer describes the HWRF workflow. It creates a Python object structure (see hwrf_expt) that connects each piece of the system to the others and provides a means to run them. A sufficiently motivated user can run the entire HWRF system through an interactive Python session in a large batch job. However, it is not feasible to do so for a large-scale test.
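
As an illustration, one might drive the system from an interactive Python session along the following lines. This is only a sketch: it assumes hwrf_expt provides an init_module() initializer and task objects with a run() method, and the task name shown is a placeholder rather than a real attribute:

    # Inside an interactive batch job, after the environment is prepared:
    import produtil.setup
    produtil.setup.setup()

    import hwrf_expt
    hwrf_expt.init_module()      # build the object structure for this cycle

    # Each piece of the workflow is an object that knows how to run itself;
    # the attribute below stands in for a real task (initialization,
    # forecast, post-processing, ...).
    # hwrf_expt.some_task.run()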

Implementation Layer

The HWRF Implementation Layer is a set of Python classes and functions used by the Experiment Layer to run the HWRF. The Python classes each know how to run one part of the HWRF system, such as the ocean initialization, the forecast, or the post-processing. Some of them are utility classes to do such things as predicting WRF output filenames or performing time and date arithmetic.
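
As an illustration of the filename-prediction utilities, the sketch below derives a WRF output filename from a cycle time and forecast hour using plain Python date arithmetic. It follows the standard WRF naming convention but is not the actual pyHWRF class, only the idea behind it:

    from datetime import datetime, timedelta

    def wrfout_name(cycle, fcst_hour, domain=1):
        """Predict the wrfout filename valid fcst_hour hours after cycle.
        Illustrative only; uses the standard WRF naming convention."""
        valid = cycle + timedelta(hours=fcst_hour)
        return 'wrfout_d%02d_%s' % (domain, valid.strftime('%Y-%m-%d_%H:%M:%S'))

    print(wrfout_name(datetime(2012, 10, 26, 0), 126))
    # wrfout_d01_2012-10-31_06:00:00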

Portability Layer

The Portability Layer is a Python package (see produtil) that implements cross-platform methods of doing common tasks. For example, it provides a uniform way of running MPI, OpenMP, and serial programs across platforms. It can also perform file operations with improved logging, interact with the batch system, identify limitations of the cluster, deal with restricted data classes, manipulate resource limits, and interact with a database file.
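
For example, the produtil.run module lets a script describe a serial or MPI command once and have it launched correctly on whatever platform it runs on. The sketch below assumes the exe, mpi, mpirun, and checkrun helpers from produtil.run and the produtil.setup.setup() initializer; the executable names and rank count are placeholders:

    import produtil.setup
    from produtil.run import exe, mpi, mpirun, checkrun

    produtil.setup.setup()     # logging, signal handlers, etc.

    # Run a serial program and raise an exception if it exits with an error:
    checkrun(exe('hwrf_some_serial_program'))          # placeholder name

    # Describe an MPI program abstractly; mpirun() expands it into whatever
    # launcher (mpiexec, aprun, srun, ...) the current platform requires:
    checkrun(mpirun(mpi('./wrf.exe') * 96))            # 96 ranks, placeholder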