HWRF  trunk@4391
HWRF Log Files

This page documents log file locations and provides guidance on how to read the log files. We discuss both the operational locations and the Rocoto-based (repository HWRF) log file locations.


Nomenclature

Most log files are inside the $WORKhwrf, $COMhwrf, or $log directories. This section describes the nomenclature so you'll understand those and other shell-like variable names in this page.

Here are the variable names you may need to refer to:

These are specific to the NCO system (operational HWRF on WCOSS):

These are specific to the non-NCO HWRF workflow:


Log Location Quick Reference

A quick reference for the current operational HWRF locations:

Default repository (non-operational) HWRF locations:

Log locations within those directories:


Log Files

The $jlogfile

The most useful tool for getting a quick glance at HWRF's status is the jlogfile. NCO configures the jlogfile location using the $jlogfile environment variable. For everyone else, it is in this location:

$CDSCRUB/$SUBEXPT/log/jlogfile

where $CDSCRUB is set in your system.conf file. The $SUBEXPT (sub-experiment) is user-defined, but defaults to the value of your $EXPT (also user defined). The jlogfile will contain log messages for all jobs run by that sub-experiment, for all storms and cycles. Only the highest-level messages are reported in the file.
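
As a quick illustration, here is a minimal Python sketch of tailing the jlogfile. The $CDSCRUB and $SUBEXPT values below are hypothetical placeholders; take the real values from your system.conf and experiment configuration.

import os

# Hypothetical values; substitute the ones from your own configuration.
CDSCRUB = '/path/to/scrub/area'
SUBEXPT = 'MYEXPT'

jlogfile = os.path.join(CDSCRUB, SUBEXPT, 'log', 'jlogfile')

# Print the last 20 high-level status messages, if the file exists yet.
if os.path.isfile(jlogfile):
    with open(jlogfile) as f:
        for line in f.readlines()[-20:]:
            print(line.rstrip())
else:
    print('No jlogfile yet at ' + jlogfile)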

Per-Job Log Files

Each HWRF batch job generates a large amount of output on its stdout and stderr streams. Depending on which system you're using, they may be in a single output file or split into two. Generally, the stdout stream contains the most detailed information, since its logging level is INFO while stderr is at level WARNING. However, either one may contain error messages from executed commands or from the operating system.

There are a few jobs in particular where the stdout and stderr streams have special meanings:

  1. coupled forecast — coupler logging is in stdout. Generally, we redirect this to a file due to its extreme size (>400 MB).
  2. products — the tracker's stderr is the products job's stderr. This means the tracker's messages about waiting for files to show up are in the products job's stderr.
  3. relocate, merge — the Fortran programs these jobs run write extensively to both stdout and stderr. For this reason, the stderr stream has INFO logging level to make it easier to follow.

Forecast Log Files

As of this writing, there are three coupled components:

The coupler and ocean share a log stream and both are redirected to:

This special extra log file exists due to the extreme size of the coupler log: >400 MB.

The WRF has many log files, one for stdout and one for stderr for each of its MPI ranks. These are all in the runwrf/ directory:

where "RANK" is the rank within the WRF communicator, zero-padded to four digits. The first line of every log file will tell the name of the machine on which that rank is running. The WRF master process is the only one that does extensive logging, and its main log file is here:

That file is updated multiple times per timestep with information about the WRF run.

However, if WRF fails, the problem could be in any rank. In that situation, it is critical to check the other rsl.* files, especially the rsl.error.* files.
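
When WRF does fail, a quick way to triage is to scan all of the rsl.error.* files for lines that look like failures. The following Python 3 sketch does that; the runwrf/ path and the keyword list are assumptions, so adjust them for your own run.

import glob, os

# Hypothetical path to the forecast working directory; substitute your own.
runwrf = '/path/to/WORKhwrf/runwrf'

# Keywords that often indicate a failed rank. This list is an assumption,
# not an exhaustive set of WRF failure messages.
keywords = ('FATAL', 'ERROR', 'Segmentation', 'BACKTRACE')

for path in sorted(glob.glob(os.path.join(runwrf, 'rsl.error.*'))):
    with open(path, errors='replace') as f:
        for lineno, line in enumerate(f, 1):
            if any(k in line for k in keywords):
                print('%s:%d: %s' % (path, lineno, line.rstrip()))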

Post-Processing and Regribbing

To understand post-processing logging, you have to know how the work of post-processing is divided. The work of the HWRF post-processing is split into a products job, and one or more post jobs. The post jobs run the Unified Post Processor (UPP), which converts WRF output to native E grid GRIB files. The products job regrids the E grid output to more standard grids and copies the resulting GRIB files to COM. The products job also runs the GFDL Vortex Tracker.

Depending on the system, either the post or the products job will copy the forecast job's native output files to COM, compressing any NetCDF files. Which job does this depends on how the NetCDF Operators were compiled. On NOAA Zeus, the post job copies the files, while on all other systems the products job copies them.

Post Job Logging

Most of the information you need from the post job is logged to its per-job stdout and stderr files. The stdout reports what is run and when, and lists stack trace information for any failed post-processing operations. The stderr stream may contain additional information from the post program's own stderr. The stdout of the post program itself is extremely long, so it is redirected to a file, which is deleted if the post succeeds. If the post fails, the failed post's stdout is found here:

You can search the post job's stdout or stderr file to find the directory in which any failed post run was executed.

When the post job copies native model output to COM, most error messages are found in the stderr stream of the post job.

Products Job Logging

The products job is split into multiple subprocesses launched by the MPI program "mpiserial." They fall into three categories:

The output from gribbers is enormous, so it is redirected to files:

The stdout stream is best for finding out what the gribber is doing at any given time, while the stderr is best for finding errors. Most of the programs run by the gribber report errors on stderr, including the cnvgrib, wgrib, and hwrf_egrid2latlon (copygb) programs. Also, if you forget to install one of those programs, error messages about the program's absence will appear in stderr.
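
Because a missing executable only shows up as an error in the gribber's stderr, it can save time to verify up front that the regribbing programs are on your PATH. Here is a small sketch using shutil.which (Python 3.3+); the program names are taken from the list above and may differ on your installation.

import shutil

# Programs the gribber is expected to run. hwrf_egrid2latlon is the copygb
# variant referred to above; exact names may vary by installation.
for prog in ('cnvgrib', 'wgrib', 'hwrf_egrid2latlon'):
    path = shutil.which(prog)
    print('%-20s %s' % (prog, path if path else 'NOT FOUND on PATH'))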

The tracker logs can be found in two places:

Most information from the tracker is in its stdout stream. The only stderr information of note is:

Copier Log Files

Logging from the hwrf_expt.wrfcopier, which copies native model output to COM, can be in one of two places. In NCEP operations, and on most other platforms, it is part of the products job, and can be found here:

On NOAA Zeus, the copier is run by the post, so its information is in the per-job log files for the post jobs.

In the products job, the stdout is a good way of tracking its progress, but stderr is better for finding error messages. Errors usually come from disk problems, or from failures of the ncks program. Typically, ncks errors will be cryptic messages from the NetCDF library or NetCDF Operators library, followed by a human-readable explanation that may span multiple lines. In any case, all errors should be readily visible as stack traces that include the hwrf.copywrf module.
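
One way to pull out just the copier's failures is to scan the products job's stderr for lines mentioning ncks or the hwrf.copywrf module, along with a little surrounding context. This is a sketch only; the stderr file path below is a hypothetical placeholder.

import collections

# Hypothetical path to the products job's stderr file; use your own.
stderr_file = '/path/to/products_job.stderr'

context = collections.deque(maxlen=3)   # keep a few preceding lines
with open(stderr_file) as f:
    for line in f:
        if 'copywrf' in line or 'ncks' in line:
            for prev in context:
                print('    ' + prev.rstrip())
            print('>>> ' + line.rstrip())
            context.clear()
        else:
            context.append(line)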

Init and Bdy Job Log Files

The init and bdy jobs in HWRF take parent model data and prepare it for use by the rest of the HWRF system. That involves running many programs, all of which have extensive logging. Most of the progress and error information can be derived from the per-job log files. However, when something fails, it may be necessary to delve deeper.

There are two types of initialization: the "gfsinit" and the "fgat" initializations. The "gfsinit" directory is still used even if the parent model is not the GFS (GDAS, FNL, etc.). These directories are:

Within each init directory, there are subdirectories for each component.

See the sections on post-processing or forecast for information about logging for the post, gribber, tracker, real, and WRF.

Errors from Metgrid or Ungrib indicate problems with input GRIB files. Either the files themselves are invalid, or the chosen Vtable is not appropriate for the inputs.

The geogrid and prep_hybrid programs should never fail unless the requested datasets are invalid (such as a bad or non-existent file) or an I/O error occurs.


Python-Generated HWRF Log Files

In this section, we give guidance on how to read the HWRF log files generated by the Python scripts. These log files follow a common structure. Each has job prologue and epilogue information, consisting of the hostfile, environment, and other diagnostics needed to debug jobs on failed compute nodes or to find other system information.

Eventually, after all the prologue, you will see something like this:

07/14 00:10:06.963 jlog (exhwrf_post.py:15) INFO: starting post
07/14 00:10:07.736 hwrf (launcher.py:152) INFO: Running cycle: 2015071318
07/14 00:10:07.736 hwrf (launcher.py:157) INFO: /lfs3/projects/hwrf-vd/hurrun/pytmp/H215_ensemble3/00/2015071318/11W/tmpvit: read vitals for current cycle
07/14 00:10:07.764 hwrf (launcher.py:161) INFO: Current cycle vitals: JTWC 11W NANGKA    20150713 1800 221N 1366E 350 042 0952 1002 0407 48 012 0278 0278 0250 0222 D
07/14 00:10:07.765 hwrf (launcher.py:164) INFO: /lfs3/projects/hwrf-vd/hurrun/pytmp/H215_ensemble3/00/2015071318/11W/oldvit: read vitals for prior cycle
07/14 00:10:07.785 hwrf (launcher.py:168) INFO: Prior cycle vitals: JTWC 11W NANGKA    20150713 1200 214N 1369E 355 042 0944 1002 0398 54 018 0278 0278 0250 0222 D
07/14 00:10:07.785 hwrf (hwrf_expt.py:127) INFO: Initializing hwrf_expt module...
07/14 00:10:14.387 hwrf (hwrf_expt.py:401) INFO: Done in hwrf_expt module.

Each line has a specific form: the date and time (with milliseconds), the log stream name, the source file and line number that produced the message, the logging level, and finally the message itself.

There are many log streams, and their names can help you find out where the logging is coming from, and hence why things are happening. For example:

07/14 00:10:14.532 hwrf.nonsatpost-f00h00m (fileop.py:435) INFO: hires_micro_lookup.dat: move from ...

(Abbreviated for readability.) Here, the stream name hwrf.nonsatpost-f00h00m means the log message is about the analysis time (f00h00m) non-satellite post job (nonsatpost).
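
Based on the samples above, each log line can be picked apart with a simple regular expression: date, time, stream name, source file and line number, level, and message. The pattern below is inferred from the sample lines on this page, not taken from the HWRF source, so treat it as a sketch.

import re

# Pattern inferred from the sample log lines shown above.
LOG_LINE = re.compile(
    r'^(?P<date>\d\d/\d\d) (?P<time>[\d:.]+) (?P<stream>\S+) '
    r'\((?P<file>[^:]+):(?P<lineno>\d+)\) (?P<level>[A-Z]+): (?P<message>.*)$')

sample = ('07/14 00:10:14.532 hwrf.nonsatpost-f00h00m '
          '(fileop.py:435) INFO: hires_micro_lookup.dat: move from ...')

m = LOG_LINE.match(sample)
if m:
    print('%s %s %s' % (m.group('stream'), m.group('level'),
                        m.group('message')))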

Python Logging Levels

The Python standard library logging module has multiple levels of logging, which HWRF uses for different purposes and sends to different places:

Level     stdout  stderr  jlogfile  Meaning
DEBUG     no      no      no        Debug messages usable only by developers.
INFO      yes     no      no        Regular status information.
WARNING   yes     yes     no        Information that may be useful in debugging failed jobs.
ERROR     yes     yes     yes       Errors that will degrade the forecast or disable components.
CRITICAL  yes     yes     yes       Failures that require operator intervention.

Note that higher levels of logging go to more streams. Log messages from all log streams at level ERROR or higher will go to the jlogfile. Other messages go to only the per-job output files.

Log messages sent to the special "jlog" stream also go to the jlogfile, even if they're at lower log levels. This is to allow each job to send start and completion messages without the messages looking like errors.
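
The routing in the table above, and the special handling of the "jlog" stream, can be illustrated with the plain Python logging module. This is a sketch of the idea only, not HWRF's actual configuration: one handler on the jlogfile accepts ERROR and above from every stream, while the "jlog" logger gets its own handler on the same file at INFO level, so start and completion messages reach the jlogfile without being errors.

import logging, sys

# jlogfile handler: ERROR and CRITICAL from all streams. The file name
# here is a local placeholder.
jlog_handler = logging.FileHandler('jlogfile')
jlog_handler.setLevel(logging.ERROR)

# Per-job output: detailed stdout (INFO and above), terse stderr
# (WARNING and above).
stdout_handler = logging.StreamHandler(sys.stdout)
stdout_handler.setLevel(logging.INFO)
stderr_handler = logging.StreamHandler(sys.stderr)
stderr_handler.setLevel(logging.WARNING)

root = logging.getLogger()
root.setLevel(logging.DEBUG)
for handler in (jlog_handler, stdout_handler, stderr_handler):
    root.addHandler(handler)

# The "jlog" stream gets an extra handler on the same file, at INFO level,
# so its status messages appear in the jlogfile too.
jlog_stream = logging.getLogger('jlog')
jlog_extra = logging.FileHandler('jlogfile')
jlog_extra.setLevel(logging.INFO)
jlog_stream.addHandler(jlog_extra)

logging.getLogger('hwrf').info('per-job output only')
logging.getLogger('hwrf').error('also goes to the jlogfile')
jlog_stream.info('starting post')   # reaches the jlogfile at INFO level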