Keywords: Ensembles; Forecast verification/skill; Operational forecasting; Model comparison; Model evaluation/performance
1. Introduction
During the period from 1200 UTC (0600 mountain daylight time) 9 September to 1200 UTC 16 September 2013, gauge measurements showed that more than 17 in. (~432 mm) of rainfall fell in several areas of the Front Range of northern Colorado, with the precipitation maximum nearly directly over the city of Boulder, Colorado. A large area of >8 in. (~203 mm) accumulated precipitation extended across a wide swath of the Front Range. The peak precipitation periods were the evenings of 11 and 12 September, though heavy rainfall also occurred on 9 and 15 September. Figure 1 provides a map of the analyzed precipitation over Colorado and New Mexico, with additional panels showing the day-by-day accumulations in the northern Front Range and Denver metropolitan area. There were several areas with very heavy precipitation, with especially heavy rainfall also occurring in Aurora, Colorado, just east of Denver, another small area of very heavy precipitation southwest of Colorado Springs, and extensive heavy rainfall in central and southern New Mexico. Synoptically, for the northern Colorado Front Range, this period was notable for its record total-column precipitable water (Fig. 2), associated near moist-adiabatic vertical temperature and humidity profiles, and lower- to midtropospheric upslope geostrophic flow (see appendix A in the online supplemental material).
The largest impacts during this extended storm were in the northern Front Range and later along river basins to the east. According to the Federal Emergency Management Agency, in their preliminary disaster declaration (through 30 November 2013), 1500 houses were destroyed and ~19 000 damaged. A total of 485 miles of roadway were damaged, including most roads into the mountains in the northern Front Range, making many homes impossible to reach, except on foot. A total of 30 state highway bridges were destroyed and 20 were severely damaged. In addition, 27 state dams sustained damage; 150 miles of railroad track were damaged. Nine people died as a result of the storms and flooding.
This article analyzes the performance of operational precipitation forecasts over the northern Front Range, especially Boulder County, though the maps herein allow the reader to examine the performance of the models over larger regions. It does not present new research; the purpose is simply to document the performance of the operational guidance available to forecasters at the time.1 Because the forecasts were, for the most part, unexceptional, it is likely that this event will become a focus of intense study in the months and years to come, and the documentation provided here may become a useful baseline for future comparison. Previously, some characteristics of global ensemble-mean predictions for this storm were examined in Lavers and Villarini (2013), including the ability of global ensemble systems to predict the accumulated precipitation more faithfully over the state of Colorado than over a 0.5° box around Boulder.
2. Precipitation analysis data and the forecast models
Both “stage IV” and Advanced Hydrologic Prediction System (AHPS) precipitation analysis data were used in this study. Each provides data on ~4-km grids over the contiguous United States. Stage-IV data are available at hourly and 6-hourly intervals, though more quality control is applied to the 6-hourly data over this area of the nation. The AHPS precipitation analyses are provided only every 24 h and agglomerate the four 6-hourly stage-IV analyses. Generally, the procedure here was to use the most compact and highest-quality data available from these three sources (hourly stage IV, 6-hourly stage IV, and 24-hourly AHPS) whenever possible. Consequently, for accumulated precipitation forecast plots that span >24-h periods, AHPS data were used as much as possible, supplemented by stage-IV 6-hourly data, and hourly data only when necessary. One instance where the use of hourly data was necessary was in the creation of plots of forecast and analyzed time series of accumulated precipitation over multiday periods. In such cases, hourly stage-IV data were used, but the accumulated precipitation amounts over the multiday periods were scaled (generally upward) to be consistent with the amounts from the more quality-controlled 6-hourly stage-IV and 24-hourly AHPS data. [A description of the AHPS precipitation analyses is provided online at http://water.weather.gov/precip/about.php. Stage-IV data are documented in Lin and Mitchell (2005) and online at http://www.emc.ncep.noaa.gov/mmb/ylin/pcpanl/stage4/.]
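The scaling of hourly accumulations described above can be illustrated with a minimal sketch. It assumes the simplest reading of the text, namely a single multiplicative factor chosen so that the final hourly accumulation matches the total from the more quality-controlled data; the function name `scaled_accumulation` and the sample values are hypothetical, and the procedure actually used for the figures may have differed in detail.

```python
from itertools import accumulate


def scaled_accumulation(hourly_mm, reference_total_mm):
    """Return the running accumulation of an hourly series, rescaled so that
    its final value equals a reference total (e.g., from 6-hourly or 24-hourly
    analyses). A single scale factor is an assumption made for illustration."""
    raw_total = sum(hourly_mm)
    if raw_total <= 0:
        raise ValueError("hourly series has no precipitation to scale")
    factor = reference_total_mm / raw_total
    # Running sums preserve the hourly timing; the factor adjusts the amounts.
    return [running * factor for running in accumulate(hourly_mm)]


# Hypothetical values: hourly data sum to 4 mm, reference total is 6 mm.
series = scaled_accumulation([1.0, 2.0, 0.0, 1.0], 6.0)
print(series)  # [1.5, 4.5, 4.5, 6.0]
```

In this sketch the timing of the rainfall comes entirely from the hourly data, while the magnitude is set by the reference total, which matches the "generally upward" adjustment described in the text when the hourly analyses run low.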
The following forecast modeling systems were examined in this study: medium-range global ensembles from the National Centers for Environmental Prediction (NCEP) Global Ensemble Forecast System (GEFS), the European Centre for Medium-Range Weather Forecasts (ECMWF), the Met Office, and the Canadian Meteorological Centre (CMC). Regional ensembles were also examined from the NCEP Short-Range Ensemble Forecast (SREF) system. Deterministic forecasts were examined from the NCEP Global Forecast System (GFS), the NCEP regional North American Mesoscale Model (NAM), and the Rapid Refresh (RAP) model. More extensive documentation of the model configurations is provided in Table 1 and in appendix A in the online supplemental material.
Table 1. Summary of the configuration of modeling systems used in this paper. Grid spacings for the regional models are those reported by NCEP. For the global models, the approximate grid spacing over Boulder, CO, is reported.
Native forecast model resolutions varied widely. However, for the global ensemble predictions, the forecast data for the CMC, ECMWF, and the NCEP GEFS systems beyond day +8 were not available on their native grid as obtained from The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE; Bougeault et al. 2010) archive. For these global ensembles, raw data were obtained from TIGGE at the highest possible resolution and then interpolated to a 0.2° grid before display or analysis. Some of the global models used native grid spacings that were relatively coarse for such a local event; for example, the Met Office used a 0.83° grid (71 × 92 km² grid spacing near Boulder) with its ensemble prediction system. When interpreting the subsequent results, especially for comparisons over Boulder County, which will be a box of 0.5° × 0.6°, the reader should be aware that the forecast models with coarser resolution, even when perfect, cannot be expected to provide this level of detail and would somewhat underestimate the precipitation.
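The conversion from a grid spacing in degrees to kilometers near Boulder can be checked with a short calculation. The sketch below assumes a spherical Earth of radius 6371 km and a latitude of 40°N for Boulder; both values, and the function name `grid_spacing_km`, are illustrative assumptions rather than details taken from the model documentation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def grid_spacing_km(spacing_deg, latitude_deg):
    """Convert a regular lat/lon grid spacing in degrees to approximate
    (east-west, north-south) distances in km at the given latitude."""
    km_per_degree = 2.0 * math.pi * EARTH_RADIUS_KM / 360.0
    north_south = spacing_deg * km_per_degree
    # Meridians converge toward the poles, so east-west distance shrinks
    # with the cosine of latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south


east_west, north_south = grid_spacing_km(0.83, 40.0)
print(f"{east_west:.0f} x {north_south:.0f} km")  # 71 x 92 km
```

Under these assumptions, the 0.5° × 0.6° Boulder County box is smaller than a single 0.83° grid cell, which is why a coarse model cannot be expected to resolve the county-scale maximum.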
3. Results
An abbreviated set of the most pertinent results is presented here; a much more complete set of forecast results spanning a large range of initialization times is presented in appendixes B (for global ensemble forecasts) and C (for shorter-range and deterministic forecasts) in the online supplemental material. Figure 3 shows time series of accumulated precipitation forecasts from the four global ensemble systems, in this plot for forecasts initialized 1200 UTC Sunday, 8 September 2013, ~84 h before the onset of the heaviest precipitation. The three panels show the precipitation guidance approximately over Boulder County and then over progressively larger areas. These larger areas were included because theory and practice suggest that precipitation forecast skill should be larger over larger areas (Islam et al. 1993; Gallus 2002). Hence, we seek to determine whether precipitation forecast consistency with the analyzed data improves with increasing scale. Precipitation forecast accuracy was not evaluated objectively, for example, with threat scores or ranked probability skill scores. Such statistics are commonly only significant when evaluated over many dozens of independent events.
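The comparison over progressively larger areas amounts to averaging a gridded precipitation field over nested latitude/longitude boxes. The following is a minimal sketch of such an area average, assuming a regular grid and cosine-latitude weighting; the function name `box_mean`, the grid, the box bounds, and the field values are all hypothetical and are not the regions or data used for Fig. 3.

```python
import math


def box_mean(field, lats, lons, lat_bounds, lon_bounds):
    """Cosine-latitude-weighted mean of field[i][j] (row i at lats[i],
    column j at lons[j]) over the grid points inside a lat/lon box."""
    south, north = lat_bounds
    west, east = lon_bounds
    weighted_sum = 0.0
    weight_total = 0.0
    for i, lat in enumerate(lats):
        if not south <= lat <= north:
            continue
        weight = math.cos(math.radians(lat))  # grid-cell area shrinks poleward
        for j, lon in enumerate(lons):
            if west <= lon <= east:
                weighted_sum += weight * field[i][j]
                weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no grid points fall inside the box")
    return weighted_sum / weight_total


# Hypothetical 1-degree field (mm) with a sharp local maximum at the center.
lats = [39.0, 40.0, 41.0]
lons = [-106.0, -105.0, -104.0]
field = [[10.0, 20.0, 10.0],
         [20.0, 200.0, 20.0],
         [10.0, 20.0, 10.0]]

small = box_mean(field, lats, lons, (39.5, 40.5), (-105.5, -104.5))
large = box_mean(field, lats, lons, (38.5, 41.5), (-106.5, -103.5))
print(small, large)
```

Applying the same function to both the analysis and each forecast member, for each box, gives the kind of scale-dependent comparison described above: a sharp local maximum dominates the small-box average but contributes much less to the large-box average.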
Figure 3 shows that over Boulder County, with the exception of CMC, the ensemble systems for this initialization time were generally predicting total accumulations in excess of 50 mm. None produced accumulated precipitation anywhere near the analyzed amount, which was ~250 mm, though some of this shortfall can be attributed to the coarser model grid spacing. For similar forecasts at other lead times (see appendix B in the online supplemental material), there were occasionally one or two members with total accumulations up to 60% of observed. The ensemble guidance produced greater precipitation amounts over Boulder County as the event got closer, but then for the several lead times just prior to the onset of heaviest precipitation, the ensembles again forecasted somewhat lighter precipitation amounts. This happened with all four models. At the intermediate scale in Fig. 3b, the ensemble predictions still underforecasted the rainfall accumulation, though the discrepancy between analyzed and forecast amounts was smaller. Finally, Fig. 3c shows that the precipitation forecasts were even more consistent with the analyzed accumulation over the largest region, as suggested in the previous literature, including in Lavers and Villarini (2013) for this case.
Was the deficiency of precipitation noted in the forecast ensembles in Fig. 3a merely a consequence of the models’ coarse grid spacing? This can be addressed in part by examining the spatial patterns of accumulated precipitation. Figure 4 maps the analyzed precipitation and the four global systems’ ensemble-mean forecasts. For ease of interpretation, Fig. 4d also shows a coarser ~1° smoothed precipitation analysis, more consistent with a resolution the forecast models can potentially predict. Figures 4b and 4e show that both the NCEP and ECMWF systems were forecasting a local maximum of precipitation near Boulder County, with the maximum in the NCEP system displaced slightly west of the analyzed position. The NCEP forecasts also underforecasted the precipitation through much of New Mexico. ECMWF predicted the heavier precipitation along the Front Range, consistent with the analyzed pattern, but missed the extension of heavy precipitation to the southeast of Boulder and somewhat in eastern New Mexico. Figures 4c and 4f show that the Met Office forecast maximum in the northern Front Range was weaker and farther east, and the CMC forecast maximum at this time was much weaker and slightly farther east. Generally, across many initial times, ECMWF and NCEP’s GEFS produced better pattern forecasts, though their amplitudes were consistently too low, even with respect to the 1° smoothed analyses in Fig. 4d. While this can be due in part to the “smearing” effect of ensemble averaging precipitation that occurs when members’ maxima are in different locations, it is apparent that the overall ensemble-mean patterns of heavy precipitation differed from the analyzed pattern. The deficient precipitation noted in Fig. 3a is, hence, likely to be due in part to errors in the pattern of precipitation that was forecast, not just due to the coarse grid spacing.
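The “smearing” effect of ensemble averaging mentioned above can be seen in a toy example. The sketch below uses a hypothetical one-dimensional transect with four made-up members, each placing an identical 100-mm maximum at a slightly different grid point; the values are invented for illustration and do not come from any of the forecast systems.

```python
def ensemble_mean(members):
    """Point-by-point mean across ensemble members (equal-length lists)."""
    count = len(members)
    return [sum(values) / count for values in zip(*members)]


# Hypothetical members along a 5-point transect (mm). Every member has the
# same 100-mm peak, but the peak location differs from member to member.
members = [
    [0, 100, 0, 0, 0],
    [0, 0, 100, 0, 0],
    [0, 0, 0, 100, 0],
    [0, 0, 100, 0, 0],
]

mean = ensemble_mean(members)
print(mean)       # [0.0, 25.0, 50.0, 25.0, 0.0]
print(max(mean))  # 50.0
```

Even though every member forecasts a 100-mm maximum, the ensemble-mean maximum is only 50 mm, spread over a broader area. Position differences among members therefore lower the ensemble-mean amplitude without any member being too dry, which is why the ensemble-mean pattern also needs to be examined.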
We now turn our attention to shorter-range forecasts. Figure 5 shows plume diagrams of accumulated precipitation for the forecasts initialized around 0000 UTC 11 September 2013 (i.e., Tuesday evening), 24 h before the onset of heaviest precipitation in Boulder County (the SREF was actually initialized at 0300 UTC 11 September 2013). Forecasts from the GEFS, SREF, and deterministic GFS and NAM were considered. The two deterministic forecast models show much lighter than analyzed accumulations, and the GEFS system also significantly underforecasted the accumulated precipitation. In the SREF system, however, there were several members with accumulated precipitation that was remarkably consistent with the analyzed precipitation. At the intermediate and larger scales in Figs. 5b and 5c, there was greater consistency between forecast and analyzed precipitation amounts across the modeling systems.
Figure 6 shows stamp maps for the SREF system, indicating that it was the members that used the Advanced Research Weather Research and Forecasting Model (ARW-WRF) that produced the exceptionally high precipitation. These show that the SREF’s ARW-WRF forecasts were rather consistently producing heavy precipitation along the northern Front Range and generally heavy precipitation in much of Colorado down through central New Mexico. SREF system ARW-WRF forecasts initialized several days prior to the event also produced heavy precipitation on Tuesday, a day before the heaviest precipitation (as shown in data presented in appendix C in the online supplemental material). Hence, despite the superior forecasts of the SREF ARW-WRF members for the northern Front Range, it is possible that because heavier precipitation forecasts from those earlier initializations did not occur, forecasters might have discounted somewhat the heavy precipitation in later guidance.
The reasons behind the superior forecasts for the SREF members that used the ARW-WRF are not yet understood. The SREF members used three models, two different control initial conditions, and different perturbations for each member. Further data, presented in the online supplemental material (appendix C), show the mean SREF initial conditions for 10-m and 700-hPa analyzed winds, convective available potential energy (CAPE), and total precipitable water. These data also show the deviations from the mean of the initial analyses used for the ARW-WRF members, the WRF/Nonhydrostatic Mesoscale Model on the B grid (NMMB), and WRF/Nonhydrostatic Mesoscale Model (NMM). There was no “smoking gun” signature in the local initial conditions that would lead one to conclude that ARW-WRF members would produce much heavier Front Range precipitation as a result of their initial state. There was no dramatically enhanced upslope flow, nor especially higher CAPE, nor much greater precipitable water for the ARW-WRF initializations.
At very short lead times, forecasters may examine guidance from the WRF Rapid Refresh (i.e., the RAP). It has been shown (Benjamin et al. 2009) that the radar reflectivity assimilation in the RAP has improved short-range forecast guidance of precipitation and reduced spinup problems relative to other NCEP forecast systems without the digital-filter initialization to radar data. Figure 7 shows plume diagrams for the RAP. Unfortunately, for this case the RAP guidance almost always dramatically underestimated the rate of accumulation of precipitation over Boulder County during the period of most intense rainfall. However, the RAP guidance was more consistent with the analyzed accumulation when considering the forecasts over larger regions. Still, the RAP guidance would not have alerted forecasters to the potential for heavy rainfall near Boulder.
Interestingly, the RAP system used ARW-WRF, as did the SREF members that forecast the precipitation in the northern Front Range better than other systems. The mere use of ARW-WRF apparently was not the key to the SREF’s improved forecasts over the northern Front Range. The RAP’s 13-km grid spacing was similar to the SREF’s 16 km. The choice of parameterizations may have been the ultimate source of the differences.
4. Conclusions
This article briefly described the performance of precipitation forecast guidance leading up to the flash and river floods in the Front Range and in eastern Colorado, 9–16 September 2013. The article considered global ensemble predictions from the NCEP GEFS as well as the ECMWF, Met Office, and CMC ensemble systems. Shorter-range forecast guidance from the NCEP GEFS, GFS, NAM, SREF, and RAP systems was also examined. Extensive online supplemental appendixes provide model configuration details and additional plots of the analyzed conditions and forecast guidance for many other initial times.
The global ensemble prediction systems indicated that an abnormally wet pattern was to be expected in northeastern Colorado during 9–16 September 2013. However, the extent of the actual wetness near Boulder was not captured by any of the global ensemble prediction systems. This result is consistent with Lavers and Villarini (2013). Shorter-range prediction systems also dramatically underforecasted the precipitation amount. Some noteworthy exceptions were the members of the SREF system that used ARW-WRF. These members produced very heavy precipitation in northern Colorado at the time when it was observed. Earlier runs, however, produced forecasts of heavy precipitation prior to the actual heavy precipitation. Interestingly, forecasts from the RAP system, which has very similar initial conditions and also uses ARW-WRF, did not produce heavy precipitation.
The ARW-WRF simulations in the SREF do suggest that the heavy precipitation in the northern Front Range of Colorado was somewhat predictable. Other scientists (e.g., R. Schumacher 2013, personal communication) have also generated higher-resolution ARW-WRF simulations that forecasted the storm better than most of the operational guidance. It may be that the ARW-WRF system was more predisposed to produce heavy precipitation when run with certain combinations of parameterizations. Further experimentation is suggested to understand what model aspects were particularly important to producing heavy precipitation over the northern Front Range. Ideally, it would be interesting to examine other high-impact cases such as the May 2010 Nashville floods (Moore et al. 2012) and determine if there are any general principles for model configurations to improve quantitative precipitation forecasts (QPFs).
The National Oceanic and Atmospheric Administration (NOAA) has recently emphasized research and development on other high-impact events such as hurricanes relative to quantitative precipitation forecasting. The largely unexceptional forecasts during this event remind us that improving precipitation forecast guidance is still an urgent necessity within NOAA. Plans have previously been formulated that still provide a useful roadmap for how NOAA can improve its warm-season quantitative precipitation forecasts (Fritsch and Carbone 2004). Perhaps this event will spur NOAA to “dust off” and vigorously pursue such plans.
Acknowledgments
Geoff DiMego, Geoff Manikin, Jun Du, Yuejian Zhu, and Glenn White of NCEP/EMC are thanked for providing information on accessing model data. Gary Bates of ESRL/PSD is thanked for help with the data processing. Seth Gutman is thanked for help obtaining the GPS total precipitable water time series shown in appendix A in the online supplemental material. This publication was partially supported by a NOAA/Office of Weather and Air Quality (OWAQ) USWRP grant. This project also used data from ECMWF’s TIGGE archive. TIGGE is supported by the World Meteorological Organization’s THORPEX program. Russ Schumacher (Colorado State), Wallace Hogsett (NCEP/WPC), and Jeff Whitaker (ESRL/PSD) are thanked for their consultations.
REFERENCES
Benjamin, S. G., and Coauthors, cited 2009: Rapid refresh/rapid update cycle (RR/RUC) technical review. NOAA/ESRL/GSD internal review. [Available online at http://ruc.noaa.gov/pdf/RR-RUC-TR_11_3_2009.pdf.]
Bougeault, P., and Coauthors, 2010: The THORPEX Interactive Grand Global Ensemble. Bull. Amer. Meteor. Soc., 91, 1059–1072, doi:10.1175/2010BAMS2853.1.
Fritsch, J. M., and R. E. Carbone, 2004: Improving quantitative precipitation forecasts in the warm season: A USWRP research and development strategy. Bull. Amer. Meteor. Soc., 85, 955–965, doi:10.1175/BAMS-85-7-955.
Gallus, W. A., Jr., 2002: Impact of verification grid box size on warm season QPF skill measures. Wea. Forecasting, 17, 1296–1302, doi:10.1175/1520-0434(2002)017<1296:IOVGBS>2.0.CO;2.
Islam, S., R. L. Bras, and K. A. Emanuel, 1993: Predictability of mesoscale rainfall in the tropics. J. Appl. Meteor., 32, 297–310, doi:10.1175/1520-0450(1993)032<0297:POMRIT>2.0.CO;2.
Lavers, D. A., and G. Villarini, 2013: Were global numerical weather prediction systems capable of forecasting the extreme Colorado rainfall of 9–16 September 2013? Geophys. Res. Lett., 40, 6405–6410, doi:10.1002/2013GL058282.
Lin, Y., and K. E. Mitchell, 2005: The NCEP Stage II/IV hourly precipitation analyses: Development and applications. 19th Conf. on Hydrology, San Diego, CA, Amer. Meteor. Soc., 1.2. [Available online at https://ams.confex.com/ams/Annual2005/techprogram/paper_83847.htm.]
Moore, B. J., P. J. Neiman, F. M. Ralph, and F. E. Barthold, 2012: Physical processes associated with heavy flooding rainfall in Nashville, Tennessee, and vicinity during 1–2 May 2010: The role of an atmospheric river and mesoscale convective systems. Mon. Wea. Rev., 140, 358–378, doi:10.1175/MWR-D-11-00126.1.
1 Three online appendixes accompany this article. Appendix A provides information on the forecast models and a brief synoptic overview of observed/analyzed conditions. Appendix B provides information on precipitation forecasts from global medium-range ensembles. Appendix C provides information on precipitation forecasts from the shorter-range prediction systems.