---
title: "Heidke Skill Score Computation"
author: "Ashish Kondal"
date: "01/12/2024"
output: html_document
---
#-------------------------------------------------------------------------------
## CODE DESCRIPTION
#-------------------------------------------------------------------------------
This R code computes the Heidke Skill Score (HSS) for tercile categories of NMME (North American Multi-Model Ensemble) ensemble-mean forecasts across multiple initialization months.
See "NMME_data_sample.JPG" for the structure of the input data.
Here is a summary of the HSS computation process for one meteorological variable (precipitation):
1. Input Setup:
   - Define the input and output directories.
   - Set the working directory for all code chunks.
2. Retrieving Arguments:
   - Retrieve command-line arguments from a batch script, specifically the initialization month of the NMME models.
3. Heidke Skill Score Computation:
   - Check whether the initialization month provides forecasts for the desired season (November through February); if not, cancel the run. (Added for my own problem; skip it if you do not need it.)
   - Select the variable "Precipitation". (One arbitrarily chosen meteorological variable.)
   - Read the observation data (GridMET) and forecast data (NMME) from aggregated VIC-CropSyst outputs.
   - Restrict the data to November through February. (Specific to my problem; skip this step if you don't need it.)
   - Generate the seasonal accumulated precipitation for the specified months.
   - Join the forecast and observation data, convert the result into an sf spatial object, and perform spatial aggregation based on HUC 06 sub-basins.
4. Spatial Operations and HSS Calculation:
   - Define functions for the spatial operations and for calculating the Heidke Skill Score.
   - Loop through each HUC 06 sub-basin:
     - Perform spatial operations to extract the data for the current sub-basin.
     - Calculate terciles for both the observation and forecast data.
     - Create a contingency table and calculate the HSS for each tercile category (Below 33rd, Between 33rd and 67th, Above 67th percentile); a minimal sketch of these steps follows the summary below.
5. Save the HSS Results to an Output File:
   - The HSS results for each sub-basin are appended to an output file for further analysis.
In summary, the code performs a spatial analysis of the observed and forecast data, calculates terciles, and evaluates the Heidke Skill Score for each tercile category.
The process is repeated for multiple initialization months, providing a comprehensive assessment of model skill in predicting tercile-based precipitation outcomes.
The results are saved for each HUC 06 sub-basin, allowing further investigation and interpretation of the skill scores.
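To make the tercile/contingency-table/HSS steps concrete, here is a minimal, self-contained sketch on synthetic data (the series, seed, and chunk label are illustrative assumptions, not from the study); it mirrors the formula used in the `calculate_skill_metrics` function later in this document:
```{r HSS toy sketch, eval=FALSE}
# Illustrative only (eval = FALSE): synthetic observed/forecast seasonal totals.
set.seed(1)
obs  <- rnorm(30)                  # fake observed values
fcst <- obs + rnorm(30, sd = 0.8)  # fake forecasts, correlated with the observations
labs <- c("Below 33rd", "Between 33rd and 67th", "Above 67th")
obs_cat  <- cut(obs,  breaks = c(-Inf, quantile(obs,  probs = c(1/3, 2/3)), Inf), labels = labs)
fcst_cat <- cut(fcst, breaks = c(-Inf, quantile(fcst, probs = c(1/3, 2/3)), Inf), labels = labs)
ct <- table(obs_cat, fcst_cat)     # 3 x 3 contingency table (rows = observed)
# HSS (in %) for the "Below 33rd" category, reduced to a 2 x 2 problem:
Hit  <- ct[1, 1]                   # both observed and forecast in category 1
Miss <- sum(ct[2:3, 1])            # remaining cells of column 1
FA   <- sum(ct[1, 2:3])            # remaining cells of row 1
CN   <- sum(ct[2:3, 2:3])          # the 2 x 2 block outside row 1 and column 1
2 * (Hit * CN - FA * Miss) / ((Hit + Miss) * (Miss + CN) + (Hit + FA) * (FA + CN)) * 100
```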
#-------------------------------------------------------------------------------
## SETUP - STARTING WITH FRESH ENVIRONMENT AND CONSOLE
#-------------------------------------------------------------------------------
```{r setup, include=FALSE}
knitr::opts_chunk$set(fig.width = 12, fig.height = 8, fig.path = 'Figs/', echo = FALSE, warning = FALSE, message = FALSE)
cat("\014") # Clear the console
rm(list = ls()) # Clear the environment and saved variables
```
#-------------------------------------------------------------------------------
## INSTALLING REQUIRED PACKAGES
# Skip if you already have required packages installed
#-------------------------------------------------------------------------------
#install.packages("dplyr")
#install.packages("data.table")
#install.packages("tidyverse")
#install.packages("knitr")
#install.packages("readr")
#install.packages("sf")
#install.packages("pryr") # To check memory usage
#-------------------------------------------------------------------------------
## LOADING PACKAGES
#-------------------------------------------------------------------------------
```{r Loading Packages, include = FALSE, echo = FALSE, warning=FALSE}
library(readr)
library(knitr)
library(dplyr)
library(sf)
```
#-------------------------------------------------------------------------------
## INPUT SETUP - DEFINE VARIABLES AND DIRECTORIES
#-------------------------------------------------------------------------------
```{r Input setup}
input_dir <- "Path_To_Input_Directory/" # Input directory: where all the input files exist
output_dir <- "Path_To_Output_Directory/" # Output directory: where output files and final images are stored
# Setting the working directory for all code chunks
knitr::opts_knit$set(root.dir = input_dir)
# Create the output directory if it does not already exist
if (!dir.exists(output_dir)) { dir.create(output_dir) }
months_list <- c("January","February","March","April","May","June","July","August","September","October","November","December")
Seasons <- c("Summer","Winter","Fall","Spring")
# Hardcoded variables (specific to this problem)
basin <- "pnw"
variable_list <- c("Sum_PPT","Avg_Tavg","Avg_Baseflow","Sum_Baseflow","Avg_Runoff","Sum_Runoff") # Variable names corresponding to the "Aggregated_FLUX_File" columns.
varDisp <- c("Accumulated_Precipitation","Average_Temperature","Average_Baseflow_Runoff","Accumulated_Baseflow_Runoff","Average_Runoff","Accumulated_Runoff") # Full display names of the corresponding variables.
# Shapefile of the study region at the HUC 06 spatial aggregation level
huc6_shapefile_address <- "Path_To_Input_Directory_of_Shapefile/huc6.shp"
huc6_shp <- st_read(huc6_shapefile_address)
```
#-------------------------------------------------------------------------------
## RETRIEVING ARGUMENTS FROM BATCH SCRIPT
#-------------------------------------------------------------------------------
```{r Retrieving Arguments}
args <- commandArgs(trailingOnly = TRUE) # Retrieving the argument from the batch script
init <- as.numeric(args[1]) # Converting the retrieved argument (a character string) to numeric
```
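For context, a hypothetical way of supplying this argument is sketched below; the batch script itself is not part of this file, so the invocation shown is an assumption.
```{r Batch invocation sketch, eval=FALSE}
# Hypothetical invocation (assumed, not from the source): if each batch job
# launches this document with the initialization month as a trailing argument,
#   Rscript -e 'rmarkdown::render("HeidkeSkillScore.R")' 7
# then commandArgs(trailingOnly = TRUE) returns c("7") and init becomes 7.
```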
#--------------------------------------------------------------------------------------------------------------------------------------------------------------
#--------------------------------------------------------------------------------------------------------------------------------------------------------------
## HEIDKE SKILL SCORE COMPUTATION
# Heidke Skill Score of Accumulated Precipitation: November thru February
#--------------------------------------------------------------------------------------------------------------------------------------------------------------
#--------------------------------------------------------------------------------------------------------------------------------------------------------------
```{r Seasonal HSS, echo=FALSE, warning = FALSE}
# Skipping unnecessary runs for unwanted initialization months - otherwise the batch script would run these initializations for nothing.
if (!(init %in% c(6,7,8,9,10,11))) { # Cancel the run if the initialization month "init" is not in the required list.
print("Files are not generated for this initialization because it does not cover the required forecast season")
quit()
}
# Selecting appropriate variable - Precipitation
req_cols <- c("Lat","Lon","Year","Month",variable_list[1]) # Selecting appropriate variables/columns from the input files. See "variable_list" for the list of available variables.
disp_variable_name <- varDisp[1] # Storing selected variable name for later use.
cat(disp_variable_name)
#-------------------------------------------------------------------------------
## READING OBSERVATION DATASET
# Reading GridMET File - Aggregated VIC-CropSyst's Output
#-------------------------------------------------------------------------------
histfile <- paste(input_dir,basin,"_FLUX_GridMET_VIC_CropSyst_Aggregated_Output_NEW1.txt",sep="") # Full path of the aggregated VIC-CropSyst flux file generated with GridMET data.
vhist_data <- read_csv(file = histfile, col_names = TRUE, show_col_types = FALSE, col_select = all_of(req_cols)) # Reading only the required columns to save memory and speed up computation.
#-------------------------------------------------------------------------------
## READING FORECAST DATASET
# Reading NMME File - Aggregated VIC-CropSyst's Output
#-------------------------------------------------------------------------------
mm_inputfilename <- paste(input_dir,basin,"_FLUX_ENSMEAN_VIC_CropSyst_Aggregated_Output_InitMonth_",sprintf("%02d", init),"_NEW1.txt",sep="") # Full path of the aggregated VIC-CropSyst flux file generated with NMME data; sprintf() zero-pads the initialization month (e.g., "06") and also handles init = 10 or 11 correctly.
mm <- read_csv(file = mm_inputfilename,col_names = TRUE, show_col_types = FALSE, col_select = all_of(req_cols)) # Reading only the required columns to save memory and speed up computation.
#-------------------------------------------------------------------------------
## MAKE CONSISTENT TIME FRAME OF OBSERVATION AND FORECAST DATASET
# Since we are interested only in November through February precipitation, we extract the data for these months only.
#-------------------------------------------------------------------------------
obv_data <- filter(vhist_data,(Month >= 11 | Month <=2))
mm_data <- filter(mm,(Month >= 11 | Month <=2))
obv_data <- dplyr::arrange(obv_data,Lon,Lat,Year,Month)
colnames(obv_data) <- c("Lat","Lon","Year","Month","GridMET")
mm_data <- dplyr::arrange(mm_data,Lon,Lat,Year,Month)
colnames(mm_data) <- c("Lat","Lon","Year","Month","ENSMEAN")
#-------------------------------------------------------------------------------
## GENERATING SEASONAL ACCUMULATED PRECIPITATION - NOV THRU FEB
#-------------------------------------------------------------------------------
# For ease of operations, we increase the Year of the November and December rows by 1; this way the Nov-Dec-Jan-Feb total can be summed in one step.
mm_data <- mm_data %>% mutate(Year = ifelse(Month %in% c(11, 12), Year + 1, Year))
obv_data <- obv_data %>% mutate(Year = ifelse(Month %in% c(11, 12), Year + 1, Year))
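# Illustration of the shift (comment only): Nov 1999 and Dec 1999 are relabeled
# Year 2000 and therefore group with Jan 2000 and Feb 2000 in the seasonal sum
# below. Note that the first and last "Year" values in the record are partial
# seasons (they lack the preceding Nov-Dec or the following Jan-Feb, respectively).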
obv_data1 <- obv_data %>% group_by(Lat,Lon,Year) %>% summarize(GridMET = sum(GridMET)) # The Year shift above means each Nov-Feb season shares a single Year value here.
frc_data1 <- mm_data %>% group_by(Lat,Lon,Year) %>% summarize(ENSMEAN = sum(ENSMEAN))
# Ungroup the datasets to avoid unintended grouping and incompatibility issues in downstream operations
frc_data1 <- ungroup(frc_data1)
obv_data1 <- ungroup(obv_data1)
# Ensuring the column names are as expected after ungrouping
colnames(obv_data1) <- c("Lat","Lon","Year","GridMET")
colnames(frc_data1) <- c("Lat","Lon","Year","ENSMEAN")
#-------------------------------------------------------------------------------
## JOIN FORECAST AND OBSERVATION DATA TOGETHER & CONVERT INTO SHAPEFILE
#-------------------------------------------------------------------------------
merged_data <- merge(frc_data1,obv_data1,by=c("Lat","Lon","Year"))
colnames(merged_data) <- c("Lat","Lon","Year","ENSMEAN","GridMET")
merged_data <- na.omit(merged_data) # Removing any rows with NAs
sf_data <- st_as_sf(merged_data, coords = c("Lon", "Lat"), crs = 4326) # Converting the merged data into an sf point object for subsequent spatial processing
#-------------------------------------------------------------------------------
## SPATIAL AGGREGATION - HUC 06
#-------------------------------------------------------------------------------
name_huc <- huc6_shp$Name # Save the Name column from the HUC 06 shapefile into a new variable.
HSS_outfilename <- paste(output_dir,"HUC06_HeidkeSkillScore_Precipitation_InitMonth_",sprintf("%02d", init),"_NDJF.txt",sep="") # Output file name; the HSS for all sub-basins is appended to this single file.
#-------------------------------------------------------------------------------
## FUNCTION FOR SPATIAL OPERATIONS
#-------------------------------------------------------------------------------
perform_spatial_operations <- function(current_huc, huc6_shp, sf_data) {
nam1 <- subset(huc6_shp, Name == current_huc) # Subset the HUC 06 shapefile to extract the current sub-basin.
shp1 <- st_transform(nam1, st_crs(sf_data)) # Reproject the sub-basin polygon to match the CRS of the point data (EPSG:4326).
intersect_ind <- st_intersects(sf_data, shp1, sparse = FALSE) # Row indices of coordinates in sf_data (i.e., merged_data) that fall within the sub-basin.
data_hucc <- sf_data[intersect_ind,] # Extract the rows of sf_data based on "intersect_ind".
df_data_huc <- st_drop_geometry(data_hucc) %>% # Drop the "geometry" column; it is no longer needed.
group_by(Year) %>% # Group the sub-basin data by Year, then summarize the columns.
summarize(GridMET = sum(GridMET), ENSMEAN = sum(ENSMEAN))
df_data_huc <- na.omit(df_data_huc) # Remove any unintentionally introduced NAs.
return(df_data_huc)
}
#-------------------------------------------------------------------------------
## FUNCTION FOR CALCULATING HEIDKE SKILL SCORE
#-------------------------------------------------------------------------------
calculate_skill_metrics <- function(df_data_huc, current_huc) { # "current_huc" is passed explicitly rather than read from the global environment.
if (nrow(df_data_huc) != 0) {
# Generate terciles based on observation and forecast data
quantiles <- quantile(df_data_huc$GridMET, probs = c(1/3, 2/3))
quantiles_f <- quantile(df_data_huc$ENSMEAN, probs = c(1/3, 2/3))
# Create tercile categorization of observation and forecast data
df_data_huc$Obs_Tercile <- cut(df_data_huc$GridMET, breaks = c(-Inf, quantiles[1], quantiles[2], Inf), labels = c("Below 33rd", "Between 33rd and 67th", "Above 67th"))
df_data_huc$Forecast_Tercile <- cut(df_data_huc$ENSMEAN, breaks = c(-Inf, quantiles_f[1], quantiles_f[2], Inf), labels = c("Below 33rd", "Between 33rd and 67th", "Above 67th"))
# Create a contingency table
contingency_table <- table(df_data_huc$Obs_Tercile, df_data_huc$Forecast_Tercile)
# Retrieving the elements of the contingency table for subsequent processing. The suffixes (1,2,3) in the variable names correspond to the "Below 33rd", "Between 33rd and 67th", and "Above 67th" categories.
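# Cell roles (rows = observed tercile, columns = forecast tercile):
#   Hit_k  = diagonal cell [k,k]
#   Miss_k = remaining cells of column k; FA_k = remaining cells of row k
#   CN_k   = the 2x2 block left after removing row k and column k
# Swapping the Miss/FA labels would leave the HSS unchanged, because the formula
# below is symmetric in those two terms.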
Hit1 <- contingency_table[1,1]
Hit2 <- contingency_table[2,2]
Hit3 <- contingency_table[3,3]
CN1 <- contingency_table[2,2]+contingency_table[2,3]+contingency_table[3,2]+contingency_table[3,3]
CN2 <- contingency_table[1,1]+contingency_table[1,3]+contingency_table[3,1]+contingency_table[3,3]
CN3 <- contingency_table[1,1]+contingency_table[1,2]+contingency_table[2,1]+contingency_table[2,2]
Miss1 <- contingency_table[2,1]+contingency_table[3,1]
Miss2 <- contingency_table[1,2]+contingency_table[3,2]
Miss3 <- contingency_table[1,3]+contingency_table[2,3]
FA1 <- contingency_table[1,2]+contingency_table[1,3]
FA2 <- contingency_table[2,1]+contingency_table[2,3]
FA3 <- contingency_table[3,1]+contingency_table[3,2]
# Computing the Heidke Skill Score for each tercile category separately:
# HSS = 2*(Hit*CN - FA*Miss) / ((Hit+Miss)*(Miss+CN) + (Hit+FA)*(FA+CN)), expressed as a percentage.
hss <- function(Hit, Miss, FA, CN) { # Cast the counts to double to avoid integer overflow in the products.
Hit <- as.double(Hit); Miss <- as.double(Miss); FA <- as.double(FA); CN <- as.double(CN)
2 * (Hit * CN - FA * Miss) / ((Hit + Miss) * (Miss + CN) + (Hit + FA) * (FA + CN)) * 100
}
pc_cond1 <- hss(Hit1, Miss1, FA1, CN1)
pc_cond2 <- hss(Hit2, Miss2, FA2, CN2)
pc_cond3 <- hss(Hit3, Miss3, FA3, CN3)
HSS_data <- data.frame(current_huc, pc_cond1, pc_cond2, pc_cond3) # data.frame() keeps the score columns numeric; cbind() would coerce everything to character.
colnames(HSS_data) <- c("HUC06_Name","Below 33rd Percentile","Between 33rd and 67th Percentile","Above 67th Percentile")
return(list(HSS_data = HSS_data))
} else {
return(NULL)
}
}
#-------------------------------------------------------------------------------
## LOOP OVER SUB-BASINS - COMPUTING AND SAVING THE HSS OUTPUT FILE
#-------------------------------------------------------------------------------
for (gg in seq_along(name_huc)) { # Loop over all sub-basins in the HUC 06 shapefile.
current_huc <- name_huc[gg] # Current sub-basin
df_data_huc <- perform_spatial_operations(current_huc, huc6_shp, sf_data) # Perform the spatial operations for the current sub-basin
metrics_result <- calculate_skill_metrics(df_data_huc, current_huc) # Compute the Heidke Skill Score for the current sub-basin
if (!is.null(metrics_result)) {
HSS_data <- metrics_result$HSS_data
if (file.exists(HSS_outfilename)) { # Append to the output file if it already exists; otherwise create it with a header row.
write_csv(x = HSS_data, file = HSS_outfilename, append = TRUE, col_names = FALSE)
} else {
write_csv(x = HSS_data, file = HSS_outfilename, append = FALSE, col_names = TRUE)
}
rm(HSS_data)
}
}
```