fieldClim: Missing-data inspection and input checks

Purpose

This tutorial shows how to inspect station data before H/LE heat-flux calculations. The goal is to learn the inspection workflow, not to repair the dataset.

fieldClim does not repair, fill, impute, interpolate or complete the data. It reports missingness, gap blocks, variable classes, quality-control flags and method readiness. The first things to establish are the variable type and the gap length, because method suitability depends on what is missing and how the gap is structured. Quality control comes before any external gap-filling workflow.

A systematic decision guide for matching calculation paths to measurement architecture is provided in Choosing fieldClim Heat-Flux Methods by Measurement Design. This vignette prepares the data and energy-balance inputs for that decision; the actual application of heat-flux methods is continued in the second workflow vignette.

The sections below repeat the same tutorial pattern: run a command, inspect a compact original output, then read a cleaned table and interpretation.

Reference notation

This vignette uses the same reference notation as the method-selection page. Q* is net radiation, B is soil heat flux, A = Q* - B is available energy, H is sensible heat flux, LE is latent heat flux, and R_E is the residual or non-closed energy term.

Package field names are used unchanged in this vignette. The fields rad_bal and soil_flux map to Q* and B; later package outputs named sensible_* correspond to H, and outputs named latent_* correspond to LE.

Reference quantity	Meaning	`fieldClim` field or later output
`Q*`	net radiation	`rad_bal`
`B`	soil heat flux	`soil_flux`
`A = Q* - B`	available energy	`rad_bal - soil_flux`
`H`	sensible heat flux	`sensible_*`
`LE`	latent heat flux	`latent_*`
`R_E = Q* - B - H - LE`	residual / non-closed energy term	closure diagnostics

This first inspection vignette does not compute H, LE or R_E. It checks the input fields that later determine whether these quantities can be interpreted.
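Although this vignette does not compute these terms, the definitions above already show why a single missing input matters. The following minimal sketch uses hypothetical toy values (not the Caldern data) and plain R arithmetic, not fieldClim functions: one missing `B` removes `A` at that timestep, and `R_E` is unavailable wherever any of its four terms is missing.

```r
# Toy values in W m-2 for three timesteps (hypothetical, not from Caldern)
rad_bal   <- c(450, 460, 470)   # Q*, net radiation
soil_flux <- c(60, NA, 65)      # B, soil heat flux with one missing value
H  <- c(150, 155, 160)          # sensible heat flux
LE <- c(200, 210, NA)           # latent heat flux with one missing value

# Available energy A = Q* - B; NA in B propagates to A
A <- rad_bal - soil_flux

# Residual R_E = Q* - B - H - LE; NA in any term propagates to R_E
R_E <- rad_bal - soil_flux - H - LE

A    # 390, NA, 405
R_E  # 40, NA, NA
```

R propagates `NA` through arithmetic by default, which is why a gap in one input field silently becomes a gap in every derived energy-balance term.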

Example dataset

The example is based on the packaged Caldern one-day dataset, which covers a single day at five-minute resolution. Artificial gaps and obvious sensor problems were inserted only to demonstrate inspection behavior. The file is not a meteorological benchmark; it is a controlled tutorial dataset. The original package dataset is unchanged.

First load the file. The path gap_file is resolved in a setup chunk that is not shown; the visible command is the part users normally need.

library(fieldClim)

caldern <- read.csv(
  gap_file,
  na.strings = c("NA", "NULL", ""),
  stringsAsFactors = FALSE
)
caldern$datetime <- as.POSIXct(
  caldern$datetime,
  tz = "Europe/Berlin"
)
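The na.strings argument matters because loggers write missing values in different ways. The following sketch uses hypothetical CSV text (not the tutorial file) to show that declaring "NULL" and "" as missing keeps the temperature column numeric; with the default na.strings = "NA", the single "NULL" entry would turn the whole column into character. It also checks that every timestamp was parsed.

```r
# Hypothetical CSV text with three different missing-value codes
csv_text <- "datetime,Ta_2m
2017-06-30 00:00:00,13.09
2017-06-30 00:05:00,NULL
2017-06-30 00:10:00,
2017-06-30 00:15:00,NA"

toy_csv <- read.csv(
  text = csv_text,
  na.strings = c("NA", "NULL", ""),
  stringsAsFactors = FALSE
)
toy_csv$datetime <- as.POSIXct(toy_csv$datetime, tz = "Europe/Berlin")

# Ta_2m stays numeric because "NULL" is declared as a missing-value code
str(toy_csv$Ta_2m)

# Timestamps that failed to parse would show up as NA here
sum(is.na(toy_csv$datetime))

# All three missing-value codes are now ordinary NA values
sum(is.na(toy_csv$Ta_2m))
```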

A compact original view is enough to understand the logger columns used in the tutorial. It shows time, air temperature, humidity, net radiation, wind speed and soil heat flux.

head(caldern[, c(
  "datetime", "Ta_2m", "Huma_2m", "rad_net",
  "Windspeed_2m", "heatflux_soil"
)])
#>              datetime Ta_2m Huma_2m rad_net Windspeed_2m heatflux_soil
#> 1 2017-06-30 00:00:00 13.09   100.0 -15.200        0.448      1.551533
#> 2 2017-06-30 00:05:00 13.01   100.0  -8.920        0.380      1.492695
#> 3 2017-06-30 00:10:00 13.02   100.0  -1.965        0.548      1.448708
#> 4 2017-06-30 00:15:00 13.16   100.0  -1.790        0.581      1.390439
#> 5 2017-06-30 00:20:00 13.27   100.0  -2.469        0.764      1.325316
#> 6 2017-06-30 00:25:00 13.69    98.1  -3.857        0.589      1.268762

The overview table summarizes the dataset without printing all rows.

Item	Value
Rows	288
First timestamp	2017-06-30 00:00:00 CEST
Last timestamp	2017-06-30 23:55:00 CEST
Median timestep	300 seconds
Number of variables	19
Variables with missing values	6

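The row count matches the 288 records expected for a five-minute day, and the median timestep is 300 seconds. The inspection problem is therefore not a missing file or a truncated day. It is that specific variables contain gaps or suspicious values, and, as the quality-control flags below show, the time axis also contains one duplicated timestamp. Row count and median timestep alone cannot reveal such timebase problems.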
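The overview values can be cross-checked with base R. This minimal sketch uses a hypothetical six-step time axis, not the tutorial file, to show that one duplicated timestamp leaves the median timestep at 300 seconds, so duplicates and irregular spacing have to be checked explicitly.

```r
# Hypothetical five-minute time axis
times <- seq(
  as.POSIXct("2017-06-30 00:00:00", tz = "Europe/Berlin"),
  by = 300, length.out = 6
)
# Overwrite 00:15 with 00:10: one duplicate, then a 10-minute step
times[4] <- times[3]

# Spacing between consecutive timestamps in seconds
step_s <- as.numeric(diff(times), units = "secs")

median(step_s)          # still 300: the median hides the problem
sum(duplicated(times))  # number of duplicated timestamps
which(step_s != 300)    # positions in diff() with 0 s and 600 s steps
```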

Build the weather_station object

build_weather_station() is only a container step. It stores station variables under names expected by fieldClim functions. It does not check physics, repair missing values or calculate replacement fields.

ws <- build_weather_station(
  datetime = caldern$datetime,
  temp = caldern$Ta_2m,
  rh = caldern$Huma_2m,
  t1 = caldern$Ta_2m,
  t2 = caldern$Ta_10m,
  hum1 = caldern$Huma_2m,
  hum2 = caldern$Huma_10m,
  v1 = caldern$Windspeed_2m,
  v2 = caldern$Windspeed_10m,
  rad_bal = caldern$rad_net,
  soil_flux = caldern$heatflux_soil,
  lat = 50.8405,
  lon = 8.6832,
  elev = 270,
  z1 = 2,
  z2 = 10,
  surface_type = "field"
)

The compact object structure shows that ws is a named weather-station object. The vector lengths identify which fields are time series and which are station metadata.

names(ws)
#>  [1] "datetime"     "temp"         "rh"           "t1"           "t2"          
#>  [6] "hum1"         "hum2"         "v1"           "v2"           "rad_bal"     
#> [11] "soil_flux"    "lat"          "lon"          "elev"         "z1"          
#> [16] "z2"           "surface_type"
sapply(ws, length)
#>     datetime         temp           rh           t1           t2         hum1 
#>          288          288          288          288          288          288 
#>         hum2           v1           v2      rad_bal    soil_flux          lat 
#>          288          288          288          288          288            1 
#>          lon         elev           z1           z2 surface_type 
#>            1            1            1            1            1

The important names for later checks are temp, rh, rad_bal, soil_flux, v1, v2, t1, t2, hum1 and hum2. Those are the names downstream functions use, regardless of the original logger column names.

Run the inspection

Now run the inspection function. It returns a structured report, not modified station data.

inspection <- inspect_weather_station_inputs(ws)

The first compact original output is simply the object structure: the returned list contains field-level status, gap blocks, method readiness, quality-control flags, guidance text and a summary.

names(inspection)
#> [1] "fields"           "gaps"             "method_readiness" "qc_flags"        
#> [5] "guidance"         "summary"
sapply(inspection, function(x) {
  if (is.data.frame(x)) {
    paste(nrow(x), "rows x", ncol(x), "columns")
  } else {
    paste(class(x), collapse = ", ")
  }
})
#>                 fields                   gaps       method_readiness 
#> "42 rows x 10 columns"   "7 rows x 9 columns"   "6 rows x 7 columns" 
#>               qc_flags               guidance                summary 
#>   "6 rows x 5 columns"   "5 rows x 2 columns"                 "list"

Read these components as follows. fields has one row per expected station field. gaps reports consecutive missing blocks, not just total NA counts. qc_flags identifies existing values that are suspicious or physically impossible. method_readiness reports which heat-flux methods have their required input fields.

Inspect variable-level missingness

This step asks: which station fields contain missing values, and which variable classes do those fields represent?

The compact original output below is a filtered view of inspection$fields. It shows only fields that are present and have at least one missing value.

subset(
  inspection$fields,
  present & n_missing > 0,
  select = c(field, variable_type, group, n_missing, n_total, missing_fraction)
)
#>        field  variable_type     group n_missing n_total missing_fraction
#> 5    rad_bal      radiation radiation        12     288      0.041666667
#> 17 soil_flux soil heat flux      soil        72     288      0.250000000
#> 25        rh       humidity  humidity         5     288      0.017361111
#> 26      hum1       humidity  humidity         5     288      0.017361111
#> 33      temp    temperature  profiles         1     288      0.003472222
#> 34        t1    temperature  profiles         1     288      0.003472222
#> 37        v1     wind speed  profiles        12     288      0.041666667

The same result is easier to read as a compact interpretation table.

Field	Variable class	Group	Missing values	Missing fraction	First missing row	Largest gap (steps)
hum1	humidity	humidity	5	1.7%	40	5
rh	humidity	humidity	5	1.7%	40	5
t1	temperature	profiles	1	0.3%	20	1
temp	temperature	profiles	1	0.3%	20	1
v1	wind speed	profiles	12	4.2%	130	12
rad_bal	radiation	radiation	12	4.2%	100	12
soil_flux	soil heat flux	soil	72	25.0%	180	72

In this dataset, the affected fields are temp, t1, rh, hum1, rad_bal, v1 and soil_flux. The pairs temp/t1 and rh/hum1 are each built from a single logger column (Ta_2m and Huma_2m), so each pair shows the same gap twice. The most consequential fields for heat-flux calculations are rad_bal and soil_flux, because together they define available energy as Q* - B. Missing v1 affects aerodynamic and profile-related methods. Missing hum1 affects humidity-gradient and Penman-type calculations. Missing t1 affects profile methods and Bulk-Residual estimates of H.
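The counts in inspection$fields follow the usual definitions: missing values per time-series field and their share of all records. This base-R sketch reproduces the idea on a hypothetical station list; like the sapply(ws, length) output above, it treats fields with more than one value as time series and skips scalar metadata. It is an illustration, not fieldClim's internal code.

```r
# Hypothetical station list: two time series plus scalar metadata
station <- list(
  temp    = c(13.1, NA, 13.2, 13.3),
  rad_bal = c(-15.2, -8.9, NA, NA),
  lat     = 50.8405,
  elev    = 270
)

# Fields longer than one value are treated as time series
is_series <- lengths(station) > 1

missing_summary <- data.frame(
  field = names(station)[is_series],
  n_missing = sapply(station[is_series], function(x) sum(is.na(x))),
  n_total = lengths(station[is_series]),
  row.names = NULL
)
missing_summary$missing_fraction <-
  missing_summary$n_missing / missing_summary$n_total

missing_summary
```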

Inspect gap blocks

Total missing counts are not enough. This step asks whether the missing values are isolated or form continuous blocks.

The compact original output sorts the gap table by length and shows the most important gaps first.

head(inspection$gaps[order(-inspection$gaps$n_timesteps), ], 10)
#>       field  variable_type gap_start_index gap_end_index n_timesteps
#> 2 soil_flux soil heat flux             180           251          72
#> 1   rad_bal      radiation             100           111          12
#> 7        v1     wind speed             130           141          12
#> 3        rh       humidity              40            44           5
#> 4      hum1       humidity              40            44           5
#> 5      temp    temperature              20            20           1
#> 6        t1    temperature              20            20           1
#>            start_time            end_time duration_seconds gap_class
#> 2 2017-06-30 14:55:00 2017-06-30 20:50:00            21600      long
#> 1 2017-06-30 08:15:00 2017-06-30 09:10:00             3600    medium
#> 7 2017-06-30 10:45:00 2017-06-30 11:40:00             3600    medium
#> 3 2017-06-30 03:15:00 2017-06-30 03:35:00             1500    medium
#> 4 2017-06-30 03:15:00 2017-06-30 03:35:00             1500    medium
#> 5 2017-06-30 01:35:00 2017-06-30 01:35:00              300     short
#> 6 2017-06-30 01:35:00 2017-06-30 01:35:00              300     short

The interpreted table keeps the same information but formats timestamps and duration for reading.

Field	Variable class	Gap start	Gap end	Steps	Duration	Gap class
soil_flux	soil heat flux	14:55	20:50	72	6 h	long
rad_bal	radiation	08:15	09:10	12	1 h	medium
v1	wind speed	10:45	11:40	12	1 h	medium
hum1	humidity	03:15	03:35	5	25 min	medium
rh	humidity	03:15	03:35	5	25 min	medium
t1	temperature	01:35	01:35	1	5 min	short
temp	temperature	01:35	01:35	1	5 min	short

A single missing five-minute value is a row-level interruption. Gaps of roughly half an hour to an hour, classified medium here, start to affect subdaily interpretation. The six-hour soil_flux gap is more serious because it removes B, and therefore A = Q* - B, for most of the afternoon and evening. Wind gaps can affect Bulk-Residual, Penman and profile-based methods. Humidity gaps affect Bowen, Monin-Obukhov/Profile and Penman-type paths.
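Gap blocks differ from total NA counts because they group consecutive missing values. A common base-R way to find such blocks is run-length encoding of is.na(). The sketch below applies it to a hypothetical series; the class thresholds (1 step = short, up to 12 steps = medium, longer = long) are an assumption chosen only to match the classes in this example, not fieldClim's documented rule.

```r
# Hypothetical series: one isolated NA and one three-step block
x <- c(1.2, NA, 1.4, 1.5, NA, NA, NA, 1.9)

# Runs of TRUE in is.na(x) are gap blocks
runs <- rle(is.na(x))
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

gaps <- data.frame(
  gap_start_index = starts[runs$values],
  gap_end_index = ends[runs$values],
  n_timesteps = runs$lengths[runs$values]
)

# Assumed thresholds in timesteps; fieldClim's own thresholds may differ
gaps$gap_class <- cut(
  gaps$n_timesteps,
  breaks = c(0, 1, 12, Inf),
  labels = c("short", "medium", "long")
)

gaps
```

With five-minute data, multiplying n_timesteps by 300 gives the duration_seconds values shown in the inspection output.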

Entry matrix: variable type and gap length

The inspection table tells us where values are missing. The entry matrix explains why the consequence differs by variable type.

Variable type	Fields in this example	What the inspection shows	Why this matters	What fieldClim does
Temperature	`temp`, `t1`	An isolated missing air-temperature value.	Temperature enters profile gradients and helper calculations.	Reports this gap and affected methods. It does not repair the value.
Humidity	`rh`, `hum1`	A 25-minute humidity gap, classified medium, and one invalid relative-humidity value.	Humidity affects Bowen, Monin-Obukhov/Profile and Penman-type inputs.	Reports the gap and QC flag. It does not correct humidity values.
Radiation	`rad_bal`	A medium net-radiation gap and a suspicious shortwave value in the source data.	Radiation controls available energy for energy-balance methods.	Reports affected radiation fields. It does not substitute modeled radiation.
Wind speed	`v1`	A medium wind-speed gap and one negative wind-speed value.	Wind controls aerodynamic and profile-based methods.	Reports the gap and negative-wind flag. It does not invent wind speed.
Soil heat flux	`soil_flux`	A long continuous soil heat-flux gap.	Soil heat flux is subtracted from net radiation in `A = Q* - B`.	Reports the long gap and affected energy-balance methods. It does not replace soil heat flux.

This table is not a ranking of filling methods. It is a reading guide for the inspection output. The key point is that variable type, gap length and QC status determine the next external decision more than any universal method ranking.

Quality-control flags

Missing values are not the only problem. Some values are present but should not be accepted without review.

The compact original output below is the actual qc_flags table returned by inspect_weather_station_inputs().

inspection$qc_flags
#>       field row_index                   flag severity
#> 1        rh        60 humidity_outside_0_100    error
#> 2      hum1        60 humidity_outside_0_100    error
#> 3        v1        70    negative_wind_speed    error
#> 4  datetime        80   duplicated_timestamp  warning
#> 5  datetime        NA     irregular_timestep  warning
#> 6 soil_flux       180               long_gap  warning
#>                                                                                    message
#> 1                 Relative humidity should be within 0..100 percent before downstream use.
#> 2                 Relative humidity should be within 0..100 percent before downstream use.
#> 3                                                       Wind speed should not be negative.
#> 4             Duplicated timestamps can invalidate gap-length and workflow interpretation.
#> 5           Datetime spacing is irregular; inspect timebase before external gap treatment.
#> 6 Long missing-data run; variable type and gap length should guide any external treatment.

The formatted table adds timestamps and values where they are available.

Field	Timestamp	Value	Flag type	Explanation
rh	2017-06-30 04:55	105.0	humidity_outside_0_100	Relative humidity should be within 0..100 percent before downstream use.
hum1	2017-06-30 04:55	105.0	humidity_outside_0_100	Relative humidity should be within 0..100 percent before downstream use.
v1	2017-06-30 05:45	-0.5	negative_wind_speed	Wind speed should not be negative.
datetime	2017-06-30 06:30	NA	duplicated_timestamp	Duplicated timestamps can invalidate gap-length and workflow interpretation.
datetime	NA	NA	irregular_timestep	Datetime spacing is irregular; inspect timebase before external gap treatment.
soil_flux	2017-06-30 14:55	NA	long_gap	Long missing-data run; variable type and gap length should guide any external treatment.

Relative humidity above 100 percent and negative wind speed are physically invalid. Duplicated or irregular timestamps affect gap-length interpretation, because gap blocks are counted in timesteps. Flagged values remain in the data: the inspection marks them for review but does not change or remove them.

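The synthetic source data also include two intentionally obvious spikes that are useful for manual review. Neither appears in qc_flags: rad_sw_in was not passed to build_weather_station(), and the value 45 in Ta_2m is present in temp and t1 but is not flagged by the inspection. Both have to be checked manually against time of day and expected physical range.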

Field	Timestamp	Value	Interpretation
rad_sw_in	2017-06-30 15:45	5000	Check shortwave radiation against time of day and expected range.
Ta_2m	2017-06-30 16:35	45	Check whether the temperature spike is a sensor artefact.
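The three value-level checks in the qc_flags table are simple to state. The following base-R sketch reproduces them on hypothetical toy inputs; make_flags() and simple_qc_flags() are hypothetical helpers for illustration, not part of fieldClim, and the flag names are copied from the output above. Missing values are deliberately not reported as range errors, because they are already covered by the gap report.

```r
# Hypothetical helper (not part of fieldClim): one row per flagged value
make_flags <- function(field, idx, flag) {
  data.frame(
    field = rep(field, length(idx)),
    row_index = idx,
    flag = rep(flag, length(idx)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper combining three checks shown in qc_flags
simple_qc_flags <- function(rh, wind, datetime) {
  rbind(
    # which() ignores NA comparisons, so gaps are not flagged here
    make_flags("rh", which(rh < 0 | rh > 100), "humidity_outside_0_100"),
    make_flags("wind", which(wind < 0), "negative_wind_speed"),
    make_flags("datetime", which(duplicated(datetime)), "duplicated_timestamp")
  )
}

# Toy inputs: one invalid humidity, one negative wind, one repeated timestamp
rh <- c(98, 105, NA, 90)
wind <- c(0.4, -0.5, 0.6, NA)
datetime <- as.POSIXct("2017-06-30 04:50:00", tz = "Europe/Berlin") +
  c(0, 300, 300, 600)

flags <- simple_qc_flags(rh, wind, datetime)
flags
```

Range checks like these cannot catch a plausible-looking spike such as the value 45 in Ta_2m; that needs a comparison with neighbouring values or expert review.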

Method readiness

This step connects the inspection to downstream heat-flux workflows. It asks which methods have their required fields and which required fields contain missing values.

The compact original output keeps the relevant readiness columns visible.

inspection$method_readiness[, c(
  "method", "missing_fields", "partial_fields", "ready"
)]
#>                   method missing_fields                     partial_fields
#> 1       priestley_taylor                          temp, rad_bal, soil_flux
#> 2          bulk_residual                        t1, v1, rad_bal, soil_flux
#> 3 bulk_residual_ri_guard                        t1, v1, rad_bal, soil_flux
#> 4                  bowen                      t1, hum1, rad_bal, soil_flux
#> 5          monin_profile                                      t1, hum1, v1
#> 6                 penman     obs_height v1, temp, rad_bal, soil_flux, hum1
#>   ready
#> 1  TRUE
#> 2  TRUE
#> 3  TRUE
#> 4  TRUE
#> 5  TRUE
#> 6 FALSE

The interpreted method table translates that result into method-level consequences for this dataset.

Method	Required field groups	Structurally available?	Fields with gaps	What this means for this dataset
Bowen-ratio	temperature and humidity gradients, heights, `rad_bal`, `soil_flux`	Yes	t1, hum1, rad_bal, soil_flux	Missing or invalid humidity affects the gradient ratio; gaps in `rad_bal` or `soil_flux` remove `A = Q* - B`, which the ratio partitions.
Bulk-Residual	temperature difference, wind, heights, `rad_bal`, `soil_flux`	Yes	t1, v1, rad_bal, soil_flux	Gaps in `t1`, `v1`, `rad_bal` or `soil_flux` affect `H_bulk` or residual `LE` at those rows.
Bulk-Residual with Richardson guard	Bulk-Residual inputs plus two wind heights	Yes	t1, v1, rad_bal, soil_flux	Two wind heights exist, but the same wind and energy-input gaps still matter row by row.
Monin-Obukhov/Profile	temperature, humidity and wind profiles plus site metadata	Yes	t1, hum1, v1	Missing or invalid profile values directly affect the diagnostic profile calculation.
Penman-type `LE`	radiation, soil heat flux, temperature, humidity, wind and site metadata	No	v1, temp, rad_bal, soil_flux, hum1	Not ready here because `obs_height` is reported as missing. Penman uses radiation, soil heat flux, temperature, humidity, wind and site metadata and returns `LE` only.
Priestley-Taylor	`rad_bal`, `soil_flux`, `temp`, `surface_type`	Yes	temp, rad_bal, soil_flux	If `rad_bal`, `soil_flux` or `temp` is missing at a timestep, `A = Q* - B` or the temperature input is unavailable.

Priestley-Taylor uses rad_bal, soil_flux, temp and surface_type. If rad_bal or soil_flux is missing at a timestep, A = Q* - B is unavailable.

Bulk-Residual uses temperature difference, wind and heights for H_bulk, plus rad_bal and soil_flux for residual LE. With the optional Richardson guard, two wind heights are needed.

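Bowen uses temperature and humidity gradients to partition available energy A = Q* - B into H and LE. Missing or invalid humidity affects the gradient ratio, and gaps in rad_bal or soil_flux remove the energy being partitioned.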
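The dependence on humidity can be seen in the textbook form of the Bowen-ratio method, where beta = gamma * dT / de, H = A * beta / (1 + beta) and LE = A / (1 + beta). The sketch below is that generic textbook form, not fieldClim's implementation; bowen_partition() is a hypothetical helper, and gamma of about 0.66 hPa per K is an assumed psychrometric constant for near sea-level pressure. It shows that a missing vapour-pressure difference makes both H and LE unavailable at that row.

```r
# Textbook Bowen-ratio partitioning (hypothetical helper, not fieldClim code)
# gamma: psychrometric constant, assumed ~0.66 hPa K-1 near sea-level pressure
bowen_partition <- function(A, dT, de, gamma = 0.66) {
  beta <- gamma * dT / de        # Bowen ratio H / LE
  H  <- A * beta / (1 + beta)    # sensible heat flux
  LE <- A / (1 + beta)           # latent heat flux
  data.frame(beta = beta, H = H, LE = LE)
}

# Toy inputs: A = Q* - B in W m-2, dT = T1 - T2 in K, de = e1 - e2 in hPa
res <- bowen_partition(A = c(400, 400), dT = c(0.5, 0.5), de = c(1.1, NA))
res
```

By construction H + LE equals A wherever the inputs are complete. When beta approaches -1 the partitioning becomes unstable, which is another reason invalid humidity values need review before use.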

Monin-Obukhov/Profile uses temperature, humidity and wind profiles. Missing or invalid profile values directly affect the profile calculation.

Penman uses radiation, soil heat flux, temperature, humidity, wind and site metadata. It returns LE only. In this dataset it is not ready, because the inspection reports obs_height as a missing field.
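Structural readiness does not mean every row is usable. A minimal base-R sketch of the row-by-row view counts, for each method, the timesteps at which all listed fields are present. The field lists are taken from the partial_fields column above, which only names fields that contain gaps, so this is a simplification: the real methods also need second heights and site metadata. Penman is left out because it is not structurally ready. The toy data are hypothetical.

```r
# Hypothetical toy data: four timesteps, NA marks a gap
toy_rows <- data.frame(
  temp = c(13.1, NA, 13.3, 13.4),
  t1 = c(13.1, NA, 13.3, 13.4),
  hum1 = c(98, 97, NA, 95),
  v1 = c(0.4, 0.5, 0.6, NA),
  rad_bal = c(-15, -9, -2, -2),
  soil_flux = c(1.5, 1.5, 1.4, 1.4)
)

# Gapped fields per method, copied from the partial_fields column
method_fields <- list(
  priestley_taylor = c("temp", "rad_bal", "soil_flux"),
  bulk_residual = c("t1", "v1", "rad_bal", "soil_flux"),
  bowen = c("t1", "hum1", "rad_bal", "soil_flux"),
  monin_profile = c("t1", "hum1", "v1")
)

# Number of rows where every listed field is present
usable_rows <- sapply(method_fields, function(f) {
  sum(complete.cases(toy_rows[, f]))
})

usable_rows
```

Methods that combine more gapped fields lose more rows, which is why monin_profile keeps the fewest complete timesteps in this toy example.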

External continuation boundary

If gaps or suspicious values are found, the next step depends on the variable type, gap length and analysis goal. A short temperature gap, a radiation gap during changing cloud conditions and a missing wind profile do not have the same meaning for later heat-flux calculations. This is why the inspection first reports variable classes, gap blocks, QC flags and method readiness instead of ranking algorithms.

Package	Main focus	Strength for this task	Limitation for this task	Best fit in this vignette
climatol	Climatological station series: quality control, homogenization, missing-data workflows and derived climate products.	Most relevant when a station series is longer than the one-day example and the problem is not only a short local gap, but consistency of a climatological record. It can support QC, homogenization and documented reconstruction of standard climate variables.	Not designed as an automatic row-level repair step inside a heat-flux calculation. Its assumptions, homogenization choices and reconstructed values would have to be documented before re-importing data into fieldClim.	Longer temperature, humidity, radiation or other climate-station series after fieldClim has shown where the gaps and QC problems are.
dataresqc	Quality control and formatting of historical daily and sub-daily climate observations.	Most relevant before any later reconstruction step when the main question is whether the observed series is technically and physically trustworthy. It is useful for systematic QC, formatting and flagging of daily or sub-daily climate observations.	Primarily a QC and data-rescue tool, not a heat-flux or microclimate modelling package. It helps decide whether observations are trustworthy; it does not make fieldClim methods run on missing inputs.	Checking whether suspicious values such as impossible humidity, negative wind speed or inconsistent time structure should be flagged before any further processing.
meteo	Spatial and spatio-temporal prediction for meteorological and environmental station variables.	Most relevant when local inspection shows that gaps cannot be interpreted from the target station alone. It can use neighbouring stations, coordinates, time and covariates for spatial or spatio-temporal prediction.	Requires a spatial prediction setup with stations, coordinates and validation. It is not a replacement for measured radiation, wind or soil-flux inputs unless that external modelling decision is explicitly justified.	Situations where fieldClim inspection shows that a variable is missing for too long to interpret locally and neighbouring stations or spatial covariates are available.

Summary

This tutorial demonstrated how inspect_weather_station_inputs() can be used before running fieldClim heat-flux methods. The function returns a structured inspection report with variable-level availability, missing-value runs, quality-control flags and method-readiness information.

The example shows why missing data must be interpreted by variable type and gap length. A short temperature gap, a radiation gap, a missing wind profile and a long soil-flux gap do not have the same consequences for later calculations. The inspection output therefore helps identify which variables require review and which method families are affected. The method-selection page then determines which calculation path can be interpreted from the inspected measurement architecture.

fieldClim does not fill, impute, interpolate, complete or replace missing values. It reports the problem. Any decision to repair or reconstruct data must be made outside fieldClim, documented separately, and followed by a new inspection before heat-flux calculations are interpreted.

Jörg Bendix, Chris Reudenbach

2026-05-28