fieldClim: Missing-data inspection and input checks
Jörg Bendix, Chris Reudenbach
2026-05-28
Source: vignettes/fieldclim_missing_data_en.Rmd
Purpose
This tutorial shows how to inspect station data before
H/LE heat-flux calculations. The goal is to
learn the inspection workflow, not to repair the dataset.
fieldClim does not repair, fill, impute, interpolate or
complete the data. It reports missingness, gap blocks, variable classes,
quality-control flags and method readiness. The first decision is the
variable type and the gap length, because method suitability depends on
what is missing and how the gap is structured. Quality control comes
before any external gap-filling workflow.
A systematic decision guide for matching calculation paths to measurement architecture is provided in Choosing fieldClim Heat-Flux Methods by Measurement Design. This vignette prepares the data and energy-balance inputs for that decision; the actual application of heat-flux methods is continued in the second workflow vignette.
The sections below repeat the same tutorial pattern: run a command, inspect a compact original output, then read a cleaned table and interpretation.
Reference notation
This vignette uses the same reference notation as the
method-selection page. Q* is net radiation, B
is soil heat flux, A = Q* - B is available energy,
H is sensible heat flux, LE is latent heat
flux, and R_E is the residual or non-closed energy
term.
The R column names used in this vignette are not renamed. The package
fields rad_bal and soil_flux are mapped to
Q* and B; later package outputs named
sensible_* correspond to H, and outputs named
latent_* correspond to LE.
| Reference quantity | Meaning | fieldClim field or later output |
|---|---|---|
| Q* | net radiation | rad_bal |
| B | soil heat flux | soil_flux |
| A = Q* - B | available energy | rad_bal - soil_flux |
| H | sensible heat flux | sensible_* |
| LE | latent heat flux | latent_* |
| R_E = Q* - B - H - LE | residual / non-closed energy term | closure diagnostics |
This first inspection vignette does not compute H,
LE or R_E. It checks the input fields that
later determine whether these quantities can be interpreted.
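The notation above can be illustrated with a short base R sketch. All numbers below, including the `H` and `LE` vectors, are made-up illustration values and not fieldClim output; the sketch only shows how `A = Q* - B` and `R_E = Q* - B - H - LE` behave when one input is missing.

```r
# Hypothetical values in W m^-2; not taken from the Caldern dataset.
rad_bal   <- c(400, 350, 300)   # Q*, net radiation
soil_flux <- c(40, NA, 30)      # B, soil heat flux with one missing value
H         <- c(120, 110, 90)    # sensible heat flux (made up)
LE        <- c(200, 190, 150)   # latent heat flux (made up)

# Available energy A = Q* - B; an NA in either input gives NA in A.
A <- rad_bal - soil_flux

# Residual R_E = Q* - B - H - LE; the NA propagates here as well.
R_E <- rad_bal - soil_flux - H - LE

print(A)    # 360  NA 270
print(R_E)  #  40  NA  30
```

The second timestep shows why the input check matters: a single missing `soil_flux` value makes both the available energy and the residual undefined at that row.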
Example dataset
The example is based on the packaged Caldern one-day dataset. It contains one five-minute day. Artificial gaps and obvious sensor problems were inserted only to demonstrate inspection behavior. The file is not a meteorological benchmark; it is a controlled tutorial dataset. The original package dataset is unchanged.
First load the file. The path gap_file is resolved by a robust file lookup that runs before this chunk; the visible command is the part users normally need.
library(fieldClim)
caldern <- read.csv(
gap_file,
na.strings = c("NA", "NULL", ""),
stringsAsFactors = FALSE
)
caldern$datetime <- as.POSIXct(
caldern$datetime,
tz = "Europe/Berlin"
)
A compact original view is enough to understand the logger columns used in the tutorial. It shows time, air temperature, humidity, net radiation, wind speed and soil heat flux.
head(caldern[, c(
"datetime", "Ta_2m", "Huma_2m", "rad_net",
"Windspeed_2m", "heatflux_soil"
)])
#> datetime Ta_2m Huma_2m rad_net Windspeed_2m heatflux_soil
#> 1 2017-06-30 00:00:00 13.09 100.0 -15.200 0.448 1.551533
#> 2 2017-06-30 00:05:00 13.01 100.0 -8.920 0.380 1.492695
#> 3 2017-06-30 00:10:00 13.02 100.0 -1.965 0.548 1.448708
#> 4 2017-06-30 00:15:00 13.16 100.0 -1.790 0.581 1.390439
#> 5 2017-06-30 00:20:00 13.27 100.0 -2.469 0.764 1.325316
#> 6 2017-06-30 00:25:00 13.69 98.1 -3.857 0.589 1.268762
The overview table summarizes the dataset without printing all rows.
| Item | Value |
|---|---|
| Rows | 288 |
| First timestamp | 2017-06-30 00:00:00 CEST |
| Last timestamp | 2017-06-30 23:55:00 CEST |
| Median timestep | 300 seconds |
| Number of variables | 19 |
| Variables with missing values | 6 |
The row count is still the expected 288 records for a five-minute day. The inspection problem is therefore not a missing file or an incomplete day. It is that specific variables contain gaps or suspicious values inside an otherwise regular teaching example.
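An overview of this kind can be assembled with base R. The following is a minimal sketch on a synthetic five-minute day, not the code that produced the table above; the data frame `dat`, its column names and the use of UTC are assumptions made for illustration.

```r
# Synthetic stand-in for a five-minute day (assumption: UTC, two variables).
datetime <- seq(
  as.POSIXct("2017-06-30 00:00:00", tz = "UTC"),
  by = "5 min", length.out = 288
)
dat <- data.frame(datetime = datetime, Ta_2m = 13, rad_net = 100)
dat$Ta_2m[20]        <- NA   # one isolated gap
dat$rad_net[100:111] <- NA   # one 12-step gap

# Timestep in seconds between consecutive records.
step_s <- as.numeric(diff(dat$datetime), units = "secs")

overview <- list(
  rows            = nrow(dat),
  first_timestamp = min(dat$datetime),
  last_timestamp  = max(dat$datetime),
  median_step_s   = median(step_s),
  n_variables     = ncol(dat),
  n_with_missing  = sum(colSums(is.na(dat)) > 0)
)
str(overview)
```

A row count of 288 together with a median step of 300 seconds is what a complete five-minute day should show; any deviation would point to a timebase problem rather than to gaps inside individual variables.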
Build the weather_station object
build_weather_station() is only a container step. It
stores station variables under names expected by fieldClim
functions. It does not check physics, repair missing values or calculate
replacement fields.
ws <- build_weather_station(
datetime = caldern$datetime,
temp = caldern$Ta_2m,
rh = caldern$Huma_2m,
t1 = caldern$Ta_2m,
t2 = caldern$Ta_10m,
hum1 = caldern$Huma_2m,
hum2 = caldern$Huma_10m,
v1 = caldern$Windspeed_2m,
v2 = caldern$Windspeed_10m,
rad_bal = caldern$rad_net,
soil_flux = caldern$heatflux_soil,
lat = 50.8405,
lon = 8.6832,
elev = 270,
z1 = 2,
z2 = 10,
surface_type = "field"
)
The compact object structure shows that ws is a named
weather-station object. The vector lengths identify which fields are
time series and which are station metadata.
names(ws)
#> [1] "datetime" "temp" "rh" "t1" "t2"
#> [6] "hum1" "hum2" "v1" "v2" "rad_bal"
#> [11] "soil_flux" "lat" "lon" "elev" "z1"
#> [16] "z2" "surface_type"
sapply(ws, length)
#> datetime temp rh t1 t2 hum1
#> 288 288 288 288 288 288
#> hum2 v1 v2 rad_bal soil_flux lat
#> 288 288 288 288 288 1
#> lon elev z1 z2 surface_type
#> 1 1 1 1 1
The important names for later checks are temp,
rh, rad_bal, soil_flux,
v1, v2, t1, t2,
hum1 and hum2. Those are the names downstream
functions use, regardless of the original logger column names.
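The length check above can also be used to separate time series from station metadata programmatically. This is a sketch only: `ws_demo` is a plain named list used as a stand-in, on the assumption that the weather-station object behaves like a named list, as the `names()` and `sapply()` output suggests.

```r
# Plain named list as a stand-in for the weather-station object.
ws_demo <- list(
  datetime = seq(as.POSIXct("2017-06-30", tz = "UTC"),
                 by = "5 min", length.out = 288),
  temp     = rep(13, 288),
  rad_bal  = rep(100, 288),
  lat      = 50.8405,
  z1       = 2
)

n <- lengths(ws_demo)

# Fields as long as the time axis are series; length-1 fields are metadata.
series_fields   <- names(n)[n == length(ws_demo$datetime)]
metadata_fields <- names(n)[n == 1]

series_fields    # "datetime" "temp" "rad_bal"
metadata_fields  # "lat" "z1"

# Anything with another length would need review before further use.
setdiff(names(n), c(series_fields, metadata_fields))
```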
Run the inspection
Now run the inspection function. It returns a structured report, not modified station data.
inspection <- inspect_weather_station_inputs(ws)
The first compact original output is simply the object structure: the returned list contains field-level status, gap blocks, method readiness, quality-control flags, guidance text and a summary.
names(inspection)
#> [1] "fields" "gaps" "method_readiness" "qc_flags"
#> [5] "guidance" "summary"
sapply(inspection, function(x) {
if (is.data.frame(x)) {
paste(nrow(x), "rows x", ncol(x), "columns")
} else {
paste(class(x), collapse = ", ")
}
})
#> fields gaps method_readiness
#> "42 rows x 10 columns" "7 rows x 9 columns" "6 rows x 7 columns"
#> qc_flags guidance summary
#> "6 rows x 5 columns" "5 rows x 2 columns" "list"
Read these components as follows. fields has one row per
expected station field. gaps reports consecutive missing
blocks, not just total NA counts. qc_flags
identifies existing values that are suspicious or physically impossible.
method_readiness reports which heat-flux methods have their
required input fields.
Inspect variable-level missingness
This step asks: which station fields contain missing values, and which variable classes do those fields represent?
The compact original output below is a filtered view of
inspection$fields. It shows only fields that are present
and have at least one missing value.
subset(
inspection$fields,
present & n_missing > 0,
select = c(field, variable_type, group, n_missing, n_total, missing_fraction)
)
#> field variable_type group n_missing n_total missing_fraction
#> 5 rad_bal radiation radiation 12 288 0.041666667
#> 17 soil_flux soil heat flux soil 72 288 0.250000000
#> 25 rh humidity humidity 5 288 0.017361111
#> 26 hum1 humidity humidity 5 288 0.017361111
#> 33 temp temperature profiles 1 288 0.003472222
#> 34 t1 temperature profiles 1 288 0.003472222
#> 37 v1 wind speed profiles 12 288 0.041666667
The same result is easier to read as a compact interpretation table.
| Field | Variable class | Group | Missing values | Missing fraction | First missing row | Largest gap (steps) |
|---|---|---|---|---|---|---|
| hum1 | humidity | humidity | 5 | 1.7% | 40 | 5 |
| rh | humidity | humidity | 5 | 1.7% | 40 | 5 |
| t1 | temperature | profiles | 1 | 0.3% | 20 | 1 |
| temp | temperature | profiles | 1 | 0.3% | 20 | 1 |
| v1 | wind speed | profiles | 12 | 4.2% | 130 | 12 |
| rad_bal | radiation | radiation | 12 | 4.2% | 100 | 12 |
| soil_flux | soil heat flux | soil | 72 | 25.0% | 180 | 72 |
In this dataset, the affected fields are temp,
t1, rh, hum1,
rad_bal, v1 and soil_flux. The
most consequential fields for heat-flux calculations are
rad_bal and soil_flux, because together they
define available energy as Q* - B. Missing v1
affects aerodynamic and profile-related methods. Missing
hum1 affects humidity-gradient and Penman-type
calculations. Missing t1 affects profile methods and
Bulk-Residual estimates of H.
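The missing counts and fractions in this step follow from simple arithmetic, for example 12 / 288 = 0.0417 for `rad_bal`. A minimal base R sketch of that calculation is given here, using a synthetic data frame that mirrors two of the gaps; it is not the internal implementation of `inspect_weather_station_inputs()`.

```r
# Synthetic series mirroring two of the gaps in this example.
n <- 288
fields <- data.frame(
  rad_bal   = rep(100, n),
  soil_flux = rep(10, n),
  temp      = rep(13, n)
)
fields$rad_bal[100:111]   <- NA   # 12 missing steps
fields$soil_flux[180:251] <- NA   # 72 missing steps

# Count NA values per column and convert to a fraction of all rows.
n_missing <- colSums(is.na(fields))
missing_summary <- data.frame(
  field            = names(n_missing),
  n_missing        = as.integer(n_missing),
  n_total          = n,
  missing_fraction = as.numeric(n_missing) / n
)

# Keep only fields with at least one missing value.
missing_summary[missing_summary$n_missing > 0, ]
```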
Inspect gap blocks
Total missing counts are not enough. This step asks whether the missing values are isolated or form continuous blocks.
The compact original output sorts the gap table by length and shows the most important gaps first.
gaps_sorted <- inspection$gaps[order(-inspection$gaps$n_timesteps), ]
# head() returns at most 10 rows and does not pad short tables with NA rows
head(gaps_sorted, 10)
#> field variable_type gap_start_index gap_end_index n_timesteps
#> 2 soil_flux soil heat flux 180 251 72
#> 1 rad_bal radiation 100 111 12
#> 7 v1 wind speed 130 141 12
#> 3 rh humidity 40 44 5
#> 4 hum1 humidity 40 44 5
#> 5 temp temperature 20 20 1
#> 6 t1 temperature 20 20 1
#> start_time end_time duration_seconds gap_class
#> 2 2017-06-30 14:55:00 2017-06-30 20:50:00 21600 long
#> 1 2017-06-30 08:15:00 2017-06-30 09:10:00 3600 medium
#> 7 2017-06-30 10:45:00 2017-06-30 11:40:00 3600 medium
#> 3 2017-06-30 03:15:00 2017-06-30 03:35:00 1500 medium
#> 4 2017-06-30 03:15:00 2017-06-30 03:35:00 1500 medium
#> 5 2017-06-30 01:35:00 2017-06-30 01:35:00 300 short
#> 6 2017-06-30 01:35:00 2017-06-30 01:35:00 300 short
The interpreted table keeps the same information but formats timestamps and duration for reading.
| Field | Variable class | Gap start | Gap end | Steps | Duration | Gap class |
|---|---|---|---|---|---|---|
| soil_flux | soil heat flux | 14:55 | 20:50 | 72 | 6 h | long |
| rad_bal | radiation | 08:15 | 09:10 | 12 | 1 h | medium |
| v1 | wind speed | 10:45 | 11:40 | 12 | 1 h | medium |
| hum1 | humidity | 03:15 | 03:35 | 5 | 25 min | medium |
| rh | humidity | 03:15 | 03:35 | 5 | 25 min | medium |
| t1 | temperature | 01:35 | 01:35 | 1 | 5 min | short |
| temp | temperature | 01:35 | 01:35 | 1 | 5 min | short |
A single missing five-minute value is a row-level interruption. A
30-60 minute gap starts to affect subdaily interpretation. The
multi-hour soil_flux gap is more serious because it affects
available energy Q* - B. Wind gaps can affect
Bulk-Residual, Penman and profile-based methods. Humidity gaps affect
Bowen, Monin-Obukhov/Profile and Penman-type paths.
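The difference between a total missing count and a gap block can be made concrete with run-length encoding. The helper `find_gap_blocks()` below is a hypothetical illustration written in base R, not a fieldClim function, and it assumes a regular five-minute timestep.

```r
# Hypothetical helper: locate consecutive NA runs in one vector.
find_gap_blocks <- function(x, step_seconds = 300) {
  runs   <- rle(is.na(x))           # alternating runs of present/missing
  ends   <- cumsum(runs$lengths)    # last index of each run
  starts <- ends - runs$lengths + 1 # first index of each run
  keep   <- runs$values             # TRUE where the run is a gap
  data.frame(
    gap_start_index  = starts[keep],
    gap_end_index    = ends[keep],
    n_timesteps      = runs$lengths[keep],
    duration_seconds = runs$lengths[keep] * step_seconds
  )
}

# One isolated missing value and one long block, as in this example.
x <- rep(1, 288)
x[20]      <- NA
x[180:251] <- NA

gap_blocks <- find_gap_blocks(x)
gap_blocks
```

Both gaps contribute to the same total of 73 missing values, but the block view separates a single 300-second interruption from a 21600-second run, which is the distinction the gap table is meant to expose.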
Entry matrix: variable type and gap length
The inspection table tells us where values are missing. The entry matrix explains why the consequence differs by variable type.
| Variable type | Fields in this example | What the inspection shows | Why this matters | What fieldClim does |
|---|---|---|---|---|
| Temperature | temp, t1 | An isolated missing air-temperature value. | Temperature enters profile gradients and helper calculations. | Reports this gap and affected methods. It does not repair the value. |
| Humidity | rh, hum1 | A 25-minute humidity gap (gap class medium) and one invalid relative-humidity value. | Humidity affects Bowen, Monin-Obukhov/Profile and Penman-type inputs. | Reports the gap and QC flag. It does not correct humidity values. |
| Radiation | rad_bal | A medium net-radiation gap and a suspicious shortwave value in the source data. | Radiation controls available energy for energy-balance methods. | Reports affected radiation fields. It does not substitute modeled radiation. |
| Wind speed | v1 | A medium wind-speed gap and one negative wind-speed value. | Wind controls aerodynamic and profile-based methods. | Reports the gap and negative-wind flag. It does not invent wind speed. |
| Soil heat flux | soil_flux | A long continuous soil heat-flux gap. | Soil heat flux is subtracted from net radiation in A = Q* - B. | Reports the long gap and affected energy-balance methods. It does not replace soil heat flux. |
This table is not a ranking of filling methods. It is a reading guide for the inspection output. The key research-based point is that variable type, gap length and QC status determine the next external decision more than any universal method ranking.
Quality-control flags
Missing values are not the only problem. Some values are present but should not be accepted without review.
The compact original output below is the actual qc_flags
table returned by inspect_weather_station_inputs().
inspection$qc_flags
#> field row_index flag severity
#> 1 rh 60 humidity_outside_0_100 error
#> 2 hum1 60 humidity_outside_0_100 error
#> 3 v1 70 negative_wind_speed error
#> 4 datetime 80 duplicated_timestamp warning
#> 5 datetime NA irregular_timestep warning
#> 6 soil_flux 180 long_gap warning
#> message
#> 1 Relative humidity should be within 0..100 percent before downstream use.
#> 2 Relative humidity should be within 0..100 percent before downstream use.
#> 3 Wind speed should not be negative.
#> 4 Duplicated timestamps can invalidate gap-length and workflow interpretation.
#> 5 Datetime spacing is irregular; inspect timebase before external gap treatment.
#> 6 Long missing-data run; variable type and gap length should guide any external treatment.
The formatted table adds timestamps and values where they are available.
| Field | Timestamp | Value | Flag type | Explanation |
|---|---|---|---|---|
| rh | 2017-06-30 04:55 | 105.0 | humidity_outside_0_100 | Relative humidity should be within 0..100 percent before downstream use. |
| hum1 | 2017-06-30 04:55 | 105.0 | humidity_outside_0_100 | Relative humidity should be within 0..100 percent before downstream use. |
| v1 | 2017-06-30 05:45 | -0.5 | negative_wind_speed | Wind speed should not be negative. |
| datetime | 2017-06-30 06:30 | NA | duplicated_timestamp | Duplicated timestamps can invalidate gap-length and workflow interpretation. |
| datetime | NA | NA | irregular_timestep | Datetime spacing is irregular; inspect timebase before external gap treatment. |
| soil_flux | 2017-06-30 14:55 | NA | long_gap | Long missing-data run; variable type and gap length should guide any external treatment. |
In this dataset, the relative-humidity value of 105 percent and the wind-speed value of -0.5 are physically invalid. The duplicated timestamp and the irregular timestep affect how gap lengths are interpreted. A suspicious radiation value should be checked against time of day and expected physical range. Flagged values remain in the data; the inspection only warns that they should not be accepted without review.
The synthetic source data also include two intentionally obvious spike examples that are useful for manual review.
| Field | Timestamp | Value | Interpretation |
|---|---|---|---|
| rad_sw_in | 2017-06-30 15:45 | 5000 | Check shortwave radiation against time of day and expected range. |
| Ta_2m | 2017-06-30 16:35 | 45 | Check whether the temperature spike is a sensor artefact. |
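Range and timebase checks of this kind can be sketched in a few lines of base R. The example below uses synthetic vectors with problems inserted on purpose; the variable names and the UTC timezone are assumptions, and the checks are an illustration rather than the rules that `inspect_weather_station_inputs()` applies internally.

```r
# Synthetic five-minute series with problems inserted on purpose.
datetime <- seq(as.POSIXct("2017-06-30 00:00:00", tz = "UTC"),
                by = "5 min", length.out = 288)
rh <- rep(90, 288)
v1 <- rep(1.5, 288)
rh[60]       <- 105            # humidity above 100 percent
v1[70]       <- -0.5           # negative wind speed
datetime[80] <- datetime[79]   # duplicated timestamp

# which() ignores NA comparisons, so missing values are not flagged here.
bad_rh   <- which(rh < 0 | rh > 100)
bad_wind <- which(v1 < 0)
dup_time <- which(duplicated(datetime))

# Irregular spacing: any step that differs from the median step.
steps     <- as.numeric(diff(datetime), units = "secs")
irregular <- any(steps != median(steps))

list(bad_rh = bad_rh, bad_wind = bad_wind,
     dup_time = dup_time, irregular = irregular)
```

In this sketch the duplicated timestamp also makes the spacing irregular, because one step becomes 0 seconds and the next 600 seconds. None of the checks changes a value; they only return row indices for review.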
Method readiness
This step connects the inspection to downstream heat-flux workflows. It asks which methods have their required fields and which required fields contain missing values.
The compact original output keeps the relevant readiness columns visible.
inspection$method_readiness[, c(
"method", "missing_fields", "partial_fields", "ready"
)]
#> method missing_fields partial_fields
#> 1 priestley_taylor temp, rad_bal, soil_flux
#> 2 bulk_residual t1, v1, rad_bal, soil_flux
#> 3 bulk_residual_ri_guard t1, v1, rad_bal, soil_flux
#> 4 bowen t1, hum1, rad_bal, soil_flux
#> 5 monin_profile t1, hum1, v1
#> 6 penman obs_height v1, temp, rad_bal, soil_flux, hum1
#> ready
#> 1 TRUE
#> 2 TRUE
#> 3 TRUE
#> 4 TRUE
#> 5 TRUE
#> 6 FALSE
The interpreted method table translates that result into method-level consequences for this dataset.
| Method | Required field groups | Structurally available? | Fields with gaps | What this means for this dataset |
|---|---|---|---|---|
| Bowen-ratio | temperature and humidity gradients, heights, rad_bal, soil_flux | Yes | t1, hum1, rad_bal, soil_flux | Missing or invalid humidity affects the gradient ratio; energy-input gaps also affect partitioning. |
| Bulk-Residual | temperature difference, wind, heights, rad_bal, soil_flux | Yes | t1, v1, rad_bal, soil_flux | Gaps in t1, v1, rad_bal or soil_flux affect H_bulk or residual LE at those rows. |
| Bulk-Residual with Richardson guard | Bulk-Residual inputs plus two wind heights | Yes | t1, v1, rad_bal, soil_flux | Two wind heights exist, but the same wind and energy-input gaps still matter row by row. |
| Monin-Obukhov/Profile | temperature, humidity and wind profiles plus site metadata | Yes | t1, hum1, v1 | Missing or invalid profile values directly affect the diagnostic profile calculation. |
| Penman-type LE | radiation, soil heat flux, temperature, humidity, wind and site metadata | No | v1, temp, rad_bal, soil_flux, hum1 | The readiness output lists obs_height as a missing field, so the method is not ready. Penman uses radiation, soil heat flux, temperature, humidity, wind and site metadata; it returns latent heat only. |
| Priestley-Taylor | rad_bal, soil_flux, temp, surface_type | Yes | temp, rad_bal, soil_flux | If rad_bal, soil_flux or temp is missing at a timestep, A = Q* - B or temperature input is unavailable. |
Priestley-Taylor uses rad_bal, soil_flux,
temp and surface_type. If rad_bal
or soil_flux is missing at a timestep,
A = Q* - B is unavailable.
Bulk-Residual uses temperature difference, wind and heights for
H_bulk, plus rad_bal and
soil_flux for residual LE. With the optional
Richardson guard, two wind heights are needed.
Bowen uses temperature and humidity gradients. Missing or invalid humidity affects the gradient ratio.
Monin-Obukhov/Profile uses temperature, humidity and wind profiles. Missing or invalid profile values directly affect the profile calculation.
Penman uses radiation, soil heat flux, temperature, humidity, wind
and site metadata. It returns LE only.
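The readiness logic described above distinguishes fields that are absent from fields that are present but contain gaps. The following base R sketch illustrates that distinction. The `requirements` lists are hypothetical and only loosely follow the readiness output shown earlier, `station` is a plain list standing in for the weather-station object, and `check_readiness()` is not a fieldClim function.

```r
# Hypothetical requirement lists; the package's real lists may differ.
requirements <- list(
  priestley_taylor = c("temp", "rad_bal", "soil_flux"),
  penman = c("v1", "temp", "rad_bal", "soil_flux", "hum1", "obs_height")
)

# Plain list as a stand-in for the station object (three timesteps).
station <- list(
  temp = c(13, NA, 14), rad_bal = c(100, 120, NA),
  soil_flux = c(10, 12, 11), v1 = c(1, 2, 3), hum1 = c(90, 95, 92)
)

check_readiness <- function(required, obj) {
  absent  <- setdiff(required, names(obj))     # not in the object at all
  present <- intersect(required, names(obj))   # in the object
  partial <- present[vapply(obj[present], anyNA, logical(1))]  # has NA
  data.frame(
    missing_fields = paste(absent, collapse = ", "),
    partial_fields = paste(partial, collapse = ", "),
    ready          = length(absent) == 0
  )
}

readiness <- do.call(rbind, lapply(requirements, check_readiness, obj = station))
readiness
```

As in the readiness table above, a method counts as ready in this sketch when all required fields exist, even if some of them contain gaps. The gaps still matter row by row when the method is run.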
External continuation boundary
If gaps or suspicious values are found, the next step depends on the variable type, gap length and analysis goal. A short temperature gap, a radiation gap during changing cloud conditions and a missing wind profile do not have the same meaning for later heat-flux calculations. This is why the inspection first reports variable classes, gap blocks, QC flags and method readiness instead of ranking algorithms.
| Package | Main focus | Strength for this task | Limitation for this task | Best fit in this vignette |
|---|---|---|---|---|
| climatol | Climatological station series: quality control, homogenization, missing-data workflows and derived climate products. | Most relevant when a station series is longer than the one-day example and the problem is not only a short local gap, but consistency of a climatological record. It can support QC, homogenization and documented reconstruction of standard climate variables. | Not designed as an automatic row-level repair step inside a heat-flux calculation. Its assumptions, homogenization choices and reconstructed values would have to be documented before re-importing data into fieldClim. | Longer temperature, humidity, radiation or other climate-station series after fieldClim has shown where the gaps and QC problems are. |
| dataresqc | Quality control and formatting of historical daily and sub-daily climate observations. | Most relevant before any later reconstruction step when the main question is whether the observed series is technically and physically trustworthy. It is useful for systematic QC, formatting and flagging of daily or sub-daily climate observations. | Primarily a QC and data-rescue tool, not a heat-flux or microclimate modelling package. It helps decide whether observations are trustworthy; it does not make fieldClim methods run on missing inputs. | Checking whether suspicious values such as impossible humidity, negative wind speed or inconsistent time structure should be flagged before any further processing. |
| meteo | Spatial and spatio-temporal prediction for meteorological and environmental station variables. | Most relevant when local inspection shows that gaps cannot be interpreted from the target station alone. It can use neighbouring stations, coordinates, time and covariates for spatial or spatio-temporal prediction. | Requires a spatial prediction setup with stations, coordinates and validation. It is not a replacement for measured radiation, wind or soil-flux inputs unless that external modelling decision is explicitly justified. | Situations where fieldClim inspection shows that a variable is missing for too long to interpret locally and neighbouring stations or spatial covariates are available. |
Summary
This tutorial demonstrated how
inspect_weather_station_inputs() can be used before running
fieldClim heat-flux methods. The function returns a
structured inspection report with variable-level availability,
missing-value runs, quality-control flags and method-readiness
information.
The example shows why missing data must be interpreted by variable type and gap length. A short temperature gap, a radiation gap, a missing wind profile and a long soil-flux gap do not have the same consequences for later calculations. The inspection output therefore helps identify which variables require review and which method families are affected. The method-selection page then determines which calculation path can be interpreted from the inspected measurement architecture.
fieldClim does not fill, impute, interpolate, complete
or replace missing values. It reports the problem. Any decision to
repair or reconstruct data must be made outside fieldClim,
documented separately, and followed by a new inspection before heat-flux
calculations are interpreted.