5 Best Practices for Gridding Concentration Data
The process of converting randomly spaced data into a uniformly spaced grid or surface has evolved drastically since the early days of map making. One of the biggest changes has been the ability to use computer-based data interpolation (gridding) algorithms to convert thousands of points into a surface grid or raster in just a few seconds. This process saves scientists the hours formerly spent manually interpolating their data to create contour maps and other visuals.
As with any technology, the accuracy of the results is contingent on the input parameters. When gridding, the accuracy is dependent upon an understanding of the data, gridding algorithm, and the available software settings.
Have you ever wondered how to choose the right gridding algorithm and settings for your data? The answer depends on a number of factors, one of which is the type of data you’re working with. Over the next several weeks we will be examining how different algorithms and settings impact the interpolation of three common data types: concentration data, survey line data, and drillhole data.
Understanding your Concentration Data
The first step of data interpolation and interpretation is to gain an understanding of the unique properties of your data and the challenges those properties may present. In the case of chemical concentrations in soil or water, there are several unique properties that must be considered. These properties include:
- Testing limits
Most testing protocols have a lower and upper limit of detection. This means that instead of zero values, you will likely have a collection of non-detectable or contingent values in your dataset.
- Fixed data range
Concentration data cannot be negative and may also have an upper saturation limit that must be considered. By default, gridding algorithms do not impose these limits and they must therefore be imposed manually.
- Data spikes
It is quite common for high concentrations of a chemical or mineral to be surrounded by much lower concentrations as a plume develops. This spike in values can have an outsized impact on the surrounding interpolated data and lead to inaccurate results.
- Limited data collection
Groundwater monitoring wells can be sparse, particularly in rural areas with limited public infrastructure. In addition, the expense associated with testing can become a barrier to extensive data collection.
Surfer post map illustrating a typical dataset of chemical concentrations
Which of these properties are a factor in your latest project? If any of these properties are present in your datasets, read on for recommendations on how to ensure accurate results.
Concentration Data Best Practices
Now that we understand the unique challenges concentration data poses to data interpolation, we can use those to inform our next steps. When gridding concentration data, these five best practices will ensure you get the most accurate results.
1. Review & adjust input data
Are there any obvious typos? Do the statistics align with your expectations? Before you work with any data it is good practice to perform a quality check.
When working with concentration data you must also consider the impact of non-numeric data. The lower and upper testing limits associated with soil and water testing commonly result in non-numeric values such as ND (not detectable) or <0.1.
To ensure that these values are taken into account during data interpolation, these text values must be replaced with numeric values. We recommend that a non-zero value such as the detection limit be used. This same value will be used to assign the grid Z minimum later to ensure no artificial variance is generated near the lower limit.
Example of the recommended data adjustments prior to gridding
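As a rough illustration of this adjustment outside of Surfer, the following Python sketch replaces text entries with numeric values before gridding. The function name `to_numeric`, the sample values, and the 0.1 detection limit are all hypothetical. Treating a ">" entry the same way as a "<" entry (substituting the stated limit) is an assumption made for the example.

```python
import re

def to_numeric(raw, detection_limit):
    """Convert one lab-reported value to a float.

    "ND" is replaced with the supplied detection limit. Censored values
    such as "<0.1" or ">1000" are replaced with the limit they state.
    Anything else is parsed as an ordinary number.
    """
    text = str(raw).strip()
    if text.upper() == "ND":
        return detection_limit
    match = re.fullmatch(r"[<>]\s*(\d*\.?\d+)", text)
    if match:
        return float(match.group(1))
    return float(text)

# Hypothetical lab results mixing numbers and text entries.
raw_values = ["12.4", "ND", "<0.1", "350", "0.8"]
cleaned = [to_numeric(v, detection_limit=0.1) for v in raw_values]

# The smallest adjusted value is the one to reuse later as the grid Z minimum.
z_minimum = min(cleaned)
print(cleaned, z_minimum)
```

Because every non-detect becomes a small non-zero number, no measurement is silently dropped from the interpolation.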
2. Select the gridding algorithm
The algorithm used to interpolate the data can have a large impact on the results. One of the most popular gridding algorithms is Kriging. Kriging is a very flexible method that attempts to express trends in the data, rather than bullseyes, and is known for producing visually appealing results.
There are many other gridding methods available and the best gridding algorithm is often determined by the density and dispersion of the input data. The article Choosing the right gridding method in Surfer offers several factors to consider along with recommendations based on those factors. Also included in this article is information about running the GridData_Comparison.bas sample script which creates a grid and contour map of your data using eight of the most common gridding algorithms.
The GridData_Comparison.bas script shows how the eight most common gridding methods interpolate your data using the default settings.
3. Adjust the resolution
Grid resolution is very similar to image resolution. The number of nodes or pixels in each direction determines how well smaller features are expressed in the results. As a general rule of thumb, and again like images, low-, medium-, and high-resolution grid files have approximately 100, 500, and 1000+ grid nodes in the longest direction.
To determine the best resolution for your grid file consider the accuracy required for meaningful interpretation of the results. If every measurement must be represented, the spacing between each grid node must be less than or equal to the distance between the two closest data points. If the exact representation of dense data pockets will not impact the final interpretation, then the spacing should be less than or equal to the average data spacing.
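The spacing rule above can be turned into a quick calculation. The Python sketch below is only an illustration: the function names and well coordinates are hypothetical, and "average data spacing" is interpreted here as the mean nearest-neighbor distance, which is one reasonable reading rather than a definition taken from Surfer.

```python
import math

def nearest_neighbor_distances(points):
    """Distance from each point to its closest neighbor (brute force)."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(nearest)
    return distances

def suggest_node_count(points, every_point_matters):
    """Suggest a node spacing and node count along the longest direction."""
    nn = nearest_neighbor_distances(points)
    # Strict case: spacing no larger than the distance between the two
    # closest data points. Relaxed case: spacing no larger than the
    # average spacing (assumed to be the mean nearest-neighbor distance).
    spacing = min(nn) if every_point_matters else sum(nn) / len(nn)
    if spacing == 0:
        raise ValueError("duplicate locations: remove or merge them first")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    longest = max(max(xs) - min(xs), max(ys) - min(ys))
    # Round up so the resulting spacing never exceeds the target.
    nodes = math.ceil(longest / spacing) + 1
    return spacing, nodes

# Hypothetical well locations: four corners plus a tight pair in the middle.
wells = [(0, 0), (100, 0), (100, 80), (0, 80), (50, 40), (52, 40)]
print(suggest_node_count(wells, every_point_matters=True))
print(suggest_node_count(wells, every_point_matters=False))
```

In this example the tight pair of wells forces a much finer grid in the strict case, which shows how a single dense pocket can drive the resolution when every measurement must be represented.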
4. Set Z limits
The Z limits for data interpolation are used to account for the data and testing limits of the chemical compound being modeled.
The Z minimum should be set to zero if no data manipulation was required to account for non-detectable testing results. The Z minimum should be set to the data minimum if the data was adjusted to account for the lower testing limit.
The Z maximum should be set to the upper testing or saturation limit. If your data does not approach this value or one is not known, the Z maximum can be left alone.
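Conceptually, limiting the Z range means no grid node may fall below the minimum or rise above the maximum. The Python sketch below illustrates that idea with a simple clamp. The function names, grid values, and the 1000 upper limit are hypothetical, and this is not a description of how Surfer applies the limits internally.

```python
def choose_z_minimum(data_values, adjusted_for_detection_limit):
    """Use the data minimum if non-detects were replaced, otherwise zero."""
    return min(data_values) if adjusted_for_detection_limit else 0.0

def clamp_grid(grid, z_min, z_max=None):
    """Return a copy of the grid with every node limited to [z_min, z_max].

    Pass z_max=None to leave the upper end unrestricted.
    """
    clamped = []
    for row in grid:
        new_row = []
        for z in row:
            z = max(z, z_min)
            if z_max is not None:
                z = min(z, z_max)
            new_row.append(z)
        clamped.append(new_row)
    return clamped

# Hypothetical interpolated grid that overshoots at both ends of the range.
grid = [[-3.2, 0.05, 4.0], [12.0, 480.0, 1250.0]]
data = [0.1, 0.8, 12.4, 350.0]  # non-detects already replaced with 0.1

z_min = choose_z_minimum(data, adjusted_for_detection_limit=True)
limited = clamp_grid(grid, z_min, z_max=1000.0)
print(limited)
```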
5. Transform the Z values
It is quite common for concentration data to contain spikes, where a high value above 1000 is located next to a low or near-zero value. Because data interpolation algorithms use all data in a defined radius to estimate the value at a specific point, these spikes can lead to a positive bias for all surrounding values.
The “Log, save as Linear” Z transform reduces the impact of data spikes by condensing all of the input data to a smaller range before interpolation. It works by taking the log base 10 of the input Z values, performing the interpolation, and then applying the antilog to convert the results back to the original linear values.
One important note is that this setting will ignore zero values. If your data contains true zero values and data spikes, you will need to use your knowledge of the area and chemical properties to determine whether adjusting the input data values or applying the transform produces more accurate results.
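To see why interpolating in log space dampens a spike, consider the following Python sketch. It uses simple inverse distance weighting as a stand-in interpolator, not Kriging, and the function names and sample values are hypothetical. Only the log10, interpolate, antilog sequence and the skipping of zero values mirror the transform described above.

```python
import math

def idw(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (x, y, z) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, sz in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return sz  # exactly on a sample: return the measured value
        w = 1.0 / d ** power
        weighted_sum += w * sz
        weight_total += w
    return weighted_sum / weight_total

def idw_log_save_as_linear(samples, x, y, power=2.0):
    """Interpolate log10(z), then apply the antilog to the result."""
    # Zero values have no logarithm, so they are left out of the input.
    logged = [(sx, sy, math.log10(sz)) for sx, sy, sz in samples if sz > 0]
    return 10 ** idw(logged, x, y, power)

# One spike of 1500 surrounded by values below 1 (hypothetical data).
samples = [(0, 0, 1500.0), (10, 0, 0.5), (0, 10, 0.8), (10, 10, 0.3)]

linear_estimate = idw(samples, 5, 5)
log_estimate = idw_log_save_as_linear(samples, 5, 5)
print(linear_estimate, log_estimate)
```

At the center point all four samples carry equal weight, so the linear estimate is their arithmetic mean (dominated by the spike), while the log-space estimate is their geometric mean, which stays much closer to the three low values.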
Left: Contour map illustrating the Kriging results using the default settings
Right: Contour map illustrating the Kriging results after applying these best practices
Data interpolation is a powerful tool for quickly and accurately converting unevenly spaced data into a uniform matrix or grid. Gridding concentration data comes with some unique challenges that can be overcome by using these five best practices:
- Adjust the input data so that all values are considered during interpolation.
- Select the right gridding method for your data dispersion and density.
- Adjust the resolution to align with your data density and accuracy requirements.
- Limit the Z range of the output grid.
- Apply a Z transform to avoid positive bias around spikes in the data.
Download this Surfer script to see these best practices in action with your data!