
Prediction Equations: Intuition and Implementation in Forestry

ID

CNRE-184P

Authors as Published

Authored by Corey Green, Assistant Professor, Forest Resources and Environmental Conservation, Virginia Tech

EXPERT REVIEWED

Consider the following scenario: A forester is tasked with determining the value of the standing timber in a 75-acre, 25-year-old planted loblolly pine stand. Depending on the region, timber may be sold by the cubic foot, green or dry ton, board foot (the equivalent volume of a 1-inch-thick piece of wood, 12 inches square), or even by carbon content. The forester has simple tools for measuring the diameter at breast height (diameter at 4.5 feet above the ground, commonly referred to as DBH) and stem height (total height from ground to tip, or THT). How would the forester go about this? Cutting the trees down is not an option, so some way of predicting the volume, weight, or carbon content of the standing trees is needed.

Another scenario: A forester needs an estimate of the average height of trees in a given forest. Measuring height is not difficult, but it is more time consuming than measuring DBH. The forester decides to measure DBH on all trees but heights on just a subset. The forester now needs a way to determine the heights of the trees not measured.

Both of the scenarios above represent situations where something relatively difficult to measure is needed (e.g., tree volume) and only simple, easy measurements are available (e.g., DBH).

The objective of this article is to describe and demonstrate a powerful statistical tool called linear regression. This technique can be used to predict tree characteristics that are difficult to measure but important for forest management. Yellow poplar (Liriodendron tulipifera) data from Burkhart, Avery, and Bullock (2019) will be used to illustrate this statistical tool.

In figure 1 below, a scatterplot is used to show the relationship between two characteristics. Each open circle represents one individual's x-y pair. The easier-to-measure characteristic is plotted on the x-axis (horizontal), and the harder-to-measure characteristic for the same individuals is plotted on the y-axis (vertical). The red arrows point to a single individual's x and y values.



[Figure 1 image: scatterplot with an easy-to-measure characteristic on the x-axis and a harder-to-measure characteristic on the y-axis; points run from lower left to upper right, and red arrows trace one point to its x and y values.]
Figure 1. The relationship between two characteristics for individuals.

Looking at this relationship, it should be apparent that the two characteristics are strongly related. If you know x (the easier thing to measure), you should be able to predict y (the harder thing to measure). Unfortunately, this isn’t always the case. Compare the two scatterplots in figure 2.

[Figure 2 image: two scatterplots of a harder-to-measure characteristic (y-axis) against an easy-to-measure characteristic (x-axis). Left panel, "stronger relationship": points form a nearly straight line from lower left to upper right. Right panel, "weaker relationship": points follow the same direction but are more scattered.]
Figure 2. A comparison of two relationships.

Clearly, the left panel of figure 2 shows a stronger relationship. You would be less confident using the relationship in the right panel to predict y from x.

How can we use these relationships? Return to the first scenario: You need to know the total cubic foot volume of a standing tree. One option would be to fell the tree and cut it into many small pieces. Each piece could then be submerged in water and the displaced volume measured. This would give a highly accurate measure of volume; however, it is impractical, and impossible if the tree needs to remain standing.

An alternative approach commonly used by foresters is to use an equation that predicts volume. After the challenging work of felling many trees and actually measuring their volumes, simple-to-measure characteristics (e.g., DBH and THT) can be evaluated for their ability to predict volume. Research has shown that one of the most useful characteristics is called the "combined variable." This is simply DBH² × THT for an individual tree, with DBH in inches and THT in feet (note: the units are not converted). Foresters then use a technique called linear regression (usually performed with software) to find the statistically "best" straight line through the points, typically the line that minimizes the sum of the squared vertical distances between the points and the line (known as least squares). The form of this line is y = mx + b, where y is the volume, m is the slope of the line, x is the combined variable (DBH² × THT), and b is the point at which the line intersects the y-axis. Figure 3 shows the best-fitting straight line using the combined variable to predict tree volume with the following formula: pred.vol = b + m × (DBH² × THT).

In this example, the optimal slope (m) was found to be 0.0022, and the optimal y-intercept (b) was found to be 1.4455.
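Software finds m and b for us, but the calculation behind a simple least-squares line is short. The sketch below is a minimal pure-Python version, assuming ordinary least squares; `fit_line` is a hypothetical helper name, and the toy values are made up for illustration (the yellow poplar measurements from Burkhart, Avery, and Bullock (2019) are not reproduced here).

```python
def fit_line(xs, ys):
    """Return (slope m, intercept b) of the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired x-y values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    m = sxy / sxx
    # The least-squares line always passes through the point (x_bar, y_bar).
    b = y_bar - m * x_bar
    return m, b


# Toy values lying exactly on y = 2x + 1 (not forestry data).
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```

With real data, x would be each felled tree's DBH² × THT and y its measured volume; the same two formulas produce the slope and intercept reported above.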

[Figure 3 image: scatterplot of cubic foot volume (y-axis, 0 to 150) against DBH² × THT (x-axis, 0 to 80,000), with the fitted regression line in blue. Red arrows mark the example tree with DBH = 19 inches and THT = 105 feet.]
Figure 3. The relationship between the combined variable DBH² × THT and total cubic foot volume, where DBH = diameter at breast height and THT = total height.

Use and Implications

So how is this line useful? An example is shown in figure 3. Assume you measure the DBH of a tree to be 19 inches and the total height to be 105 feet. The combined variable is 19² × 105 = 37,905. Using the regression equation, the predicted volume for the tree is 1.4455 + 0.0022 × 37,905 = 84.84 ft³. Using two relatively easy-to-measure characteristics (DBH and THT), an extremely difficult value to measure (total volume) was predicted.
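The same arithmetic is easy to put into a small script or spreadsheet formula. Here is a minimal sketch using the coefficients given in the text; the function names are hypothetical.

```python
# Coefficients of the combined-variable equation given in the text.
INTERCEPT_B = 1.4455
SLOPE_M = 0.0022


def combined_variable(dbh_in, tht_ft):
    """DBH (inches) squared times total height (feet); units are not converted."""
    return dbh_in ** 2 * tht_ft


def predict_volume(dbh_in, tht_ft):
    """Predicted total cubic foot volume: b + m * (DBH^2 * THT)."""
    return INTERCEPT_B + SLOPE_M * combined_variable(dbh_in, tht_ft)


print(combined_variable(19, 105))           # 37905
print(round(predict_volume(19, 105), 2))    # 84.84
```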

A natural question is: How good is this prediction? This question matters because a statistically best line can always be produced, even when the relationship is weak. Compare the two graphs in figure 4.

[Figure 4 image: two scatterplots of cubic foot volume (y-axis, 0 to 200), each with its fitted line in blue. Left panel, "stronger relationship": x-axis DBH² × THT (0 to 80,000), predicted volume = 1.4455 + 0.0022 × (DBH² × THT); points lie close to the line. Right panel, "weaker relationship": x-axis THT (60 to 120 feet), predicted volume = -184.9443 + 2.6545 × THT; points are scattered more widely around the line.]
Figure 4. Comparison of two prediction equations for total tree volume, where DBH = diameter at breast height and THT = total height.

By looking at the spread of points around the line, it’s clear that the line in the left panel of figure 4 is a better fit to the data.

Software can calculate two common statistics for judging how good a prediction equation is. The first is called "R-squared." It tells us what proportion of the variation in the y-characteristic is explained by the regression line. The value ranges from 0 to 1 (often reported as 0% to 100%), with 1 indicating a perfect fit and 0 indicating no linear relationship.

Another value used is the standard error of the estimate. This is a measure of a typical deviation from the regression line. In other words, by how much does a tree's measured volume typically differ, above or below, from the volume predicted by the line? It can be any value greater than or equal to zero and is expressed in the same units as the value being predicted. Smaller values are preferred (i.e., a smaller typical deviation is better). Table 1 shows the statistics for the two different regression equations.
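Both statistics come from comparing measured values with the values the line predicts. The sketch below assumes an ordinary least-squares fit with an intercept (where R-squared = 1 - SSE/SST holds) and divides by n minus the number of estimated coefficients (n - 2 for a straight line) for the standard error; `fit_statistics` and the toy numbers are hypothetical.

```python
import math


def fit_statistics(observed, predicted, n_params=2):
    """Return (R-squared, standard error of the estimate).

    n_params is the number of estimated coefficients (2 for y = m*x + b).
    """
    n = len(observed)
    if n != len(predicted) or n <= n_params:
        raise ValueError("need more observations than estimated coefficients")
    y_bar = sum(observed) / n
    # Residual sum of squares: squared deviations from the regression line.
    sse = sum((y - yhat) ** 2 for y, yhat in zip(observed, predicted))
    # Total sum of squares: squared deviations from the mean of y.
    sst = sum((y - y_bar) ** 2 for y in observed)
    r_squared = 1 - sse / sst
    # Typical deviation from the line, in the units of y (e.g., ft^3).
    see = math.sqrt(sse / (n - n_params))
    return r_squared, see


# Toy example: y = [0, 1, 1] at x = 0, 1, 2, whose least-squares line is
# y = 0.5x + 1/6. Not forestry data.
r2, see = fit_statistics([0.0, 1.0, 1.0], [1 / 6, 2 / 3, 7 / 6])
print(r2, see)
```

For this toy example, R-squared is about 0.75 and the standard error is about 0.41 (the square root of 1/6), in the same units as y.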

Table 1. Fit statistics for the two predictor variables considered, where DBH = diameter at breast height and THT = total height.

| Predictor (x) used | R-squared | Standard error (ft³) |
|---|---|---|
| DBH² × THT | 99% | 4.84 |
| THT | 82% | 21.92 |

The statistics confirm our visual interpretation that the combined variable is the better predictor: Its R-squared value is closer to 1 (100%), and its standard error is much lower.

When using regression equations, there are a few important risks to be aware of. First, be careful when using an equation to predict far outside the range of the data it was produced from (e.g., the range of DBH and THT, the species, or the location). This is called "extrapolation" and can lead to nonsensical results. Consider the equation that uses only height to predict volume (figure 4, right panel). Because its intercept is -184.9443, the predicted volume becomes negative for any tree shorter than about 69.7 feet (184.9443 ÷ 2.6545)! This equation can safely be used within the range of data it was produced from, but extrapolating beyond that range can produce impossible values.
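One simple safeguard is to have the prediction refuse heights outside the range the equation was fitted on. This is a minimal sketch, assuming the fitting range is known from the equation's documentation; `predict_from_height`, `tht_min`, and `tht_max` are hypothetical names, and the range used in the example is a placeholder, not the range of the source data.

```python
# Height-only equation from figure 4 (right panel).
HEIGHT_ONLY_B = -184.9443
HEIGHT_ONLY_M = 2.6545


def predict_from_height(tht_ft, tht_min, tht_max):
    """Predict cubic foot volume from THT, refusing to extrapolate.

    tht_min and tht_max are the shortest and tallest heights in the data
    used to fit the equation.
    """
    if not tht_min <= tht_ft <= tht_max:
        raise ValueError(
            f"THT={tht_ft} ft is outside the fitted range {tht_min}-{tht_max} ft"
        )
    return HEIGHT_ONLY_B + HEIGHT_ONLY_M * tht_ft


# Height at which the unguarded line predicts zero volume.
zero_height = -HEIGHT_ONLY_B / HEIGHT_ONLY_M
print(round(zero_height, 1))  # 69.7

# Without a range check, a 50-foot tree gets a negative volume.
print(HEIGHT_ONLY_B + HEIGHT_ONLY_M * 50)

# Hypothetical fitting range, for illustration only.
print(predict_from_height(105, tht_min=70, tht_max=120))
```

Raising an error, rather than silently returning a number, forces whoever uses the equation to notice that a tree falls outside the data behind it.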

Another important point to consider is whether a straight line is appropriate. Consider figure 5.

[Figure 5 image: scatterplot titled "Is a straight line appropriate?" of cubic foot volume (y-axis, 0 to 150) against DBH (x-axis, 0 to 25 inches), with a straight blue line and a curved red line; the red curve follows the points more closely.]
Figure 5. Using DBH alone to predict total tree volume results in a relationship that is not best described by a straight line (the blue line), where DBH = diameter at breast height.

In figure 5, it’s clear that the relationship between DBH and cubic foot volume is best described by using a curved line rather than a straight line. The red curve is produced using a more advanced statistical technique, but it illustrates an important point: When looking at the data, study it carefully to determine if a straight line is appropriate. If not, either proceed with caution or consider a different statistical technique.
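Besides looking at the scatterplot, a common way to check whether a straight line is appropriate is to look at the residuals (measured minus predicted values). If the true relationship is curved, the residuals show a systematic pattern instead of random scatter. The sketch below uses toy curved data (y = x²), not forestry measurements; the helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


def residuals(xs, ys):
    """Measured minus predicted values from the least-squares line."""
    m, b = fit_line(xs, ys)
    return [y - (m * x + b) for x, y in zip(xs, ys)]


# Toy curved data: y = x**2 for x = 0..10.
xs = list(range(11))
ys = [x ** 2 for x in xs]
res = residuals(xs, ys)

# The straight line under-predicts at both ends (positive residuals) and
# over-predicts in the middle (negative residuals): a sign of curvature.
print([round(r, 1) for r in res])
# [15.0, 6.0, -1.0, -6.0, -9.0, -10.0, -9.0, -6.0, -1.0, 6.0, 15.0]
```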

Finally, these models assume that the predictor variables (here, DBH and THT, which form DBH² × THT) are measured without error. The approach described cannot correct for measurement error, so high-quality data are important, and procedures should be implemented to check data before use.
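One such procedure is an automated screen for missing or implausible values before any prediction is made. This is a minimal sketch; `check_tree_record` is a hypothetical name, and its default limits are placeholders that should be replaced with limits suited to the species, region, and equation. It only catches gross recording errors (such as a misplaced decimal point); it cannot detect or correct small measurement errors.

```python
def check_tree_record(dbh_in, tht_ft, dbh_range=(1.0, 60.0), tht_range=(4.5, 200.0)):
    """Return a list of problems found in one tree's DBH and THT values.

    The default ranges are hypothetical placeholders. THT must exceed
    4.5 feet, since DBH is measured at 4.5 feet above the ground.
    """
    if dbh_in is None or tht_ft is None:
        return ["missing DBH or THT"]
    problems = []
    if not dbh_range[0] <= dbh_in <= dbh_range[1]:
        problems.append(f"DBH {dbh_in} in outside {dbh_range}")
    if not tht_range[0] < tht_ft <= tht_range[1]:
        problems.append(f"THT {tht_ft} ft outside {tht_range}")
    return problems


# Example records (made up): one valid, two with likely decimal errors, one missing.
for dbh, tht in [(19, 105), (190, 105), (19, 1050), (None, 80)]:
    print(dbh, tht, check_tree_record(dbh, tht) or "ok")
```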

Once produced, regression equations are simple to use and can easily be programmed into a calculator or spreadsheet, and volume tables can be produced from them quickly. For example, table 2 (at the end of this publication) was produced for yellow poplar using the combined variable equation in figure 4. The advantage of the equation itself is that volume can be predicted for any DBH-THT combination within the range of the data; a table covering all possible DBH-THT combinations would be impractical and hard to use.
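A volume table is just the prediction equation evaluated over a grid of diameters and heights. The sketch below uses the combined-variable equation from the text; the DBH and height classes are arbitrary choices for illustration, so the output is not a copy of table 2, and any real table should stay within the range of the data behind the equation.

```python
def predict_volume(dbh_in, tht_ft):
    """Combined-variable equation from the text: 1.4455 + 0.0022 * DBH^2 * THT."""
    return 1.4455 + 0.0022 * dbh_in ** 2 * tht_ft


def volume_table(dbh_values, tht_values):
    """Return {DBH: {THT: predicted ft^3}}, rounded to 0.1 ft^3."""
    return {
        dbh: {tht: round(predict_volume(dbh, tht), 1) for tht in tht_values}
        for dbh in dbh_values
    }


# Illustrative classes only: DBH in inches, THT in feet.
dbh_classes = [10, 14, 18, 22]
heights = [60, 80, 100, 120]
table = volume_table(dbh_classes, heights)

# Rows are DBH classes; columns are total heights.
print(f"{'DBH':>7}" + "".join(f"{h:>8}" for h in heights))
for dbh in dbh_classes:
    print(f"{dbh:>7}" + "".join(f"{table[dbh][h]:>8.1f}" for h in heights))
```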

There are many extensions to the linear regression technique outlined here. Multiple x-variables can be used to produce an equation with a method called "multiple regression." The resulting relationship is harder to visualize, but the equation is used in much the same way. For example, DBH and THT could be used to produce the following equation: pred.vol = -75.8235 + (7.8355 × DBH) + (0.2047 × THT).

Notice that DBH and THT enter this equation separately rather than in the combined variable form. To use this prediction equation, plug in the measurements for the example tree (DBH = 19 inches, THT = 105 feet) and solve:

$$94.54\ \text{ft}^3 = -75.8235 + (7.8355 \times 19) + (0.2047 \times 105)$$

Note: This is quite different from the combined variable answer (84.84 ft³). The two are not expected to be the same, and in this case, the combined variable form is statistically better (R-squared = 99% vs. 94%, and standard error = 4.84 ft³ vs. 12.53 ft³).
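Putting both equations side by side in code makes the comparison for the example tree easy to repeat. This is a minimal sketch using the coefficients given in the text; the function names are hypothetical.

```python
def predict_volume_multiple(dbh_in, tht_ft):
    """Multiple regression from the text: DBH and THT enter separately."""
    return -75.8235 + 7.8355 * dbh_in + 0.2047 * tht_ft


def predict_volume_combined(dbh_in, tht_ft):
    """Combined-variable equation from the text, for comparison."""
    return 1.4455 + 0.0022 * dbh_in ** 2 * tht_ft


print(round(predict_volume_multiple(19, 105), 2))  # 94.54
print(round(predict_volume_combined(19, 105), 2))  # 84.84
```

Which prediction to trust is decided by the fit statistics, not by the code: here the combined variable equation has the higher R-squared and the lower standard error.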

The red curve in figure 5 was produced using a more advanced statistical procedure called non-linear regression; however, the resulting equation is used in a similar way:

$$\text{pred.vol} = 4.0056 - 2.1161 \times \text{DBH}^{1.3505} + 0.9941 \times \text{DBH}^{1.7944}$$

For the example tree (DBH = 19 inches):

$$87.05\ \text{ft}^3 = 4.0056 - 2.1161 \times 19^{1.3505} + 0.9941 \times 19^{1.7944}$$
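Fitting a non-linear equation requires specialized software, but evaluating one that has already been fitted is a single line. The sketch below only evaluates the DBH-only equation given in the text; it does not perform non-linear regression, and the function name is hypothetical.

```python
def predict_volume_nonlinear(dbh_in):
    """DBH-only non-linear equation from the text (red curve in figure 5)."""
    # Fractional exponents make this curve bend upward as DBH increases.
    return 4.0056 - 2.1161 * dbh_in ** 1.3505 + 0.9941 * dbh_in ** 1.7944


print(round(predict_volume_nonlinear(19), 2))  # 87.05
```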

In summary, the regression technique is a highly useful method to predict characteristics that are difficult to measure. Regression has many applications throughout forestry, and software can easily produce and implement equations. Many scientific studies have developed highly accurate and useful predictive equations for many tree characteristics. This leads to more realistic estimates of, for example, timber value, carbon storage, and wildlife habitat. However, before using any regression equation, be sure to examine the range of data used to produce it to guard against illogical predictions due to extrapolation. Also, review any fit statistics (e.g., R-squared and standard error) to determine if the regression equation fits the data well, so you have confidence in the predictions.

Table 2. Volume table for yellow poplar produced with the prediction equation: pred.vol = 1.4455 + 0.0022 × (DBH² × THT), where DBH = diameter at breast height and THT = total height.


References

Burkhart, Harold E., Thomas E. Avery, and Bronson P. Bullock. 2019. Forest Measurements, 6th ed. Long Grove, IL: Waveland Press.


Virginia Cooperative Extension materials are available for public use, reprint, or citation without further permission, provided the use includes credit to the author and to Virginia Cooperative Extension, Virginia Tech, and Virginia State University.

Virginia Cooperative Extension is a partnership of Virginia Tech, Virginia State University, the U.S. Department of Agriculture, and local governments. Its programs and employment are open to all, regardless of age, color, disability, sex (including pregnancy), gender, gender identity, gender expression, genetic information, ethnicity or national origin, political affiliation, race, religion, sexual orientation, or military status, or any other basis protected by law.

Publication Date

November 12, 2024