A MASS BALANCE ACCOUNTING PROCEDURE FOR ESTIMATING CONTRIBUTIONS TO WATER QUALITY

A computerized procedure allows the use of existing State water quality data, collected in a fixed, diffuse grab sample network, to estimate the approximate relative contributions to water quality attributable to defined and undefined upstream inputs. The routine determines an empirical assimilation rate for any water quality constituent whose assimilation can be modeled, as a rough approximation, using first-order die away. After estimating the relative contributions of all defined contributors, mostly NPDES permit holders, the magnitude of the total undefined contribution, largely nonpoint, is estimated by difference. Because the system is constrained to operate using only existing water quality monitoring data, the results are admittedly of low resolution. Nonetheless, for synoptic analyses of water quality at the regional level, the system provides highly usable management information.


INTRODUCTION
As part of its efforts to better define the impact of various pollution sources on the water quality of Illinois streams, the Illinois Environmental Protection Agency has charged us with developing a procedure for estimating the contributions of individual sources to stream quality. Implicit in the charge is that the procedure operate on existing or readily obtainable data, that it be inexpensive to operate and apply, that it be easily learned and understood, and that it be applicable both to a wide range of basins and to a broad variety of regulated substances. The first of these criteria limited us to procedures which would work either with existing water quality data or data which will be collected as part of standard sampling efforts in the future. Our experience indicated that to ensure the accuracy of the results a further limitation was needed, namely, that the procedure take into account the assimilative capacities of streams.
Given these limitations, we have developed the simple mass-balance accounting procedure presented here. Since limitations similar to ours exist in other States and regions where it is desired to estimate the contribution of individual sources to water quality, we believe that the procedure may be used where no previous procedure has proven suitable.
The authors are well aware that the method is imprecise. No interactions among constituents are considered, for example. Nonetheless, the method is justified in a number of ways. Firstly, the data we are forced to work with simply do not support a more theoretical model. Secondly, because this is an analytical tool to be applied repeatedly in looking synoptically at regional water quality problems, the fact that it is easy to implement, and quick and inexpensive to use, makes it a highly desirable management option. Thirdly, the results obtained in a variety of applications in Illinois are consistent with results from other more theoretical and data-dependent techniques costing substantially more.
In developing the procedure, we used as an example a segment of the Sangamon River in east-central Illinois, from its headwaters to a point downstream just above the town of Monticello. All examples and conclusions in this presentation are drawn from that example.

System Entities
The procedure distinguishes six different kinds of water quality entities: monitoring stations, defined point-source inputs, defined point diversions, defined nonpoint-source inputs (including both anthropogenic and natural nonpoint sources), undefined area contributions, and sources upstream from the segment being analyzed. A defined contributor is one whose location is known and for which there are records of constituent loading or of discharge and concentration. Specifically, a defined point source is normally one for which discharge rate, pollutant concentration, and stream location are known. A defined point diversion is one for which the amount of withdrawal and the stream location are known. A defined nonpoint-source input is one for which a load, or discharge rate and pollutant concentration, are known, along with stream or area location. The undefined area contribution is the part of the load that is introduced within a subsegment that cannot be attributed to any defined sources. It can consist of the natural background or of undefined point and nonpoint sources. An upstream source derives from any part of the river upstream from the segment being analyzed, though its actual source is unknown, and occurs when at least one upstream delimiter for the segment is a monitoring station.

Preparing the Hierarchical Representation of the Segment
The computerized system used for the mass-balance accounting requires that the data be properly organized. Figure 1 is a schematic representation of the upper Sangamon River segment. The direction of flow is from top to bottom. Table 1 presents the proper hierarchical representation of the same segment with all water quality contributions to the segment enumerated, at least to the extent that they are known.
The "segment" may be any continuous length of a river bounded downstream by a single water quality monitoring station and upstream by any combination of extreme headwaters or monitoring stations. The segment is subdivided by internal monitoring stations into subsegments, each of which is considered individually in the analysis. Each subsegment is likewise bounded downstream by a single monitoring station and upstream by any combination of headwaters or monitoring stations. Monitoring station E18, for example, is the downstream delimiter for the total segment as well as for the subsegment labelled /E18/.
The principal contributors to subsegment /E18/ follow in the hierarchy and are inset one level to the right: (a) an undefined area input identified only as /E18/ and considered to be coming from a distance equal to the mean instream travel distance within the subsegment to the downstream monitoring station; (b) the Deland-Weldon High School, a point source; (c) the Deland-Weldon Grade School, another point source; and (d) monitoring station E08, which is considered a point source for this subsegment but a delimiter for the next upstream subsegment, which, in turn, has its own hierarchy of upstream inputs. The first entry in the hierarchy for each subsegment is the downstream delimiting station, followed by an undefined area contribution and then by all defined contributions and diversions including upstream monitoring stations.
The indentation of each entity in the segment hierarchy is important in the computerized analysis and will be further discussed in a later section. Table 2 illustrates the physical information required for each of the entities in a typical subsegment. Each entity is identified by a brief alphanumeric designator (see Table 2).

Physical Data Requirements
The position number indicates the hierarchical relationships among entities in the segment. The downstream delimiting station is, by definition, in position "0." It collects and integrates all water quality contributions from entities in position 1. The last water quality station in position 1 collects all water quality contributions in position 2 until another position 1 is encountered, etc. Note, for example, that the water quality station E20 collects only a single position 4 element.
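The position rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the listing below is a shortened, hypothetical one patterned on the entities named in the text (it is not a copy of Table 1), and the function name `build_hierarchy` is ours. Each entity is attached to the most recently listed monitoring station that sits one position lower.

```python
# Illustrative listing only: the order and positions are assumptions
# patterned on the text, not a copy of Table 1.
# Each row: (designator, position, type code), in listing order.
# Type codes follow the text: 0 = monitoring station, 1 = defined point
# source, 3 = undefined subsegment source.
LISTING = [
    ("E18", 0, 0),
    ("/E18/", 1, 3),
    ("Deland-Weldon High School", 1, 1),
    ("Deland-Weldon Grade School", 1, 1),
    ("E08", 1, 0),
    ("/E08/", 2, 3),
    ("E19", 2, 0),
    ("/E19/", 3, 3),
    ("E20", 3, 0),
    ("/E20/", 4, 3),
]


def build_hierarchy(listing):
    """Map each monitoring station to the entities it collects directly."""
    contributors = {}   # station designator -> list of direct inputs
    last_station = {}   # position -> most recently listed station there
    for name, position, type_code in listing:
        if position > 0:
            # Attach to the last station listed one position lower.
            contributors[last_station[position - 1]].append(name)
        if type_code == 0:
            # A station can collect entities listed after it, one level in.
            last_station[position] = name
            contributors[name] = []
    return contributors


HIERARCHY = build_hierarchy(LISTING)
for station, inputs in HIERARCHY.items():
    print(station, "<-", inputs)
```

In this sketch station E08 appears twice in effect: as an input to E18 and as the collector for its own subsegment, which mirrors the dual role described for it in the text.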
The type column indicates which of the six entity types each element is. Type assignments are "0" for a monitoring station, "1" for a defined point source, "2" for a defined nonpoint source, "3" for an undefined subsegment source, and "4" for a point diversion. The upper Sangamon River has no defined nonpoint sources or point diversions at the time of this writing.
Distance measurements are given as stream distances in miles to the next downstream monitoring station. Thus, it is 7.4 miles from station E08 to station E18. For any nonpoint source, whether defined or undefined, the distance given is an estimated mean streamflow distance from the contributing area to the next downstream monitoring station.
The area value for monitoring stations is the total upstream drainage area in square miles. For defined point sources and point diversions, an area of zero is assigned. For defined nonpoint sources, the drainage area is the contributing area. For undefined area sources, the area value used is the drainage area of that subsegment, based on the assumption that undefined contributions are proportional to the contributing area.
In this example, data were analyzed on an annual mean basis. Thus, for all stations, the discharge rate used was the annual mean discharge in cubic feet per second. Since the only streamflow gaging station is at location E19, flow data for other monitoring stations were derived by apportioning the total flow on a drainage area basis after correcting for point-source discharges. For point sources or diversions, annual mean flow values, either as actually measured or as design estimates, were used. For defined nonpoint sources (although none have been defined in the upper Sangamon), the discharge value used would be an estimate of the diffuse inflow from the contributing area. Undefined area flow is the increment to flow in the subsegment not attributable to defined sources.
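One plausible reading of the apportionment step is sketched below in Python. The text does not spell out the exact correction, so the arithmetic here is an assumption: point-source discharge is removed from the gaged flow, the remainder is scaled by drainage area, and the point-source discharge upstream of the ungaged station is added back. The function name and all numbers are hypothetical.

```python
def apportion_flow(gaged_flow, gage_area, gage_point_flow,
                   station_area, station_point_flow):
    """Estimate mean flow (cfs) at an ungaged station from a gaged one.

    Assumed procedure, not confirmed by the source:
      1. subtract point-source discharge upstream of the gage,
      2. scale the remaining runoff by the ratio of drainage areas,
      3. add back point-source discharge upstream of the ungaged station.
    """
    runoff_at_gage = gaged_flow - gage_point_flow
    yield_per_sq_mile = runoff_at_gage / gage_area
    return yield_per_sq_mile * station_area + station_point_flow


# Hypothetical values: 300 cfs gaged over 360 sq mi with 1.2 cfs of
# point-source discharge; the ungaged station drains 240 sq mi and has
# 0.4 cfs of point-source discharge upstream of it.
ESTIMATED_FLOW = apportion_flow(300.0, 360.0, 1.2, 240.0, 0.4)
print(round(ESTIMATED_FLOW, 1))
```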
It is important to note that the procedure may utilize data other than annual means. For example, seasonal means or daily means could be used. Thus, one may investigate contributions during summer months or during periods when sample records indicate extreme excursions beyond stream standards. Of the physical data required, only the discharge data will change if the time basis is changed.
Gathering these data for any individual stream segment normally requires only a decent map of the watershed, a bit of planimetry, and discharge records from stream and point-source gaging. Once gathered, the data, except for the discharge rates, can be stored for future use in subsequent analyses involving any of the subsegments regardless of the constituents being studied.

Chemical Data Requirements
The only additional items of information needed to operate the system are discharge-weighted concentration values for the constituents being analyzed, the number of samples in the record, and the coefficients of variation for the sample analyses at each sampling station. If discharge-weighted concentration values are not available, an unweighted mean of all samples will provide useful estimates. The chemical data for the upper Sangamon River are listed in Table 3. The number of samples and the coefficients of variation do not figure into the calculations of the relative contributions in any way. Their purpose is to remind those reading and interpreting the analysis that the data may or may not be properly representative of the system. Obviously, an analysis based on sparse, highly variable data must be interpreted with extreme caution.
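The three chemical inputs can be computed as in the following Python sketch. The sample values are invented for illustration, and the use of the sample (n - 1) standard deviation for the coefficient of variation is our assumption; the text does not say which form was used.

```python
import statistics


def discharge_weighted_mean(concentrations, flows):
    """Mean concentration with each sample weighted by its discharge."""
    total_load = sum(c * q for c, q in zip(concentrations, flows))
    return total_load / sum(flows)


def coefficient_of_variation(concentrations):
    """Sample standard deviation divided by the unweighted mean."""
    return statistics.stdev(concentrations) / statistics.mean(concentrations)


# Hypothetical grab samples (mg/l) and the flows (cfs) at sampling time.
CONCENTRATIONS = [2.0, 4.0, 6.0]
FLOWS = [10.0, 20.0, 70.0]

WEIGHTED = discharge_weighted_mean(CONCENTRATIONS, FLOWS)
UNWEIGHTED = statistics.mean(CONCENTRATIONS)
CV = coefficient_of_variation(CONCENTRATIONS)
N_SAMPLES = len(CONCENTRATIONS)
print(WEIGHTED, UNWEIGHTED, CV, N_SAMPLES)
```

With these made-up numbers the high-flow sample dominates the weighted mean, which is why the weighted value is preferred when flows are available.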

Computing Relative Contributions of Sources
Using the foregoing information and data, the procedure can estimate the relative and absolute contributions of all water quality sources to downstream loads. Obviously, if all constituents were completely conservative, then simple mass-balance computations would provide the relative values sought. Such is rarely the case, however, and instream assimilation must be considered.
Fortunately, to the extent that the die away curve for a constituent can be assumed to follow first-order dynamics as a function of stream flow distance, instream assimilation coefficients can be estimated from existing ambient water quality monitoring data. The data for each pair of subsegments in the system being analyzed can be incorporated into two simultaneous equations:

$$M(1) = I(1,1)\,\exp(a\,d(1,1)) + I(1,2)\,\exp(a\,d(1,2)) + \cdots + I(1,u)\,\exp(a\,d(1,u)) \qquad [1]$$

and

$$M(2) = I(2,1)\,\exp(a\,d(2,1)) + I(2,2)\,\exp(a\,d(2,2)) + \cdots + I(2,u)\,\exp(a\,d(2,u)) \qquad [2]$$

where:
M(1) = mass of constituent arriving at downstream monitoring station delimiting subsegment 1
I(2,1) = mass of first defined input within subsegment 2
I(2,u) = mass of undefined area input from within subsegment 2
exp(x) = base of the natural logarithms raised to the power "x"
a = assimilation coefficient, derived by solution
d(1,2) = stream distance from second defined input in subsegment 1 to the downstream delimiting monitoring station
d(2,u) = weighted mean stream distance from all points within the subsegment to the next downstream monitoring station.
Thus, the load arriving at the downstream monitoring station is the sum of each defined input in the subsegment assimilated over the distance to the downstream station, plus an undefined area input assimilated over the mean stream distance within the subsegment. A solution can be obtained for each pair of these equations if it is assumed that undefined area inputs are proportional to the drainage area. By iterative successive approximation, estimates for the parameter "a," the assimilation coefficient, are obtained. For every possible pair of subsegments in the segment, one estimate of "a" is derived. Based on a large number of tests on controlled data with known assimilation rates and known variability, the median of this distribution of solutions has been shown to be the best estimator of the average assimilation coefficient for the segment as a whole.
For the upper Sangamon, five subsegments provide ten solutions for "a." The more variable the supporting data, the more variable are the solutions to the simultaneous equations. As an index to this variability and for the same reason that issue is made of the adequacy of the water quality data, the first and third quartile solutions from the distributions are presented in the analysis. The median assimilation rate will normally be a negative value or zero. If the median is positive, a zero value is supplied and used, since positive values imply that the substance is creating more of itself.
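The pairwise solution can be illustrated with the Python sketch below. It is not the authors' FORTRAN routine: bisection stands in for the unspecified "iterative successive approximation", the search bracket is a caller-supplied assumption (the pair of equations can have more than one root, so the bracket matters), and every name and number is hypothetical. Following the test approach described above, the data are synthesized from a known coefficient so that recovery can be checked.

```python
import math
from itertools import combinations
from statistics import median


def defined_arrival(inputs, a):
    """Sum of defined inputs (load, distance) after first-order assimilation."""
    return sum(load * math.exp(a * dist) for load, dist in inputs)


def implied_unit_load(seg, a):
    """Undefined input per unit drainage area implied by coefficient a."""
    residual = seg["M"] - defined_arrival(seg["inputs"], a)
    return residual / (seg["area"] * math.exp(a * seg["d_u"]))


def solve_pair(seg1, seg2, lo, hi, iterations=60):
    """Bisection for the a at which both subsegments imply the same
    undefined load per unit area. Returns None if the assumed bracket
    [lo, hi] shows no sign change."""
    def mismatch(a):
        return implied_unit_load(seg1, a) - implied_unit_load(seg2, a)

    f_lo, f_hi = mismatch(lo), mismatch(hi)
    if f_lo * f_hi > 0:
        return None
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = mismatch(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def estimate_coefficient(segments, lo, hi):
    """One solution per pair of subsegments; the median is the estimate.
    A positive median is replaced by zero, as the text prescribes.
    Assumes at least one pair yields a solution."""
    solutions = [solve_pair(s1, s2, lo, hi)
                 for s1, s2 in combinations(segments, 2)]
    solutions = [a for a in solutions if a is not None]
    return solutions, min(median(solutions), 0.0)


# Controlled data built from a known coefficient (all values hypothetical).
A_TRUE, UNIT_LOAD = -0.05, 2.0


def make_segment(inputs, area, d_u):
    arriving = (defined_arrival(inputs, A_TRUE)
                + UNIT_LOAD * area * math.exp(A_TRUE * d_u))
    return {"M": arriving, "inputs": inputs, "area": area, "d_u": d_u}


SEGMENTS = [
    make_segment([(100.0, 10.0)], 50.0, 5.0),
    make_segment([(40.0, 4.0)], 120.0, 8.0),
    make_segment([(10.0, 2.0), (30.0, 12.0)], 80.0, 6.0),
]
SOLUTIONS, A_USED = estimate_coefficient(SEGMENTS, lo=-0.1, hi=0.0)
print(SOLUTIONS, A_USED)
```

Three subsegments give three pairs and therefore three solutions, just as five subsegments give ten. With noise-free synthetic data every pair returns the same value; with real monitoring data the solutions scatter, which is why the quartiles are reported alongside the median.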
With an estimate of the assimilation coefficient for the segment, all defined inputs in each subsegment are assimilated over their known distances and subtracted from the load arriving at the downstream monitoring station. After all have been subtracted, the residuum represents the assimilated, undefined area contribution. To convert this value to the corresponding input quantity, it is "unassimilated" over the appropriate distance. Finally, relative fractional contributions to all monitoring stations from defined and undefined upstream sources are computed. Table 4 presents the relative contributions of segment sources of nitrate nitrogen at all downstream monitoring stations during 1972-1976, along with the first, second, and third quartile solutions for the assimilation coefficient. The table is read thus: each entity listed contributes in the fractional amount indicated to the entity indicated by the nearest line of stars directly above the fractional value. For example, undefined area input /E20/ contributes 100 per cent of the load at E20, 33 per cent of the load arriving at E19, 29 per cent of the load arriving at E08, and 21 per cent of the load arriving at E18. The importance of this analysis is self-evident. Only when one understands where water quality problems originate can one begin to attack them rationally.
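For a single subsegment, the subtraction and "unassimilation" steps might look like the Python sketch below. The function name, the source names, and all loads and distances are hypothetical; the sketch covers one subsegment only and does not carry fractions on to stations farther downstream.

```python
import math


def subsegment_contributions(arriving_load, defined_inputs, d_undefined, a):
    """Split the load at a station among defined inputs and the undefined area.

    defined_inputs maps a source name to (input load, distance to station).
    Returns (undefined input load, fractional contributions at the station).
    """
    # Assimilate each defined input over its known distance.
    arrivals = {name: load * math.exp(a * dist)
                for name, (load, dist) in defined_inputs.items()}
    # What remains is the assimilated undefined area contribution.
    residual = arriving_load - sum(arrivals.values())
    # "Unassimilate" the residual over the mean distance to get the input.
    undefined_input = residual / math.exp(a * d_undefined)
    fractions = {name: value / arriving_load
                 for name, value in arrivals.items()}
    fractions["undefined area"] = residual / arriving_load
    return undefined_input, fractions


# Hypothetical subsegment: 200 load units arrive at the station.
DEFINED = {"school": (20.0, 4.0), "upstream station": (100.0, 10.0)}
UNDEFINED_INPUT, FRACTIONS = subsegment_contributions(
    arriving_load=200.0, defined_inputs=DEFINED, d_undefined=6.0, a=-0.05)
print(round(UNDEFINED_INPUT, 2), FRACTIONS)
```

Because the residual is obtained by difference, the fractions always sum to one, and a negative residual shows up directly as a negative undefined fraction.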

RESULTS
A facsimile of the computer output for nitrate nitrogen for the upper Sangamon example is shown in Figure 2. Below the heading are the first, second, and third quartile solutions for assimilation rate from the simultaneous equations. Below that is the table of relative contributions computed using the assimilation coefficient indicated. The next lower table lists the various sources and the actual or computed concentration values for each. All the concentrations listed after the entries enclosed by diagonals (e.g., /E18/) are computed by difference and represent the concentration if the load arriving from that source were averaged over the incremental flow from the subsegment. The values in the input column indicate the load delivered to the stream, and those in the contribution column represent the load arriving at the next downstream station; the difference is due to assimilation. The number of samples and the coefficients of variation of the concentrations are listed in the last two columns for the analyst's information.
The final table on the page summarizes, for each water quality monitoring station, the total fractional contribution at that point from different types of sources within the system being analyzed. Note that for the Sangamon example, a large fraction of nitrate nitrogen derives from undefined area sources. The land in this watershed is largely devoted to row-crop agriculture. Figures 3 through 5 show the computer outputs for analyses of methylene blue active substances, total dissolved solids, and total phosphorus.

Sensitivity Analysis
In general, variations in the assimilation rate will produce concomitant variations in the relative contributions of defined water quality sources. The magnitude of the resulting change will depend principally on the relative magnitude of this defined water quality source and on its distance to the downstream monitoring station. The longer the distance, the greater the resulting variation; likewise, the larger the relative magnitude of the defined contribution, the greater the resulting variation. Table 5 shows, for the total phosphorus analysis presented in Figure 5, the relative contributions from defined sources under three different assimilation rates: none, the median, and twice the median.
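The effect of distance on this sensitivity can be seen in a small Python sketch. The numbers are hypothetical and are not taken from Table 5; the load arriving at the station is held fixed because it is the observed quantity, and the three rates mirror the none, median, and twice-the-median cases.

```python
import math


def defined_fraction(load, distance, arriving_load, a):
    """Fraction of the station load attributed to one defined source."""
    return load * math.exp(a * distance) / arriving_load


A_MEDIAN = -0.02                      # hypothetical median coefficient
RATES = [0.0, A_MEDIAN, 2 * A_MEDIAN]  # none, median, twice the median

# The same hypothetical source (50 load units against 200 arriving),
# placed 5 and 20 stream miles above the station.
FRACTIONS = {
    distance: [defined_fraction(50.0, distance, 200.0, a) for a in RATES]
    for distance in (5.0, 20.0)
}
for distance, values in FRACTIONS.items():
    print(distance, [round(v, 3) for v in values])
```

In this sketch the distant source loses more than half of its attributed share when the rate is doubled from zero to twice the median, while the nearby source loses less than a fifth.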

DISCUSSION
There is, we submit, never any substitute for professional judgment. The results of this analysis are only as good as the data and the assumptions on which they are based. Thus, the representativeness of the data is of overwhelming importance in this system. If the data are highly representative of the actual situation, the water quality manager can, with some confidence, prescribe actions based on them. If they are not, then he cannot. It is for this reason that the number of samples and the variability of those samples are listed on the computer printout. If the number of samples is quite small or the variability is rather large, the manager may decide that he simply needs more adequate data before choosing a course of action. As opposed to most simulation modeling exercises where the very impressiveness of the formatted output can mask the insufficiency of the input data, the system described here provides the user an indication of potentially unreliable results.
On occasion the procedure may produce nonsensical results, such as negative values for a calculated undefined area input. For each subsegment the undefined area input is calculated by finding the difference between the assimilated defined inputs and the total load reaching the monitoring station. Anomalies in the concentration data can, at times, cause the resulting values to be negative. Such an occurrence could be due to an abnormally low estimate of mean concentration at the downstream delimiting station, or it could reflect a real subsegment assimilation rate substantially faster (more negative) than the average for the segment as a whole. Here, again, the manager must consider the number of samples and the variability of those samples as he tries to determine the cause.

[Figure residue: header of the computer output facsimile, "WATER QUALITY CONTRIBUTION"; ID: SECOND CUT, MEANS, REVISED PROGRAM]
The assimilation rates in our example are calculated on the basis of distance of travel. It may be more appropriate in some cases to use time of travel. This change may be made by substituting a suitable unit for distance in Table 2 and the simultaneous equations. Our efforts have been confined largely to using distance of travel because of the inadequacy of time of travel data.
Many constituents are not adequately modeled by first-order assimilation, such as dissolved oxygen, temperature, and pH, and cannot be analyzed using our procedure at present. Further study and theoretical development are being undertaken to determine whether these constituents can be successfully incorporated.
The Sangamon River example points out a rather general problem: the lack of resolution of nonpoint source contributors to water quality. The present procedure depends on assuming that undefined area inputs are proportional to the size of the contributing area. Obviously, the more diverse the pattern of land use within a segment and between subsegments, the less valid this assumption will be. We have been fortunate to work with areas where land use is relatively uniform, so that the assumption does not lead to great error.
Our example does not identify any defined nonpoint sources. However, this entity can readily be introduced given appropriate data. In Illinois, efforts are underway to estimate the magnitude of various types of nonpoint source contributions [1] and similar efforts may be undertaken elsewhere [2].
In general, the validity of the procedure has been satisfactorily demonstrated on numerous test cases in Illinois [1,3]. Even with the expected amount of variability in the sampling data, reasonably accurate general conclusions have been derived concerning the relative contributions of various water quality sources.
To assist with the computations, FORTRAN software has been developed, a listing of which will be furnished upon written request to the senior author.