James P. LeSage
Department of Economics
University of Toledo
May, 1999
This text provides an introduction to spatial econometrics as well as a set of MATLAB functions that implement a host of spatial econometric estimation methods. The intended audience is faculty and students involved in modeling spatial data sets using spatial econometric methods. The MATLAB functions described in this book have been used in my own research as well as teaching both undergraduate and graduate econometrics courses.
Toolboxes are the name given by the MathWorks to related sets of MATLAB functions aimed at solving a particular class of problems. Toolboxes of functions useful in signal processing, optimization, statistics, finance and a host of other areas are available from the MathWorks as add-ons to the standard MATLAB software distribution. I use the term Econometrics Toolbox to refer to my collection of function libraries described in a manual entitled Applied Econometrics using MATLAB available at http://www.econ.utoledo.edu.
The MATLAB spatial econometrics functions used to apply the spatial econometric models discussed in this text rely on many of the functions in the Econometrics Toolbox. The spatial econometric functions constitute a ``library'' within the broader set of econometric functions. To use the spatial econometrics functions library you need to install the entire set of Econometrics Toolbox functions in MATLAB. The spatial econometrics functions library is part of the Econometrics Toolbox and will be installed and available for use as are the econometrics functions.
Researchers currently using Gauss, RATS, TSP, or SAS for econometric programming might find switching to MATLAB advantageous. MATLAB software has always had excellent numerical algorithms, and has recently been extended to include: sparse matrix algorithms and very good graphical capabilities. MATLAB software is available on a wide variety of computing platforms including mainframe, Intel, Apple, and Linux or Unix workstations. A Student Version of MATLAB is available for less than $100. This version is limited in the size of problems it can solve, but many of the examples in this text rely on a small data sample with 49 observations that can be used with the Student Version of MATLAB.
The collection of around 450 functions and demonstration programs are organized into libraries, with approximately 30 spatial econometrics library functions described in this text. For those interested in other econometric functions or in adding programs to the spatial econometrics library, see the manual for the Econometrics Toolbox. The 350 page manual provides many details regarding programming techniques used to construct the functions and examples of adding new functions to the Econometrics Toolbox. This text does not focus on programming methods. The emphasis here is on applying the existing spatial econometric estimation functions to modeling spatial data sets.
A consistent design was implemented that provides documentation, example programs, and functions to produce printed as well as graphical presentation of estimation results for all of the econometric functions. This was accomplished using the ``structure variables'' introduced in MATLAB Version 5. Information from econometric estimation is encapsulated into a single variable that contains ``fields'' for individual parameters and statistics related to the econometric results. A thoughtful design by the MathWorks allows these structure variables to contain scalar, vector, matrix, string, and even multi-dimensional matrices as fields. This allows the econometric functions to return a single structure that contains all estimation results. These structures can be passed to other functions that can intelligently decipher the information and provide a printed or graphical presentation of the results.
The Econometrics Toolbox along with the spatial econometrics library functions should allow faculty to use MATLAB in undergraduate and graduate level courses with absolutely no programming on the part of students or faculty. In addition to providing a set of spatial econometric estimation routines and documentation, the book has another goal, applied modeling strategies and data analysis. Given the ability to easily implement a host of alternative models and produce estimates rapidly, attention naturally turns to which models and estimates work best to summarize a spatial data sample. Much of the discussion in this text is on these issues.
This text is provided in Adobe PDF and HTML formats for online use. It attempts to draw on the unique aspects of a computer presentation platform. The ability to present program code, data sets and applied examples in an online fashion is a relatively recent phenomenon, so issues of how to best accomplish a useful online presentation are numerous. For the online text the following features were included in the PDF and HTML documents.
All of the examples in the text and the datasets are available offline and on my Web site: http://www.econ.utoledo.edu under the MATLAB gallery icon.
Finally, there are obviously omissions, bugs and perhaps programming errors in the Econometrics Toolbox and the spatial econometrics library functions. This would likely be the case with any such endeavor. I would be grateful if users would notify me when they encounter problems. It would also be helpful if users who produce generally useful functions that extend the toolbox would submit them for inclusion. Much of the econometric code I encounter on the internet is simply too specific to a single research problem to be generally useful in other applications. If econometrics researchers are serious about their newly proposed estimation methods, they should take the time to craft a generally useful MATLAB function that others could use in applied research. Inclusion in the spatial econometrics function library would have the added benefit of introducing new research methods to faculty and their students.
The latest version of the Econometrics Toolbox functions can be found on the Internet at: http://www.econ.utoledo.edu under the MATLAB gallery icon. Instructions for installing these functions are in an Appendix to this text along with a listing of the functions in the library and a brief description of each.
This chapter provides an overview of the nature of spatial econometrics. An applied model-based approach is taken where various spatial econometric methods are introduced in the context of spatial data sets and models based on the data. The remaining chapters of the text are organized along the lines of alternative spatial econometric estimation procedures. Each chapter illustrates applications of a different econometric estimation method and provides references to the literature regarding these methods.
Section 1.1 sets forth the nature of spatial econometrics and discusses differences with traditional econometrics. We will see that spatial econometrics is characterized by: 1) spatial dependence between sample data observations at various points in the Cartesian plane, and 2) spatial heterogeneity that arises from relationships or model parameters that vary with our sample data as we move over the Cartesian plane.
The nature of spatially dependent or spatially correlated data is taken up in Section 1.2 and spatial heterogeneity is discussed in Section 1.3. Section 1.4 takes up the subject of how we formally incorporate the locational information from spatial data in econometric models. In addition to the theoretical discussion of incorporating locational information in econometric models, Section 1.4 provides a preview of alternative spatial econometric estimation methods that will be covered in Chapters 2 through 4.
Finally, Section 1.5 describes software design issues related to a spatial econometric function library based on MATLAB software from the MathWorks Inc. Functions are described throughout the text that implement the spatial econometric estimation methods discussed. These functions provide a consistent user-interface in terms of documentation and related functions that provide printed as well as graphical presentation of the estimation results. Section 1.5 introduces the spatial econometrics function library which is part of a broader collection of econometric estimation functions available in my public domain Econometrics Toolbox.
Applied work in regional science relies heavily on sample data that are collected with reference to locations measured as points in space. The subject of how we incorporate the locational aspect of sample data is deferred until Section 1.4. What distinguishes spatial econometrics from traditional econometrics? Two problems arise when sample data has a locational component: 1) spatial dependence exists between the observations and 2) spatial heterogeneity occurs in the relationships we are modeling.
Traditional econometrics has largely ignored these two issues that violate the Gauss-Markov assumptions used in regression modeling. With regard to spatial dependence between observations, recall that Gauss-Markov assumes the explanatory variables are fixed in repeated sampling. Spatial dependence violates this assumption, a point that will be made clear in the next section. This gives rise to the need for alternative estimation approaches. Similarly, spatial heterogeneity violates the Gauss-Markov assumption that a single linear relationship exists across the sample data observations. If the relationship varies as we move across the spatial data sample, alternative estimation procedures are needed to successfully model this type of variation and draw appropriate inferences.
The subject of this text is alternative estimation approaches that can be used when dealing with spatial data samples. For example, no discussion of issues and models related to spatial data samples occurs in Amemiya (1985), Chow (1983), Dhrymes (1978), Fomby et al. (1984), Green (1997), Intrilligator (1978), Kelijian and Oates (1989), Kmenta (1986), Maddala (1977), Pindyck and Rubinfeld (1981), Schmidt (1976), and Vinod and Ullah (1981).
Anselin (1988) provides a complete treatment of many facets of spatial econometrics which this text draws upon. In addition to introducing ideas set forth in Anselin (1988), this presentation includes some more recent approaches based on Bayesian methods applied to spatial econometric models. In terms of focus, the materials presented here are more applied than Anselin (1988), providing program functions and illustrations of hands-on approaches to implementing the estimation methods described. Another departure from Anselin (1988) is in the use of sparse matrix algorithms available in the MATLAB software to implement spatial econometric estimation procedures. These implementation details represent previously unpublished material that describes a set of (freely available) programs for solving large-scale spatial econometric problems involving thousands of observations in a few minutes on a modest desktop computer. Students as well as researchers can use these programs without any programming to implement some of the latest estimation procedures on large-scale spatial data sets. A commerically available program called SpaceStat is available from Anselin that implements the maximum likelihood estimation methods by relying on Gauss software. Of course another distinction of the presentation here is the interactive aspect of a Web-based format that allows a hands-on approach that provides links to code, sample data and examples.
Spatial dependence in a collection of sample data observations refers to the
fact that one observation associated with a location which we might label i
depends on other observations at locations
.
Formally, we
might state:
A second and perhaps more important reason we would expect spatial dependence is that the spatial dimension of socio-demographic, economic or regional activity may truly be an important aspect of a modeling problem. Regional science is based on the premise that location and distance are important forces at work in human geography and market activity. All of these notions have been formalized in regional science theory that relies on notions of spatial interaction and diffusion effects, hierarchies of place and spatial spillovers.
As a concrete example of this type of spatial dependence, we use a spatial data set on annual county-level counts of Gypsy moths established by the Michigan Department of Natural Resources (DNR) for the 68 counties in lower Michigan.
The North American gypsy moth infestation in the United States provides a classic example of a natural phenomena that is spatial in character. During 1981, the moths ate through 12 million acres of forest in 17 Northeastern states and Washington, DC. More recently, the moths have been spreading into the northern and eastern Midwest and to the Pacific Northwest. For example, in 1992 the Michigan Department of Agriculture estimated that more than 700,000 acres of forest land had experienced at least a 50% defoliation rate.
Figure 1.1 shows a map of the moth counts for 1985
in lower Michigan. We see the highest
level of moth counts near Midland county Michigan in the center. As
we move outward from the center, lower levels of moth counts occur
taking the form of concentric rings. A set of k data points
taken from the same ring would exhibit a high
correlation with each other. In terms of (1.1), yi and
yj where both observations i and j come from the same ring
should be highly correlated. The correlation of k1 points taken
from one ring and k2 points from a neighboring ring should also
exhibit a high correlation, but not as high as points sampled from the
same ring. As we examine the correlation between points taken from
more distant rings, we would expect the correlation to diminish.
Over time the Gypsy moths spread to neighboring areas. They cannot fly, so the diffusion should be relatively slow. Figure 1.2 shows a similarly constructed contour map of moth counts for the next year, 1986. We see some evidence of diffusion to neighboring areas between 1985 and 1986. The circular pattern of higher levels in the center and lower levels radiating out from the center is still quite evident.
How does this situation differ from the traditional view of the
process at work to generate economic data samples? The Gauss-Markov
view of a regression data sample is that the generating process takes
the form of (1.2), where y represent a vector of n
observations, X denotes an nxk matrix of explanatory variables,
is a vector of k parameters and
is a vector of
n stochastic disturbance terms.
The generating process is such that the X matrix and true parameters
are fixed while repeated disturbance vectors
work to generate the samples y that we observe. Given that the
matrix X and parameters
are fixed, the distribution of
sample y vectors will have the same variance-covariance structure as
.
Additional assumptions regarding the nature of the
variance-covariance structure of
were invoked by
Gauss-Markov to ensure that the distribution of individual
observations in y exhibit a constant variance as we move across
observations, and zero covariance between the observations.
It should be clear that observations from our sample of moth level counts do not obey this structure. As illustrated in Figures 1.1 and 1.2, observations from counties in concentric rings are highly correlated, with a decay of correlation as we move to observations from more distant rings.
Spatial dependence arising from underlying regional interactions in regional science data samples suggests the need to quantify and model the nature of the unspecified functional spatial dependence function f(), set forth in (1.1). Before turning attention to this task, the next section discusses the other underlying condition leading to a need for spatial econometrics -- spatial heterogeneity.
The term spatial heterogeneity refers to variation in relationships over space. In the most general case we might expect a different relationship to hold for every point in space. Formally, we write a linear relationship depicting this as:
Where i indexes observations collected at
points in
space, Xi represents a (1 x k) vector of explanatory variables
with an associated set of parameters
,
yi is the
dependent variable at observation (or location) i and
denotes a stochastic disturbance in the linear
relationship.
A slightly more complicated way of expressing this notion is to allow the function f() from (1.1) to vary with the observation index i, that is:
One can also view the specification task as one of placing restrictions on the nature of variation in the relationship over space. For example, suppose we classified our spatial observations into urban and rural regions. We could then restrict our analysis to two relationships, one homogeneous across all urban observational units and another for the rural units. This raises a number of questions: 1) are two relations consistent with the data, or is there evidence to suggest more than two? 2) is there a trade-off between efficiency in the estimates and the number of restrictions we use? 3) are the estimates biased if the restrictions are inconsistent with the sample data information? and other issues we will explore.
One of the compelling motivations for the use of Bayesian methods in spatial econometrics is their ability to impose restrictions that are stochastic rather than exact in nature. Bayesian methods allow us to impose restrictions with varying amounts of prior uncertainty. In the limit, as we impose a restriction with a great deal of certainty, the restriction becomes exact. Carrying out our econometric analysis with varying amounts of prior uncertainty regarding a restriction allows us to provide a continuous mapping of the restriction's impact on the estimation outcomes.
As a concrete illustration of spatial heterogeneity, we use a sample of 35,000 homes that sold within the last 5 years in Lucas county, Ohio. The selling prices were sorted from low to high and three samples of 5,000 homes were constructed. The 5,000 homes with the lowest selling prices were used to represent a sample of low-price homes. The 5,000 homes with selling prices that ranked from 15,001 to 20,000 in the sorted list were used to construct a sample of medium-price homes and the 5,000 highest selling prices from 30,0001 to 35,000 served as the basis for a high-price sample. It should be noted that the sample consisted of 35,702 homes, but the highest 702 selling prices were omitted from this exercise as they represent very high prices that are atypical.
Using the latitude-longitude coordinates, the distance from the central business district (CBD) in the city of Toledo, which is at the center of Lucas county was calculated. The three samples of 5,000 low, medium and high priced homes were used to estimate three empirical distributions that are graphed with respect to distance from the CBD in Figure 1.3.
We see three distinct distributions, with low-priced homes nearest to the CBD and high priced homes farthest away from the CBD. This suggests different relationships may be at work to describe home prices in different locations. Of course this is not surprising, numerous regional science theories exist to explain land usage patterns as a function of distance from the CBD. Nonetheless, these three distinct distributions provide a contrast to the Gauss-Markov assumption that the distribution of sample data exhibits a constant mean and variance as we move across the observations.
Another illustration of spatial heterogeneity is provided by three distributions for total square feet of living area of low, medium and high priced homes shown in Figure 1.4. Here we see only two distinct distributions, suggesting a pattern where the highest priced homes are the largest, but low and medium priced homes have roughly similar distributions with regard to living space.
It may be the case that important explanatory variables in the house value relationship change as we move over space. Living space may be unimportant in distinguishing between low and medium priced homes, but significant for higher priced homes. Distance from the CBD on the other hand appears to work well in distinguishing all three categories of house values.
A first task we must undertake before we can ask questions about spatial dependence and heterogeneity is quantification of the locational aspects of our sample data. Given that we can always map a set of spatial data observations, we have two sources of information on which we can draw.
The location in Cartesian space represented by latitude and longitude is one source of information. This information also allows us to calculate distances from any point in space, or the distance of observations located at distinct points in space to observations at other locations. Spatial dependence should conform to the fundamental theorem of regional science i.e., that distance matters. Observations that are near each other should reflect a greater degree of spatial dependence than those more distant from each other. In other words, the strength of spatial dependence between observations should decline with the distance between observations.
The second source of locational information is contiguity, reflecting the relative position in space of one regional unit of observation to other such units. Measures of contiguity rely on knowledge of the size and shape of the observational units depicted on a map. From this, we can determine which units are neighbors (have borders that touch) or represent observational units in reasonable proximity to each other. Regarding spatial dependence, neighboring units should exhibit a higher degree of spatial dependence than units located far apart.
I note in passing that these two types of information are not necessarily different. Given the latitude-longitude coordinates of an observation, we could construct a contiguity structure by defining a ``neighboring observation'' as one that lies within a certain distance. Consider also that given the centroid coordinates of a set of observations associated with contiguous map regions, we can calculate distances between the regions (or observations).
We will illustrate how both types of locational information can be used in spatial econometric modeling. We first take up the issue of quantifying spatial contiguity, which is used in the models presented in Chapter 2.
Figure 1.2 shows a hypothetical example of five regions as they would appear on a map. We wish to construct a 5 by 5 binary matrix W containing 25 elements taking values of 0 or 1 that captures the notion of ``connectiveness'' between the five entities depicted in the map configuration. We record in each row of the matrix W a set of contiguity relations associated with one of the five regions. For example the matrix element in row 1, column 2 would record the presence (represented by a 1) or absence (denoted by 0) of a contiguity relationship between regions 1 and 2. As another example, the row 3, column 4 element would reflect the presence or absence of contiguity between regions 3 and 4. Of course, a matrix constructed in such fashion must be symmetric -- if regions 3 and 4 are contiguous, so are regions 4 and 3.
It turns out there are many ways to accomplish our task. Below, we enumerate some of the alternative ways we might define a binary matrix W that represent alternative definitions of the ``contiguity'' relationships between the five entities in Figure 1.5. For the enumeration below, start with a matrix filled with zeros, then consider the following alternative ways to define the presence of a contiguity relationship.
Believe it or not, there are even more ways one could proceed. For a good discussion of these issues, see Appendix 1 of Kelejian and Robinson (1995). Note also that the double linear and double rook definitions are sometimes referred to as ``second order'' contiguity, whereas the other definitions are termed ``first order''. More elaborate definitions sometimes rely on the distance of shared borders. This might impact whether we considered regions (4) and (5) in Figure 1.5 as contiguous or not. They have a common border, but it is very short. Note that in the case of a vertex, the rook definition rules out a contiguity relation, whereas the bishop and queen definitions would record a relationship.
The guiding principle in selecting a definition should be the nature of the problem being modeled, together with any additional non-sample information that is available. For example, suppose that a major highway connecting regions (2) and (3) existed and we knew that region (2) was a ``bedroom community'' for persons who work in region (3). Given this non-sample information, we would not want to rely on the rook definition that would rule out a contiguity relationship, as there is quite reasonably a large amount of spatial interaction between these two regions.
We will use the rook definition to define a first-order contiguity matrix for the five regions in Figure 1.5 as a concrete illustration. This is a definition that is often used in applied work. Perhaps the motivation for this is that we simply need to locate all regions on the map that have common borders with some positive length.
The matrix W reflecting first-order rook's contiguity relations for the five regions in Figure 1.5 is:
Note that the matrix W is symmetric as indicated above, and by convention the matrix always has zeros on the main diagonal. A transformation often used in applied work is to convert the matrix W to have row-sums of unity. This is referred to as a ``standardized first-order'' contiguity matrix, which we denote as C:
The motivation for the standardization can be seen by considering what happens
if we use matrix multiplication of C and a vector of observations on some
variable associated with the five regions which we label y. This matrix
product
represents a new variable equal to the mean of
observations from contiguous regions:
This is one way of quantifying the notion that
,
expressed in (1.1). Consider now a linear relationship that
uses the variable
we constructed in (1.7) as
an explanatory variable in a linear regression relationship
to explain variation in y across the spatial sample of
observations.
One point to note is that traditional explanatory variables
of the type encountered in regression
can be added to the model in (1.8). We can represent
these with the traditional matrix notation:
,
allowing
us to modify (1.8) to take the form shown in
(1.9).
As an illustration, consider the following example which is intended to serve as
a preview of material covered in the next two chapters. We provide a set of
regression estimates based on maximum likelihood procedures for a spatial
data set consisting of 49 neighborhoods in Columbus, Ohio set forth in Anselin
(1988). The data set consists of observations on three variables: neighborhood crime incidents, household income, and house values for all
49 neighborhoods. The model uses the income and house
values to explain variation in neighborhood crime incidents. That is, y=
neighborhood crime, X=(a constant, household income, house
values). The estimates are shown below, printed in the usual regression format
with associated statistics for precision of the estimates, fit of the model and
an estimate of the disturbance variance,
.
Spatial autoregressive Model Estimates Dependent Variable = Crime R-squared = 0.6518 Rbar-squared = 0.6366 sigma^2 = 95.5032 log-likelihood = -165.41269 Nobs, Nvars = 49, 3 *************************************************************** Variable Coefficient t-statistic t-probability constant 45.056251 6.231261 0.000000 income -1.030641 -3.373768 0.001534 house value -0.265970 -3.004945 0.004331 rho 0.431381 3.625340 0.000732
For this example, we can calculate the proportion of total variation explained
by spatial dependence with a comparison of
the fit measured by
from this
model to the fit of a least-squares model that excludes the spatial dependence
variable C y. The least-squares regression for comparison is shown below:
Ordinary Least-squares Estimates Dependent Variable = Crime R-squared = 0.5521 Rbar-squared = 0.5327 sigma^2 = 130.8386 Durbin-Watson = 1.1934 Nobs, Nvars = 49, 3 *************************************************************** Variable Coefficient t-statistic t-probability constant 68.609759 14.484270 0.000000 income -1.596072 -4.776038 0.000019 house value -0.274079 -2.655006 0.010858
We see that around 10 percent of the variation in the crime incidents is
explained by spatial dependence, because the
is roughly 0.63 in the
model that takes spatial dependence into account and 0.53 in the least-squares
model that ignores this aspect of the spatial data sample. Note also that the
t-statistic on the parameter for the spatial dependence variable Cy is 3.62,
indicating that this explanatory variable has a coefficient estimate that is
significantly different from zero. In addition, the coefficient
on income falls in absolute value when we include the spatial
lagged variable Cy in the model.
We will pursue more examples
in Chapters 2 and 3, with this example
provided as a concrete demonstration of some of the ideas
we have discussed.
Associating location in space with observations is essential to modeling relationships that exhibit spatial heterogeneity. Recall this means there is variation in the relationship being modeled over space. We illustrate two approaches to using location that allow locally linear regressions to be fit over sub-regions of space. These form the basis for models we will discuss in Chapter 4.
Casetti (1972, 1992) introduced our first approach that involves a method he
labels ``spatial expansion''. The model is shown in (1.10), where y
denotes an nx1 dependent variable vector associated with spatial observations
and X is an nxnk matrix consisting of terms xi representing kx1
explanatory variable vectors, as shown in (1.11). The locational information
is recorded in the matrix Z which has elements
,
that represent latitude and longitude coordinates of each
observation as shown in (1.11).
The model posits that the parameters vary as a function of the latitude and
longitude coordinates. The only parameters that need be estimated are the
parameters in
that we denote,
.
These
represent a set of 2k parameters. Recall our discussion about spatial
heterogeneity and the need to utilize a parsimonious specification for variation
over space. This represents one approach to this type of specification.
We note that the parameter vector
in (1.10)
represents an nkx1 matrix in this model that contains parameter estimates
for all k explanatory variables at every observation. The parameter
vector
contains the 2k parameters to be estimated.
Where:
This model can be estimated using least-squares to produce estimates of the 2k
parameters
.
Given these estimates, the remaining
estimates for individual points in space can be derived using the second
equation in (1.10). This process is referred to as the ``expansion
process''. To see this, substitute the second equation in (1.10) into the
first, producing:
Here it is clear that X, Z and J represent available information or data
observations and only
represents parameters in the model that need be
estimated.
The model would capture spatial heterogeneity by allowing variation in the underlying relationship such that clusters of nearby or neighboring observations measured by latitude-longitude coordinates take on similar parameter values. As the location varies, the regression relationship changes to accommodate a locally linear fit through clusters of observations in close proximity to one another.
Another approach to modeling variation over space is based on locally weighted
regressions to produce estimates for every point in space by using a sub-sample
of data information from nearby observations. McMillen (1996) and Brundson,
Fotheringham and Charlton (1996) introduce this type of approach. It has been
labeled ``geographically weighted regression'' (GWR) by Brundson, Fotheringham
and Charlton (1996). Let y denote an nx1 vector of dependent variable
observations collected at n points in space, X an nxk matrix of
explanatory variables, and
an nx1 vector of normally
distributed, constant variance disturbances. Letting Wi represent an
nxn diagonal matrix containing distance-based weights for observation i
that reflects the distance between observation i and all other observations,
we can write the GWR model as:
The subscript i on
indicates that this kx1 parameter
vector is associated with observation i. The GWR model produces n
such vectors of parameter estimates, one for each observation. These
estimates are produced using:
One confusing aspect of this notation is that Wi y denotes an n-vector of
distance-weighted observations used to produce estimates for observation i.
The notation is confusing because we usually use subscripts to index scalar
magnitudes representing individual elements of a vector. Note also, that Wi
X represents a distance-weighted data matrix, not a single observation and
represents a n-vector. The precise nature of the
distance weighting is taken up in Chapter 4.
It may have occurred to the reader that a homogeneous model fit to a spatial data sample that exhibits heterogeneity will produce residuals that exhibit spatial dependence. The residuals or errors obtained from the homogeneous model should reflect unexplained variation attributable to heterogeneity in the underlying relationship over space. Spatial clustering of the residuals would occur with positive and negative residuals appearing in distinct regions and patterns on the map. This of course was our motivation and illustration of spatial dependence as illustrated in Figure 1.2. You might infer correctly that spatial heterogeneity and dependence are often related in the context of modeling. An inappropriate model that fails to capture spatial heterogeneity will result in residuals that exhibit spatial dependence. This is another topic we discuss in the following chapters of this text.
A fundamental concept that relates to spatial contiguity is the notion of a spatial lag operator. Spatial lags are analogous to the backshift operator B from time series analysis. This operator shifts observations back in time, where B yt = yt-1, defines a first-order lag and Bp yt = yt-p represents a pth order lag. In contrast to the time domain, spatial lag operators imply a shift over space but are restricted by some complications that arise when one tries to make analogies between the time and space domains.
Cressie (1991) points out that in the restrictive context of regular lattices or grids the spatial lag concept implies observations that are one or more distance units away from a given location, where distance units can be measured in two or four directions. In applied situations where observations are unlikely to represent a regular lattice or grid because they tend to be irregularly shaped map regions, the concept of a spatial lag relates to the set of neighbors associated with a particular location. The spatial lag operator works in this context to produce a weighted average of the neighboring observations.
In Section 1.4.1 we saw that the concept of ``neighbors'' in spatial analysis is not unambiguous, it depends on the definition used. By analogy to time series analysis it seems reasonable to simply raise our first-order binary contiguity matrix W containing 0 and 1 values to a power, say p to create a spatial lag. However, Blommestein (1985) points out that doing this produces circular or redundant routes, where he draws an analogy between binary contiguity and the graph theory notion of an adjacency matrix. If we use spatial lag matrices produced in this way in maximum likelihood estimation methods, spurious results can arise because of the circular or redundant routes created by this simplistic approach. Anselin and Smirnov (1994) provide details on many of the issues involved here.
For our purposes, we simply want to point out that an appropriate approach to creating spatial lags requires that the redundancies be eliminated from spatial weight matrices representing higher-order contiguity relationships. The spatial econometrics library contains a function to properly construct spatial lags of any order and the function deals with eliminating redundancies.
We provide a brief illustration of how spatial lags introduce information regarding ``neighbors to neighbors'' into our analysis. These spatial lags will be used in Chapter 3 when we discuss spatial autoregressive models.
To illustrate these ideas, we use a first-order contiguity matrix for a small data sample containing 49 neighborhoods in Columbus, Ohio taken from Anselin (1988). This contiguity matrix is typical of those encountered in applied practice as it relates irregularly shaped regions representing each neighborhood. Figure 1.6 shows the pattern of 0 and 1 values in a 49 by 49 grid. Recall that a non-zero entry in row i, column j denotes that neighborhoods i and j have borders that touch which we refer to as ``neighbors''. Of the 2401 possible elements in the 49 by 49 matrix, there are only 232 are non-zero elements designated on the axis in the figure by `nz = 232'. These non-zero entries reflect the contiguity relations between the neighborhoods. The first-order contiguity matrix is symmetric which can be seen in the figure. This reflects the fact that if neighborhood i borders j, then j must also border i.
Figure 1.7 shows the original first-order contiguity matrix along with a second-order spatially lagged matrix, whose non-zero elements are represented by a `+' symbol in the figure. This graphical depiction of a spatial lag demonstrates that the spatial lag concept works to produce a contiguity or connectiveness structure that represents ``neighbors of neighbors''.
How might the notion of a spatial lag be useful in spatial econometric modeling? We might encounter a process where spatial diffusion effects are operating through time. Over time the initial impacts on neighbors work to influence more and more regions. The spreading impact might reasonably be considered to flow outward from neighbor to neighbor, and the spatial lag concept would capture this idea.
As an illustration of the redundancies produced by simply raising a first-order contiguity matrix to a higher power, Figure 1.8 shows a second-order spatial lag matrix created by simply powering the first-order matrix. The non-zero elements in this inappropriately generated spatial lag matrix are represented by `+' symbols with the original first-order non-zero elements denoted by `o' symbols. We see that this second order spatial lag matrix contains 689 non-zero elements in contrast to only 410 for the correctly generated second order spatial lag matrix that eliminates the redundancies.
We will have occasion to use spatial lags in our examination of spatial autoregressive models in Chapters 3, 4 and 5. The MATLAB function from the spatial econometrics library as well as other functions for working with spatial contiguity matrices will be presented along with examples of their use in spatial econometric modeling.
As indicated in the preface, all of the spatial econometric methods discussed in this text have been implemented using the MATLAB software from MathWorks Inc. Toolboxes are the name given by the MathWorks to related sets of MATLAB functions aimed at solving a particular class of problems. Toolboxes of functions useful in signal processing, optimization, statistics, finance and a host of other areas are available from the MathWorks as add-ons to the standard MATLAB distribution. We will reserve the term Econometrics Toolbox to refer to my larger collection of econometric functions available in the public domain at www.econ.utoledo.edu. The spatial econometrics library represents a smaller part of this larger collection of software functions for econometric analysis. I have used the term library to denote subsets of functions aimed at various categories of estimation methods. The Econometrics Toolbox contains libraries for econometric regression analysis, time-series and vector autoregressive modeling, optimization functions to solve general maximum likelihood estimation problems, Bayesian Gibbs sampling diagnostics, error correction testing and estimation methods, simultaneous equation models and a collection of utility functions that I designate as the utility function library. Taken together, these constitute the Econometrics Toolbox that is described in a 350 page manual available at the Web site listed above.
The spatial econometrics library functions rely on some of the utility functions and are implemented using a general design that provides a common user-interface for the entire toolbox of econometric estimation functions. In Chapter 2 we will use MATLAB functions to carry out spatial econometric estimation methods. Here, we discuss the general design that is used to implement all of the spatial econometric estimation functions. Having some feel for the way in which these functions work and communicate with other functions in the Econometric Toolbox should allow you to more effectively use these functions to solve spatial econometric estimation problems.
The entire Econometrics Toolbox has been included in the internet-based materials provided here, as well as an online HTML interface to examine the functions available along with their documentation. All functions have accompanying demonstration files that illustrate the typical use of the functions with sample data. These demonstration files can be viewed using the online HTML interface. We have also provided demonstration files for all of the estimation functions in the spatial econometrics library that can be viewed online along with their documentation. Examples are provided in this text and the program files along with the datasets that have been included in the Web-based module.
In designing a spatial econometric library of functions, we need to think about organizing our functions to present a consistent user-interface that packages all of our MATLAB functions in a unified way. The advent of `structures' in MATLAB version 5 allows us to create a host of alternative spatial econometric functions that all return `results structures'.
A structure in MATLAB allows the programmer to create a variable containing what MATLAB calls `fields' that can be accessed by referencing the structure name plus a period and the field name. For example, suppose we have a MATLAB function to perform ordinary least-squares estimation named ols that returns a structure. The user can call the function with input arguments (a dependent variable vector y and explanatory variables matrix x) and provide a variable name for the structure that the ols function will return using:
result = ols(y,x);
The structure variable `result' returned by our ols function might have
fields named `rsqr', `tstat', `beta', etc. These fields might contain the
R-squared statistic, t-statistics and the
least-squares estimates
.
One virtue of using the structure to
return regression results is that the user can access individual fields of
interest as follows:
bhat = result.beta; disp(`The R-squared is:'); result.rsqr disp(`The 2nd t-statistic is:'); result.tstat(2,1)
There is nothing sacred about the name `result' used for the returned structure in the above example, we could have used:
bill_clinton = ols(y,x); result2 = ols(y,x); restricted = ols(y,x); unrestricted = ols(y,x);
That is, the name of the structure to which the ols function returns its information is assigned by the user when calling the function.
To examine the nature of the structure in the variable `result', we can simply type the structure name without a semi-colon and MATLAB will present information about the structure variable as follows:
result =
meth: 'ols'
y: [100x1 double]
nobs: 100.00
nvar: 3.00
beta: [ 3x1 double]
yhat: [100x1 double]
resid: [100x1 double]
sige: 1.01
tstat: [ 3x1 double]
rsqr: 0.74
rbar: 0.73
dw: 1.89
Each field of the structure is indicated, and for scalar components the value of the field is displayed. In the example above, `nobs', `nvar', `sige', `rsqr', `rbar', and `dw' are scalar fields, so their values are displayed. Matrix or vector fields are not displayed, but the size and type of the matrix or vector field is indicated. Scalar string arguments are displayed as illustrated by the `meth' field which contains the string `ols' indicating the regression method that was used to produce the structure. The contents of vector or matrix strings would not be displayed, just their size and type. Matrix and vector fields of the structure can be displayed or accessed using the MATLAB conventions of typing the matrix or vector name without a semi-colon. For example,
result.resid result.y
would display the residual vector and the dependent variable vector y in the MATLAB command window.
Another virtue of using `structures' to return results from our regression functions is that we can pass these structures to another related function that would print or plot the regression results. These related functions can query the structure they receive and intelligently decipher the `meth' field to determine what type of regression results are being printed or plotted. For example, we could have a function prt that prints regression results and another plt that plots actual versus fitted and/or residuals. Both these functions take a regression structure as input arguments. Example 1.1 provides a concrete illustration of these ideas.
% ----- Example 1.1 Demonstrate regression using the ols() function load y.data; load x.data; result = ols(y,x); prt(result); plt(result);
The example assumes the existence of functions ols, prt, plt and data matrices y,x in files `y.data' and `x.data'. Given these, we carry out a regression, print results and plot the actual versus predicted as well as residuals with the MATLAB code shown in example 1.1. We will discuss the prt and plt functions in Section 1.5.2.
Now to put these ideas into practice, consider implementing an ols function. The function code would be stored in a file `ols.m' whose first line is:
function results=ols(y,x)
The keyword `function' instructs MATLAB that the code in the file `ols.m' represents a callable MATLAB function.
The help portion of the MATLAB `ols' function is presented below and follows immediately after the first line as shown. All lines containing the MATLAB comment symbol `%' will be displayed in the MATLAB command window when the user types `help ols'.
function results=ols(y,x) % PURPOSE: least-squares regression %--------------------------------------------------- % USAGE: results = ols(y,x) % where: y = dependent variable vector (nobs x 1) % x = independent variables matrix (nobs x nvar) %--------------------------------------------------- % RETURNS: a structure % results.meth = 'ols' % results.beta = bhat % results.tstat = t-stats % results.yhat = yhat % results.resid = residuals % results.sige = e'*e/(n-k) % results.rsqr = rsquared % results.rbar = rbar-squared % results.dw = Durbin-Watson Statistic % results.nobs = nobs % results.nvar = nvars % results.y = y data vector % -------------------------------------------------- % SEE ALSO: prt(results), plt(results) %---------------------------------------------------
All functions in the spatial econometrics library present a unified documentation format for the MATLAB `help' command by adhering to the convention of sections entitled, `PURPOSE', `USAGE', `RETURNS', `SEE ALSO', and perhaps a `REFERENCES' section, delineated by dashed lines.
The `USAGE' section describes how the function is used, with each input argument enumerated along with any default values. A `RETURNS' section portrays the structure that is returned by the function and each of its fields. To keep the help information uncluttered, we assume some knowledge on the part of the user. For example, we assume the user realizes that the `.residuals' field would be an (nobs x 1) vector and the `.beta' field would consist of an (nvar x 1) vector.
The `SEE ALSO' section points the user to related routines that may be useful. In the case of our ols function, the user might what to rely on the printing or plotting routines prt and plt, so these are indicated. The `REFERENCES' section would be used to provide a literature reference (for the case of our more exotic spatial estimation procedures) where the user could read about the details of the estimation methodology.
As an illustration of the consistency in documentation, consider the function sar that provides estimates for the spatial autoregressive model that we presented in Section 1.4.1. The documentation for this function is shown below:
PURPOSE: computes spatial autoregressive model estimates
y = p*W*y + X*b + e, using sparse matrix algorithms
---------------------------------------------------
USAGE: results = sar(y,x,W,rmin,rmax,convg,maxit)
where: y = dependent variable vector
x = explanatory variables matrix
W = standardized contiguity matrix
rmin = (optional) minimum value of rho to use in search
rmax = (optional) maximum value of rho to use in search
convg = (optional) convergence criterion (default = 1e-8)
maxit = (optional) maximum # of iterations (default = 500)
---------------------------------------------------
RETURNS: a structure
results.meth = 'sar'
results.beta = bhat
results.rho = rho
results.tstat = asymp t-stat (last entry is rho)
results.yhat = yhat
results.resid = residuals
results.sige = sige = (y-p*W*y-x*b)'*(y-p*W*y-x*b)/n
results.rsqr = rsquared
results.rbar = rbar-squared
results.lik = -log likelihood
results.nobs = # of observations
results.nvar = # of explanatory variables in x
results.y = y data vector
results.iter = # of iterations taken
results.romax = 1/max eigenvalue of W (or rmax if input)
results.romin = 1/min eigenvalue of W (or rmin if input)
--------------------------------------------------
SEE ALSO: prt(results), sac, sem, far
---------------------------------------------------
REFERENCES: Anselin (1988), pages 180-182.
---------------------------------------------------
The actual execution code to produce least-squares or spatial autoregressive parameter estimates would follow the documentation in the file discussed above. We do not discuss programming of the spatial econometric functions in the text, but you can of course examine all of the functions to see how they work. The manual for the Econometrics Toolbox provides a great deal of discussion of programming in MATLAB and examples of how to add new functions to the toolbox or change existing functions in the toolbox.
To illustrate the use of the `results' structure returned by our ols function, consider the associated function plt_reg which plots actual versus predicted values along with the residuals. The results structure contains everything needed by the plt_reg function to carry out its task. Earlier, we referred to functions plt and prt rather than plt_reg, but prt and plt are ``wrapper'' functions that call the functions prt_reg and plt_reg where the real work of printing and plotting regression results is carried out. The motivation for taking this approach is that separate smaller functions can be devised to print and plot results from all of the spatial econometric procedures, facilitating development. The wrapper functions eliminate the need for the user to learn the names of different printing and plotting functions associated with each group of spatial econometric procedures -- all results structures can be printed and plotted by simply invoking the prt and plt functions.
Documentation for the plt function which plots results from all spatial econometrics functions as well as the Econometrics Toolbox is shown below. This function is a wrapper function that calls an appropriate plotting function, plt_spat based on the econometric method identified in the results structure `meth' field argument.
PURPOSE: Plots results structures returned by most functions
by calling the appropriate plotting function
---------------------------------------------------
USAGE: plt(results,vnames)
Where: results = a structure returned by an econometric function
vnames = an optional vector of variable names
e.g. vnames = vnames = strvcat('y','const','x1','x2');
--------------------------------------------------
NOTES: this is simply a wrapper function that calls another function
--------------------------------------------------
RETURNS: nothing, just plots the results
--------------------------------------------------
SEE ALSO: prt()
---------------------------------------------------
A decision was made not to place the `pause' command in the plt function, but rather let the user place this statement in the calling program or function. An implication of this is that the user controls viewing regression plots in `for loops' or in the case of multiple invocations of the plt function. For example, only the second `plot' will be shown in the following code.
result1 = sar(y,x1,W); plt(result1); result2 = sar(y,x2,W); plt(result2);
If the user wishes to see the plots associated with the first spatial autoregression, the code would need to be modified as follows:
result1 = sar(y,x1,W); plt(result1); pause; result2 = sar(y,x2,W); plt(result2);
The `pause' statement would force a plot of the results from the first spatial autoregression and wait for the user to strike any key before proceeding with the second regression and accompanying plot of these results.
A more detailed example of using the results structure is the prt function which produces printed output from all of the functions in the spatial econometrics library. The printout of estimation results is similar to that provided by most statistical packages.
The prt function allows the user an option of providing a vector of fixed width variable name strings that will be used when printing the regression coefficients. These can be created using the MATLAB strvcat function that produces a vertical concatenated list of strings with fixed width equal to the longest string in the list. We can also print results to an indicated file rather than the MATLAB command window. Three alternative invocations of the prt function illustrating these options for usage are shown below:
vnames = strvcat('crime','const','income','house value');
res = sar(y,x,W);
prt(res); % print with generic variable names
prt(res,vnames); % print with user-supplied variable names
fid = fopen('sar.out','wr'); % open a file for printing
prt(res,vnames,fid); % print results to file `sar.out'
The first use of prt produces a printout of results to the MATLAB command window that uses `generic' variable names:
Spatial autoregressive Model Estimates R-squared = 0.6518 Rbar-squared = 0.6366 sigma^2 = 95.5033 Nobs, Nvars = 49, 3 log-likelihood = -165.41269 # of iterations = 17 min and max rho = -1.5362, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability variable 1 45.056480 6.231281 0.000000 variable 2 -1.030647 -3.373784 0.001513 variable 3 -0.265970 -3.004944 0.004290 rho 0.431377 3.625292 0.000720
The second use of prt uses the user-supplied variable names. The MATLAB function strvcat carries out a vertical concatenation of strings and pads the shorter strings in the `vnames' vector to have a fixed width based on the longer strings. A fixed width string containing the variable names is required by the prt function. Note that we could have used:
vnames = ['crime ',
'const ',
'income ',
'house value'];
but, this takes up more space and is slightly less convenient as we have to provide the padding of strings ourselves. Using the `vnames' input in the prt function would result in the following printed to the MATLAB command window.
Spatial autoregressive Model Estimates Dependent Variable = crime R-squared = 0.6518 Rbar-squared = 0.6366 sigma^2 = 95.5033 Nobs, Nvars = 49, 3 log-likelihood = -165.41269 # of iterations = 12 min and max rho = -1.5362, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability const 45.056481 6.231281 0.000000 income -1.030647 -3.373784 0.001513 house value -0.265970 -3.004944 0.004290 rho 0.431377 3.625292 0.000720
The third case specifies an output file opened with the command:
fid = fopen('sar.out','wr');
The file `sar.out' would contain output identical to that from the second use of prt. It is the user's responsibility to close the file that was opened using the MATLAB command:
fclose(fid);
In the following chapters that present various spatial estimation methods we will provide the documentation but not the details concerning implementation of the estimation procedures in MATLAB. A function has been devised and incorporated in the spatial econometrics library for each of the estimation procedures that we discuss and illustrate. These functions carry out estimation and provide printed as well as graphical presentation of the results using the design framework that was set forth in this section.
This chapter introduced two main features of spatial data sets, spatial dependence and spatial heterogeneity. Spatial dependence refers to the fact that sample data observations exhibit correlation with reference to points or location in space. We often observe spatial clustering of sample data observations with respect to map regions. An intuitive motivation for this type of result is the existence of spatial hierarchical relationships, spatial spillovers and other types of spatial interactivity studied in regional science.
Spatial heterogeneity refers to the fact that underlying relationships we wish to study may vary systematically over space. This creates problems for regression and other econometric methods that do not accommodate spatial variation in the relationships being modeled. A host of methods have been developed in spatial econometrics that allow the estimated relationship to vary systematically over space.
A large part of the chapter was devoted to introducing how locational information regarding sample data observations is formally incorporated in spatial econometric models. After introducing the concept of a spatial contiguity matrix, we provided a preview of the spatial autoregressive model that relies on the contiguity concept. Chapters 2 and 3 cover this spatial econometric method in detail.
In addition to spatial contiguity, other spatial econometric methods rely on the latitude-longitude information available for spatial data samples to allow variation in the relationship being studied over space. Two approaches to this were introduced, the spatial expansion model and geographically weighted regression, which are the subject of Chapter 4.
Finally, a software design for implementing the spatial econometric estimation methods discussed in this text was set forth. Our estimation methods will be implemented using MATLAB software from the MathWorks Inc. A design based on MATLAB structure variables was set forth. This approach to developing a set of spatial econometric estimation functions can provide a consistent user-interface for the function documentation and help information as well as encapsulation of the estimation results in a MATLAB structure variable. This construct can be accessed by related functions to provide printed and graphical presentation of the estimation results.
This chapter discusses in detail the spatial autoregressive models introduced in Chapter 1. A class of spatial autoregressive models has been introduced to model cross-sectional spatial data samples taking the form shown in (2.1) (Anselin, 1988).
Where y contains an nx1 vector of cross-sectional dependent variables and X represents an nxk matrix of explanatory variables. W1 and W2 are known nxn spatial weight matrices, usually containing first-order contiguity relations or functions of distance. As explained in Section 1.4.1, a first-order contiguity matrix has zeros on the main diagonal, rows that contain zeros in positions associated with non-contiguous observational units and ones in positions reflecting neighboring units that are (first-order) contiguous based on one of the contiguity definitions.
From the general model in (2.1) we can derive special models by imposing restrictions. For example, setting X=0 and W2=0 produces a first-order spatial autoregressive model shown in (2.2).
This model attempts to explain variation in y as a linear combination of
contiguous or neighboring units with no other explanatory variables. The model
is termed a first order spatial autoregression because its represents a spatial
analogy to the first order autoregressive model from time series analysis,
,
where total reliance is placed on the past period
observations to explain variation in yt.
Setting W2 = 0 produces a mixed regressive-spatial autoregressive model shown in (2.3). This model is analogous to the lagged dependent variable model in time series. Here we have additional explanatory variables in the matrix X that serve to explain variation in y over the spatial sample of observations.
Letting W1 = 0 results in a regression model with spatial autocorrelation in the disturbances as shown in (2.4).
This chapter is organized into sections that discuss and illustrate each of these special cases of the spatial autoregressive model as well as the most general model form in (2.1). Section 2.1 deals with the first-order spatial autoregressive model presented in (2.2). The mixed regressive-spatial autoregressive model is taken up in Section 2.2. Section 2.3 takes up the regression model containing spatial autocorrelation in the disturbances and illustrates various tests for spatial dependence using regression residuals. The most general model is the focus of Section 2.4. Applied illustrations of all the models are provided using a variety of spatial data sets. Spatial econometrics library functions that utilize MATLAB sparse matrix algorithms allow us to estimate models with over 3,000 observations in around 100 seconds on an inexpensive desktop computer.
This model is seldom used in applied work, but it serves to motivate some of the ideas that we draw on in later sections of the chapter. The model which we label FAR, takes the form:
where the spatial contiguity matrix W has been standardized to have row sums of unity and the variable vector y is expressed in deviations from the means form to eliminate the constant term in the model.
To illustrate the problem with least-squares estimation of
spatial autoregressive models, consider applying least-squares
to the model in (2.5) which would produce an
estimate for the single parameter
in the model:
Note that the least-squares estimate is biased, since we cannot show that
.
The usual argument that the explanatory variables matrix
X in least-squares is fixed in repeated sampling allows one to pass the
expectation operator over terms like
and argue that
,
eliminating the bias
term. Here however,
because of spatial dependence, we cannot make the case that Wy is fixed in
repeated sampling. This also rules out making a case for consistency of the
least-squares estimate of
,
because the probability limit (plim) of the
term
is not
zero. In fact, Anselin (1988) establishes that:
| (2.8) |
This is equal to zero only in the trivial case where
equals
zero and we have no spatial dependence in the data sample.
Given that least-squares will produce biased and inconsistent estimates of the
spatial autoregressive parameter
in this model, how do we proceed to
estimate
? The maximum likelihood estimator for
requires
that we find a value of
that maximizes the likelihood
function shown in (2.9).
In order to simplify the maximization problem, we obtain a concentrated log
likelihood function based on eliminating the parameter
for the
variance of the disturbances. This is accomplished by substituting
in the likelihood
(2.9) and taking logs which yields:
This expression can be maximized with respect to
using a simplex
univariate optimization routine. The estimate for the parameter
can be obtained using the value of
that maximizes the log-likelihood
function (say,
)
in:
.
In the next section, we discuss
a sparse matrix algorithm approach to evaluating this likelihood function
that allows us to solve problems involving thousands of observations
quickly with small amounts of computer memory.
Two implementation details arise with this approach to solving
for maximum likelihood estimates. First, there is a constraint
that we need to impose on the parameter
.
This
parameter can take on feasible values in the range (Anselin and Florax, 1994):
where
represents the minimum eigenvalue of the standardized
spatial contiguity matrix W and
denotes the largest eigenvalue
of this matrix. This suggests that we need to constrain our optimization
procedure search over values of
within this range.
The second implementation issue is that the numerical Hessian matrix that would
result from a gradient-based optimization procedure and provide estimates of
dispersion for the parameters is not available with simplex optimization. We
can overcome this problem in two ways. For problems involving a small number of
observations, we can use our knowledge of the (theoretical) information matrix to
produce estimates of dispersion. An asymptotic variance matrix based on the
Fisher information matrix shown below for the parameters
can be used to provide measures of dispersion for the estimates of
and
.
(see Anselin, 1980 page 50):
This approach is computationally impossible when dealing with large scale
problems involving thousands of observations. In these cases we can evaluate
the numerical hessian matrix using the maximum likelihood estimates of
and
as well as our sparse matrix function to compute the likelihood.
We will demonstrate results from using both of these approaches in the next section.
Building on the software design set forth in section 1.5 for our spatial econometrics function library, we have implemented a function far to produce maximum likelihood estimates for the first-order spatial autoregressive model. We rely on on the sparse matrix functionality of MATLAB so large-scale problems can be solved using a minimum of time and computer memory. We demonstrate this function using a data set involving 3,107 U.S. contiguous counties.
Estimating the FAR model requires that we find
eigenvalues for the large n by n matrix W, as well
as the determinant
of the related n by n matrix
.
In addition, matrix multiplications
involving W and
are required to compute the information
matrix used to produce estimates of dispersion.
We constructed a function far that can produce estimates for the first-order spatial autoregressive model in a case involving 3,107 observations in 95 seconds on a moderately fast inexpensive desktop computer. The MATLAB algorithms for dealing with sparse matrices make it ideally suited for spatial modeling because spatial weight matrices are almost always sparse.
Another issue we need to address is computing measures of dispersion for the
estimates
and
in large estimation problems. As already
noted, we cannot rely on the information matrix approach because this involves
matrix operations on very large matrices. An approach that we take to produce
measures of dispersion is to numerically evaluate the hessian matrix using the
maximum likelihood estimates of
and
.
The approach basically
produces a numerical approximation to the expression in (2.11). A key to
using this approach is the ability to evaluate the log likelihood function using
the sparse algorithms to handle large matrices.
It should be noted that Pace and Barry (1997) when confronted with the task of providing measures of dispersion for spatial autoregressive estimates based on sparse algorithms suggest using likelihood ratio tests to determine the significance of the parameters. The approach taken here may suffer from some numerical inaccuracy relative to measures of dispersion based on the theoretical information matrix, but has the advantage that users will be presented with traditional t-statistics on which they can base inferences.
We will have more to say about how our approach to solving large spatial autoregressive estimation problems using sparse matrix algorithms in MATLAB compares to one proposed by Pace and Barry (1997), when we apply the function far to a large data set in the next section.
Documentation for the function far is presented below. This function was written to perform on both large and small problems. If the problem is small (involving less than 500 observations), the function far computes measures of dispersion using the theoretical information matrix. If more observations are involved, the function determines these measures by computing a numerical Hessian matrix. (Users may need to decrease the number of observations to less than 500 if they have computers without a large amount of memory.)
PURPOSE: computes 1st-order spatial autoregressive estimates
y = p*W*y + e, using sparse matrix algorithms
---------------------------------------------------
USAGE: results = far(y,W,rmin,rmax,convg,maxit)
where: y = dependent variable vector
W = standardized contiguity matrix
rmin = (optional) minimum value of rho to use in search
rmax = (optional) maximum value of rho to use in search
convg = (optional) convergence criterion (default = 1e-8)
maxit = (optional) maximum # of iterations (default = 500)
---------------------------------------------------
RETURNS: a structure
results.meth = 'far'
results.rho = rho
results.tstat = asymptotic t-stat
results.yhat = yhat
results.resid = residuals
results.sige = sige = (y-p*W*y)'*(y-p*W*y)/n
results.rsqr = rsquared
results.lik = -log likelihood
results.nobs = nobs
results.nvar = nvar = 1
results.y = y data vector
results.iter = # of iterations taken
results.romax = 1/max eigenvalue of W (or rmax if input)
results.romin = 1/min eigenvalue of W (or rmin if input)
--------------------------------------------------
One option we provide allows the user to supply minimum and maximum values of
rather than rely on the eigenvalues of W. This might be used if we
wished to constrain the estimation results to a range of say
.
Note also that this would reduce the time needed to compute the maximum and
minimum eigenvalues of the large W matrix.
Given our function far that implements maximum likelihood estimation of small and large first-order spatial autoregressive models, we turn attention to illustrating the use of the function with some spatial data sets. In addition to the estimation functions, we have functions prt and plt that provide printed and graphical presentation of the estimation results.
Example 2.1 provides an illustration of using these functions to estimate a first-order spatial autoregressive model for neighborhood crime from the Anselin (1988) spatial data sample. Note that we convert the variable vector containing crime incidents to deviations from the means form.
% ----- Example 2.1 Using the far() function
load wmat.dat; % standardized 1st-order contiguity matrix
load anselin.dat; % load Anselin (1988) Columbus neighborhood crime data
y = anselin(:,1);
ydev = y - mean(y);
W = wmat;
vnames = strvcat('crime','rho');
res = far(ydev,W); % do 1st-order spatial autoregression
prt(res,vnames); % print the output
plt(res,vnames); % plot actual vs predicted and residuals
This example produced the following printed output with the graphical output
presented in Figure 2.1. From the output we would infer that a
distinct spatial dependence among the crime incidents for the sample of 49
neighborhoods exists since the parameter estimate for
has a t-statistic
of 4.259. We would interpret this statistic in the typical regression fashion to
indicate that the estimated
lies 4.2 standard deviations away from zero.
We also see that this model explains nearly 44% of the variation
in crime incidents in deviations from the means form.
First-order spatial autoregressive model Estimates Dependent Variable = crime R-squared = 0.4390 sigma^2 = 153.8452 Nobs, Nvars = 49, 1 log-likelihood = -373.44669 # of iterations = 17 min and max rho = -1.5362, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability rho 0.669775 4.259172 0.000095
Another more challenging example involves a large sample of 3,107 observations representing counties in the continental U.S. from Pace and Barry (1997). They examine presidential election results for this large sample of observations covering the U.S. presidential election of 1980 between Carter and Reagan. The variable we wish to explain using the first-order spatial autoregressive model is the proportion of total possible votes cast for both candidates. Only persons 18 years and older are eligible to vote, so the proportion is based on those voting for both candidates divided by the population over 18 years of age.
Pace and Barry (1997) suggest an alternative approach to that implemented here
in the function far. They propose overcoming the difficulty we face in
evaluating the determinant
by computing this determinant once over
a grid of values for the parameter
ranging from
to
prior to estimation. They suggest a grid based on 0.01
increments for
over the feasible range. Given these pre-determined
values for the determinant
,
they point out that one can quickly
evaluate the log-likelihood function for all values of
in the grid and
determine the optimal value of
as that which maximizes the likelihood
function value over this grid. Note that their proposed approach would involve
evaluating the determinant around 200 times if the feasible range of
was
-1 to 1. In many cases the range is even greater than this and would require
even more evaluations of the determinant. In contrast, our function far
reports that only 17 iterations requiring log likelihood function evaluations
involving the determinant were needed to solve for the estimates in the case of
the Columbus neighborhood crime data set.
In addition, consider that one might need to construct a finer grid around the
approximate maximum likelihood value of
determined from the initial grid
search, whereas our use of the MATLAB simplex algorithm produces an estimate
that is accurate to a number of decimal digits.
After some discussion of the computational savings associated with the use of sparse matrices, we illustrate the use of our function far and compare it to the approach suggested by Pace and Barry. A first point to note regarding sparsity is that large problems such as this will inevitably involve a sparse spatial contiguity weighting matrix. This becomes obvious when you consider the contiguity structure of our sample of 3,107 U.S. counties. At most, individual counties exhibited only 8 first-order (rook definition) contiguity relations, so the remaining 2,999 entries in this row are zero. The average number of contiguity relationships is 4, so a great many of the elements in the matrix W are zero, which is the definition of a sparse matrix.
To understand how sparse matrix algorithms conserve on storage space and computer memory, consider that we need only record the non-zero elements of a sparse matrix for storage. Since these represent a small fraction of the total 3107x3107 = 9,653,449 elements in the weight matrix, we save a tremendous amount of computer memory. In fact, for our example of the 3,107 U.S. counties, only 12,429 non-zero elements were found in the first-order spatial contiguity matrix, representing a very small fraction (far less than 1 percent) of the total elements.
MATLAB provides a function sparse that can be used to construct a large sparse matrix by simply indicating the row and column positions of non-zero elements and the value of the matrix element for these non-zero row and column elements. Continuing with our example, we can store the first-order contiguity matrix in a single data file containing 12,429 rows with 3 columns that take the form:
row column value
This represents a considerable savings in computational space when compared to storing a matrix containing 9,653,449 elements. A handy utility function in MATLAB is spy which allows one to produce a specially formatted graph showing the sparsity structure associated with sparse matrices. We demonstrate by executing spy(W) on our weight matrix W from the Pace and Barry data set, which produced the graph shown in Figure 2.2. As we can see from the figure, most of the non-zero elements reside near the diagonal.
As an example of storing a sparse first-order contiguity matrix, consider example 2.2 below that reads data from the file `ford.dat' in sparse format and uses the function sparse to construct a working spatial contiguity matrix W. The example also produces a graphical display of the sparsity structure using the MATLAB function spy.
% ----- Example 2.2 Using sparse matrix functions
load ford.dat; % 1st order contiguity matrix
% stored in sparse matrix form
ii = ford(:,1);
jj = ford(:,2);
ss = ford(:,3);
clear ford; % clear out the matrix to save RAM memory
W = sparse(ii,jj,ss,3107,3107);
clear ii; clear jj; clear ss; % clear out these vectors to save memory
spy(W);
To compare our function far with the approach proposed by Pace and Barry,
we implemented their approach and provide timing results.
We take a more efficient approach
to the grid search over values of the parameter
than suggested by Pace and
Barry. Rather than search over a large number of values for
,
we based
our search on a large increment of 0.1 for an initial grid of values covering
from
to
.
Given the determinant of
calculated using sparse matrix algorithms in MATLAB, we evaluated
the negative log likelihood function values for this grid of
values to
find the value that minimizes the likelihood function. We then make a second
pass based on a tighter grid with increments of 0.01 around the optimal
value found in the first pass. A third and final pass is based on an even finer
grid with increments of 0.001 around the optimal estimate from the second pass.
Note that we used a MATLAB sparse function eigs to solve for
the eigenvalues of the contiguity matrix W,
which requires 60 seconds to solve this part of
the problem as shown in the output below. The time necessary to perform each
pass over the grid of 21 values for
was around 10 seconds. With a total
of 3 passes to produce an estimate of
accurate to 3 decimal digits, we
have a total elapsed time of 1 minute and 30 seconds to solve for the maximum
likelihood estimate of
.
This is certainly a reasonable amount of
computational time for such a large problem on a reasonably inexpensive desktop
computing platform. Of course, there is still the problem of producing measures
of dispersion for the estimates that Pace and Barry address by suggesting the
use of likelihood ratio statistics.
elapsed_time = 59.8226 % computing min,max eigenvalues elapsed_time = 10.5280 % carrying out 1st 21-point grid over rho elapsed_time = 10.3791 % carrying out 2nd 21-point grid over rho elapsed_time = 10.3747 % carrying out 3rd 21-point grid over rho estimate of rho = 0.7220 estimate of sigma = 0.0054
How does our approach compare to that of Pace and Barry? Example 2.3 shows a program to estimate the same FAR model using our far function.
% ----- Example 2.3 Using the far() function % with very large data set from Pace and Barry load elect.dat; % load data on votes y = elect(:,7)./elect(:,8); % proportion of voters casting votes ydev = y - mean(y); % deviations from the means form clear y; % conserve on RAM memory clear elect; % conserve on RAM memory load ford.dat; % 1st order contiguity matrix stored in sparse matrix form ii = ford(:,1); jj = ford(:,2); ss = ford(:,3); n = 3107; clear ford; % clear ford matrix to save RAM memory W = sparse(ii,jj,ss,n,n); clear ii; clear jj; clear ss; % conserve on RAM memory tic; res = far(ydev,W); toc; prt(res);
In terms of time needed to solve the problem, our use of the simplex
optimization algorithm takes only 10.6 seconds to produce a more accurate
estimate than that based on the grid approach of Pace and Barry.
Their approach which we modified took 30
seconds to solve for a
value accurate to 3 decimal digits. Note also in
contrast to Pace and Barry, we compute a conventional measure of dispersion
using the numerical hessian estimates which takes only 1.76 seconds. The total
time required to compute not only the estimates and measures of dispersion for
and
,
but the R-squared statistics and log likelihood function
was around 100 seconds.
elapsed_time = 59.8226 % computing min,max eigenvalues elapsed_time = 10.6622 % time required for simplex solution of rho elapsed_time = 1.7681 % time required for hessian evaluation elapsed_time = 1.7743 % time required for likelihood evaluation total time = 74.01 % comparable time to Pace and Barry First-order spatial autoregressive model Estimates R-squared = 0.5375 sigma^2 = 0.0054 Nobs, Nvars = 3107, 1 log-likelihood = 1727.9824 # of iterations = 13 min and max rho = -1.0710, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability rho 0.721474 59.495159 0.000000
Many of the ideas developed in this section regarding the use of MATLAB sparse matrix algorithms will apply equally to the estimation procedures we develop in the next three sections for the other members of the spatial autoregressive model family.
This model extends the first-order spatial autoregressive model to include a matrix X of explanatory variables such as those used in traditional regression models. Anselin (1988) provides a maximum likelihood method for estimating the parameters of this model that he labels a `mixed regressive - spatial autoregressive model'. We will refer to this model as the spatial autoregressive model (SAR). The SAR model takes the form:
Where y contains an nx1 vector of dependent variables, X
represents the usual nxk data matrix containing explanatory variables
and W is a known spatial weight matrix, usually a first-order contiguity
matrix. The parameter
is a coefficient on the spatially lagged
dependent variable, Wy, and the parameters
reflect the influence of
the explanatory variables on variation in the dependent variable y. The
model is termed a mixed regressive - spatial autoregressive model because it
combines the standard regression model with a spatially lagged dependent
variable, reminiscent of the lagged dependent
variable model from time-series analysis.
Maximum likelihood estimation of this model is based on a concentrated
likelihood function as was the case with the FAR model. A few regressions are
carried out along with a univariate parameter optimization of the concentrated
likelihood function over values of the autoregressive parameter
.
The
steps are enumerated in Anselin (1988) as:
Again we face the problem that using a univariate simplex optimization
algorithm to find a maximum likelihood estimate of
based
on the concentrated log likelihood function leaves us with no
estimates of the dispersion associated with the parameters.
We can overcome this using the theoretical information
matrix for small problems and the numerical hessian
approach introduced for the FAR model in the case of large
problems.
Since this model is quite similar to the FAR model which we already
presented, we will turn immediately to describing the function.
The function sar is fairly similar to our far function, with the documentation presented below.
PURPOSE: computes spatial autoregressive model estimates
y = p*W*y + X*b + e, using sparse matrix algorithms
---------------------------------------------------
USAGE: results = sar(y,x,W,rmin,rmax,convg,maxit)
where: y = dependent variable vector
x = explanatory variables matrix
W = standardized contiguity matrix
rmin = (optional) minimum value of rho to use in search
rmax = (optional) maximum value of rho to use in search
convg = (optional) convergence criterion (default = 1e-8)
maxit = (optional) maximum # of iterations (default = 500)
---------------------------------------------------
RETURNS: a structure
results.meth = 'sar'
results.beta = bhat
results.rho = rho
results.tstat = asymp t-stat (last entry is rho)
results.yhat = yhat
results.resid = residuals
results.sige = sige = (y-p*W*y-x*b)'*(y-p*W*y-x*b)/n
results.rsqr = rsquared
results.rbar = rbar-squared
results.lik = -log likelihood
results.nobs = # of observations
results.nvar = # of explanatory variables in x
results.y = y data vector
results.iter = # of iterations taken
results.romax = 1/max eigenvalue of W (or rmax if input)
results.romin = 1/min eigenvalue of W (or rmin if input)
--------------------------------------------------
As in the case of far, we allow the user to provide
minimum and maximum values of
to use in the
search. This may save time in cases where we wish
to restrict our estimate of
to the positive range.
The other point to note is that this function also
uses the numerical hessian approach to compute measures
of dispersion for large problems involving more than
500 observations.
As an illustration of using the sar function, consider the program in example 2.4, where we estimate a model to explain variation in votes casts on a per capita basis in the 3,107 counties. The explanatory variables in the model were: the proportion of population with high school level education or higher, the proportion of the population that are homeowners and the income per capita. Note that the population deflater used to convert the variables to per capita terms was the population 18 years or older in the county.
% ----- Example 2.4 Using the sar() function with a very large data set
load elect.dat; % load data on votes in 3,107 counties
y = (elect(:,7)./elect(:,8)); % convert to per capita variables
x1 = log(elect(:,9)./elect(:,8)); % education
x2 = log(elect(:,10)./elect(:,8));% homeownership
x3 = log(elect(:,11)./elect(:,8));% income
n = length(y); x = [ones(n,1) x1 x2 x3];
clear x1; clear x2; clear x3;
clear elect; % conserve on RAM memory
load ford.dat; % 1st order contiguity matrix stored in sparse matrix form
ii = ford(:,1); jj = ford(:,2); ss = ford(:,3);
n = 3107;
clear ford; % clear ford matrix to save RAM memory
W = sparse(ii,jj,ss,n,n);
clear ii; clear jj; clear ss; % conserve on RAM memory
vnames = strvcat('voters','const','educ','homeowners','income');
to = clock;
res = sar(y,x,W);
etime(clock,to)
prt(res,vnames);
We use the MATLAB clock function as well as etime to determine the overall execution time needed to solve this problem, which was 130 seconds. The estimation results are presented below:
Spatial autoregressive Model Estimates Dependent Variable = voters R-squared = 0.6356 Rbar-squared = 0.6352 sigma^2 = 0.0143 Nobs, Nvars = 3107, 4 log-likelihood = 3159.4467 # of iterations = 11 min and max rho = -1.0710, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability const 0.649079 15.363781 0.000000 educ 0.254021 16.117196 0.000000 homeowners 0.476135 32.152225 0.000000 income -0.117354 -7.036558 0.000000 rho 0.528857 36.204637 0.000000
We see from the results that all of the explanatory variables exhibit a
significant effect on the variable we wished to explain. The results also
indicate that the dependent variable y exhibits strong spatial dependence even
after taking the effect of these variables into account as the
estimate of
on the spatial lagged variable is large
and significant.
As an illustration of the bias associated with least-squares estimation of spatial autoregressive models, we present an example based on a spatial sample of 88 observations for counties in the state of Ohio. A sample of average housing values for each of 88 counties in Ohio will be related to population per square mile, the housing density and unemployment rates in each county. This regression relationship can be written as:
The motivation for the regression relationship is that population and household density as well as unemployment rates work to determine the house values in each county. Consider that the advent of suburban sprawl and the notion of urban rent gradients suggests that housing values in contiguous counties should be related. The least-squares relationship in (2.13) ignores the spatial contiguity information whereas the SAR model would allow for this type of variation in the model.
The first task is to construct a spatial contiguity matrix for use with our spatial autoregressive model. This could be accomplished by examining a map of the 88 counties and recording neighboring tracts for every observation, a very tedious task. An alternative is to use the latitude and longitude coordinates to construct a contiguity matrix. We rely on a function xy2cont that carries out this task. This function is part of Pace and Barry's Spatial Statistics Toolbox for MATLAB, but has been modified to fit the documentation conventions of the spatial econometrics library. The function documentation is shown below:
PURPOSE: uses x,y coord to produce spatial contiguity weight matrices
with delaunay routine from MATLAB version 5.2
------------------------------------------------------
USAGE: [w1 w2 w3] = xy2cont(xcoord,ycoord)
where: xcoord = x-direction coordinate vector (nobs x 1)
ycoord = y-direction coordinate vector (nobs x 1)
------------------------------------------------------
RETURNS: w1 = W*W*S, a row-stochastic spatial weight matrix
w2 = W*S*W, a symmetric spatial weight matrix (max(eig)=1)
w3 = diagonal matrix with i,i equal to 1/sqrt(sum of ith row)
------------------------------------------------------
References: Kelley Pace, Spatial Statistics Toolbox 1.0
------------------------------------------------------
This function essentially uses triangles connecting the x-y coordinates in space to deduce contiguous entities. As an example of using the function, consider constructing a spatial contiguity matrix for the Columbus neighborhood crime data set where we know both the first-order contiguity structure taken from a map of the neighborhoods as well as the x-y coordinates. Here is a program to generate the first-order contiguity matrix from the latitude and longitude coordinates and produce a graphical comparison of the two contiguity structures shown in Figure 2.3. Note that the function spy does not place labels on the x and y axes in the graph since the matrix rows and columns are always reflected on these axes.
% ----- Example 2.5 Using the xy2cont() function
load anselin.data; % Columbus neighborhood crime
xc = anselin(:,5); % longitude coordinate
yc = anselin(:,4); % latitude coordinate
load Wmat.data; % load standardized contiguity matrix
% create contiguity matrix from x-y coordinates
[W1 W2 W3] = xy2cont(xc,yc);
% graphically compare the two
spy(W2,'ok'); hold on; spy(Wmat,'+k');
legend('generated','actual');
Example 2.6 reads in the data from two files containing a database for the 88 Ohio counties as well as data vectors containing the latitude and longitude information needed to construct a contiguity matrix. We rely on a log transformation of the dependent variable house values to provide better scaling for the data. Note the use of the MATLAB construct: `ohio2(:,5)./ohio1(:,2)', which divides every element in the column vector `ohio(:,5)' containing total households in each county by every element in the column vector `ohio1(:,2)', which contains the population for every county. This produces the number of households per capita for each county as an explanatory variable measuring household density.
% ----- Example 2.6 Least-squares bias
% demonstrated with Ohio county data base
load ohio1.dat; % 88 counties (observations)
% 10 columns
% col1 area in square miles
% col2 total population
% col3 population per square mile
% col4 black population
% col5 blacks as a percentage of population
% col6 number of hospitals
% col7 total crimes
% col8 crime rate per capita
% col9 population that are high school graduates
% col10 population that are college graduates
load ohio2.dat; % 88 counties
% 10 columns
% col1 income per capita
% col2 average family income
% col3 families in poverty
% col4 percent of families in poverty
% col5 total number of households
% col6 average housing value
% col7 unemployment rate
% col8 total manufacturing employment
% col9 manufacturing employment as a percent of total
% col10 total employment
load ohio.xy; % latitude-longitude coordinates of county centroids
[junk W junk2] = xy2cont(ohio(:,1),ohio(:,2)); % make W-matrix
y = log(ohio2(:,6)); n = length(y);
x = [ ones(n,1) ohio1(:,3) ohio2(:,5)./ohio1(:,2) ohio2(:,7) ];
vnames = strvcat('hvalue','constant','popsqm','housedensity','urate');
res = ols(y,x); prt(res,vnames);
res = sar(y,x,W); prt(res,vnames);
The results from these two regressions are shown below. The first point to note
is that the spatial autocorrelation coefficient estimate for the SAR model is
statistically significant, indicating the presence of spatial autocorrelation in
the regression relationship. Least-squares ignores this type of variation
producing estimates that lead us to conclude that all three explanatory
variables are significant in explaining housing values across the 88 county
sample. In contrast, the SAR model leads us to conclude that the population
density (popsqm) is not statistically
significant at conventional levels. Keep in mind that the OLS estimates are
biased and inconsistent, so the inference of significance from OLS we would draw
is likely to be incorrect.
Ordinary Least-squares Estimates Dependent Variable = hvalue R-squared = 0.6292 Rbar-squared = 0.6160 sigma^2 = 0.0219 Durbin-Watson = 2.0992 Nobs, Nvars = 88, 4 *************************************************************** Variable Coefficient t-statistic t-probability constant 11.996858 71.173358 0.000000 popsqm 0.000110 2.983046 0.003735 housedensity -1.597930 -3.344910 0.001232 urate -0.067693 -7.525022 0.000000 Spatial autoregressive Model Estimates Dependent Variable = hvalue R-squared = 0.7298 Rbar-squared = 0.7201 sigma^2 = 0.0153 Nobs, Nvars = 88, 4 log-likelihood = 87.284225 # of iterations = 13 min and max rho = -2.0158, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability constant 6.300144 35.621170 0.000000 popsqm 0.000037 1.196689 0.234794 housedensity -1.251435 -3.140028 0.002332 urate -0.055474 -7.387845 0.000000 rho 0.504131 53.749348 0.000000
A second point is that taking the spatial variation into account improves the fit of the model, raising the R-squared statistic for the SAR model. Finally, the magnitudes of the OLS parameter estimates indicate that house values are more sensitive to the household density and the unemployment rate variables than the SAR model. For example, the OLS estimates imply that a one percentage point increase in the unemployment rate leads to a decrease of 6.8 percent in house values whereas the SAR model places this at 5.5 percent. Similarly, the OLS estimates for household density is considerably larger in magnitude than that from the SAR model.
The point of this illustration is that ignoring information regarding the spatial configuration of the data observations will produce different inferences that may lead to an inappropriate model specification. Anselin and Griffith (1988) also provide examples and show that traditional specification tests are plagued by the presence of spatial autocorrelation, so that we should not rely on these tests in the presence of significant spatial autocorrelation.
Here we turn attention to the spatial errors model shown in (2.14), where the disturbances exhibit spatial dependence. Anselin (1988) provides a maximum likelihood method for this model which we label SEM here.
y contains an nx1 vector of dependent variables and X represents the
usual nxk data matrix containing explanatory variables. W is a known
spatial weight matrix and the parameter
is a coefficient on the
spatially correlated errors analogous to the serial correlation problem in time
series models. The parameters
reflect the influence of the explanatory
variables on variation in the dependent variable y.
We introduce a number of statistical tests that can be used to detect the presence of spatial autocorrelation in the residuals from a least-squares model. Use of these tests will be illustrated in the next section.
The first test for spatial dependence in the disturbances of a regression model is called Moran's I-statistic. If this test indicates spatial correlation in the least-squares residuals, the SEM model would be an appropriate way to proceed.
Moran's I-statistic takes two forms depending on whether the spatial weight matrix W is standardized or not.
where e represent regression residuals. Cliff and Ord (1972, 1973, 1981) show that the asymptotic distribution of Moran's I based on least-squares residuals corresponds to a standard normal distribution after adjusting the I-statistic by subtracting the mean and dividing by the standard deviation of the statistic. The adjustment takes two forms depending on whether W is standardized or not. (Anselin, 1988, page 102).
We implement this test in the MATLAB function moran, which takes a regression model and spatial weight matrix W as input and returns a structure variable containing the results from a Moran test for spatial correlation in the residuals. The prt function can be used to provide a formatted print out of the test results. The help documentation for the function is shown below.
PURPOSE: computes Moran's I-statistic for spatial correlation
in the residuals of a regression model
---------------------------------------------------
USAGE: result = moran(y,x,W)
where: y = dependent variable vector
x = independent variables matrix
W = contiguity matrix (standardized or unstandardized)
---------------------------------------------------
RETURNS: a structure variable
result.morani = e'*W*e/e'*e (I-statistic)
result.istat = [i - E(i)]/std(i), standardized version
result.imean = E(i), expectation
result.ivar = var(i), variance
result.prob = std normal marginal probability
result.nobs = # of observations
result.nvar = # of variables in x-matrix
---------------------------------------------------
NOTE: istat > 1.96, => small prob,
=> reject HO: of no spatial correlation
---------------------------------------------------
See also: prt(), lmerrors, walds, lratios
---------------------------------------------------
A number of other asymptotically valid approaches exist for testing whether spatial correlation is present in the residuals from a least-squares regression model. Some of these are the likelihood ratio test, the Wald test and a Lagrange multiplier test, all of which are based on maximum likelihood estimation of the SEM model.
The likelihood ratio test is based on the difference between the log likelihood
from the SEM model and the log likelihood from a least-squares regression. This
quantity represents a statistic that is distributed
.
A function lratios carries out this test and returns a results
structure which can be passed to the prt function for
presentation of the results. Documentation for the function is:
PURPOSE: computes likelihood ratio test for spatial
correlation in the errors of a regression model
---------------------------------------------------
USAGE: result = lratios(y,x,W)
or: result = lratios(y,x,W,sem_result);
where: y = dependent variable vector
x = independent variables matrix
W = contiguity matrix (standardized or unstandardized)
sem_result = a results structure from sem()
---------------------------------------------------
RETURNS: a structure variable
result.meth = 'lratios'
result.lratio = likelihood ratio statistic
result.chi1 = 6.635
result.prob = marginal probability
result.nobs = # of observations
result.nvar = # of variables in x-matrix
---------------------------------------------------
NOTES: lratio > 6.635, => small prob,
=> reject HO: of no spatial correlation
calling the function with a results structure from sem()
can save time for large models that have already been estimated
---------------------------------------------------
Note that we allow the user to supply a `results' structure variable from the sem estimation function, which would save the time needed to estimate the SEM model if the model has already been estimated. This could represent a considerable savings for large problems.
Another approach is based on a Wald test for residual spatial
autocorrelation. This test statistic (shown in (2.19)) is distributed
.
(Anselin, 1988, page 104).
Where
,
with the maximum likelihood
estimate of
used, and .* denotes element-by-element
matrix multiplication.
We have implemented a MATLAB function walds, that carries out this test. The function documentation is shown below:
PURPOSE: Wald statistic for spatial autocorrelation in
the residuals of a regression model
---------------------------------------------------
USAGE: result = walds(y,x,W)
where: y = dependent variable vector
x = independent variables matrix
W = contiguity matrix (standardized)
---------------------------------------------------
RETURNS: a structure variable
result.meth = 'walds'
result.wald = Wald statistic
result.prob = marginal probability
result.chi1 = 6.635
result.nobs = # of observations
result.nvar = # of variables
---------------------------------------------------
NOTE: wald > 6.635, => small prob,
=> reject HO: of no spatial correlation
---------------------------------------------------
See also: lmerror, lratios, moran
---------------------------------------------------
A fourth approach is the Lagrange Multiplier (LM) test which is based on the least-squares residuals and calculations involving the spatial weight matrix W. The LM statistic takes the form: (Anselin, 1988, page 104).
Where e denote least-squares residuals and again we use .* to denote element by element matrix multiplication.
This test is implemented in a MATLAB function lmerrors with the documentation for the function shown below.
PURPOSE: LM error statistic for spatial correlation in
the residuals of a regression model
---------------------------------------------------
USAGE: result = lmerror(y,x,W)
where: y = dependent variable vector
x = independent variables matrix
W = contiguity matrix (standardized)
---------------------------------------------------
RETURNS: a structure variable
result.meth = 'lmerror'
result.lm = LM statistic
result.prob = marginal probability
result.chi1 = 6.635
result.nobs = # of observations
result.nvar = # of variables
---------------------------------------------------
NOTE: lm > 6.635, => small prob,
=> reject HO: of no spatial correlation
---------------------------------------------------
See also: walds, lratios, moran
---------------------------------------------------
Finally, a test based on the residuals from the SAR
model can be used to examine whether inclusion of the spatial lag term eliminates spatial
dependence in the residuals of the model. This test differs from the four
tests outlined above in that we allow for the presence of the spatial lagged
variable Cy in the model. The test for spatial dependence is conditional on
having a
parameter not equal to zero in the model, rather than relying on
least-squares residuals as in the case of the other four tests.
One could view this test as based on the following model:
Where the focus of the test is on whether the parameter
.
This
test statistic is also a Lagrange Multiplier statistic based on (Anselin, 1988,
page 106):
where W is the spatial weight matrix shown in (2.21),
and
var(
)
is the maximum likelihood estimate of the variance
of the parameter
in the model.
We have implemented this test in a MATLAB function lmsar with the documentation for the function shown below.
PURPOSE: LM statistic for spatial correlation in the
residuals of a spatial autoregressive model
---------------------------------------------------
USAGE: result = lmsar(y,x,W1,W2)
where: y = dependent variable vector
x = independent variables matrix
W1 = contiguity matrix for rho
W2 = contiguity matrix for lambda
---------------------------------------------------
RETURNS: a structure variable
result.meth = 'lmsar'
result.lm = LM statistic
result.prob = marginal probability
result.chi1 = 6.635
result.nobs = # of observations
result.nvar = # of variables
---------------------------------------------------
NOTE: lm > 6.635, => small prob,
=> reject HO: of no spatial correlation
---------------------------------------------------
See also: walds, lratios, moran, lmerrors
---------------------------------------------------
It should be noted that a host of other methods to test for spatial dependence in various modeling situations have been proposed. In addition, the small sample properties of many alternative tests have been compared in Anselin and Florax (1994) and Anselin and Rey (1991). One point to consider is that many of the matrix computations required for these tests cannot be carried out with very large data samples. We discuss this issue and suggest alternative approaches in the examples of Section 2.3.2.
To estimate the spatial error model (SEM) we can draw on the
sparse matrix approach used for the FAR and SAR models. One approach to
estimating this model is based on an iterative approach that: 1) constructs
least-squares estimates and associated residuals, 2) finds a value of
that maximizes the log likelihood conditional on the least-squares
values and 3) updates the least-squares values of
using the value of
determined in step 2). The updated estimates of
can be
used to compute a new set of residuals allowing this process to
be iterated until convergence.
That is, until values for both the residuals and
fail to change from one iteration to the next.
Next, we present documentation for the function sem that carries out the iterative estimation process. This is quite similar in approach to the functions far and sar already described.
PURPOSE: computes spatial error model estimates
y = XB + u, u = L*W*u + e, using sparse algorithms
---------------------------------------------------
USAGE: results = sem(y,x,W,lmin,lmax,convg,maxit)
where: y = dependent variable vector
x = independent variables matrix
W = contiguity matrix (standardized)
lmin = (optional) minimum value of lambda to use in search
lmax = (optional) maximum value of lambda to use in search
convg = (optional) convergence criterion (default = 1e-8)
maxit = (optional) maximum # of iterations (default = 500)
---------------------------------------------------
RETURNS: a structure
results.meth = 'sem'
results.beta = bhat
results.lam = L (lambda)
results.tstat = asymp t-stats (last entry is lam)
results.yhat = yhat
results.resid = residuals
results.sige = sige = e'(I-L*W)'*(I-L*W)*e/n
results.rsqr = rsquared
results.rbar = rbar-squared
results.lik = log likelihood
results.nobs = nobs
results.nvar = nvars (includes lam)
results.y = y data vector
results.iter = # of iterations taken
results.lmax = 1/max eigenvalue of W (or lmax if input)
results.lmin = 1/min eigenvalue of W (or lmin if input)
--------------------------------------------------
It should be noted that an alternative approach to estimating this model would be to directly maximize the log likelihood function for this model using a general optimization algorithm. It might produce an improvement in speed, depending on how many likelihood function evaluations are needed when solving large problems. We provide an option for doing this in the function semo that relies on a MATLAB optimization function maxlik that is part of my Econometrics Toolbox software.
We provide examples of using the functions moran, lmerror, walds and lratios that test for spatial correlation in the least-squares residuals as well as lmsar to test for spatial correlation in the residuals of an SAR model. These examples are based on the Anselin neighborhood crime data set. It should be noted that computation of the Moran I-statistic, the LM error statistic, and the Wald test require matrix multiplications involving the large spatial weight matrices C and W. This is not true of the likelihood ratio statistic implemented in the function lratios. This test only requires that we compare the likelihood from a least-squares model to that from a spatial error model. As we can produce SEM estimates using our sparse matrix algorithms, this test can be implemented for large models.
Example 2.7 shows a program that carries out all of the tests for spatial correlation and estimates an SEM model.
% ----- Example 2.7 Testing for spatial correlation
load wmat.dat; % standardized 1st-order contiguity matrix
load anselin.dat; % load Anselin (1988) Columbus neighborhood crime data
y = anselin(:,1); nobs = length(y);
x = [ones(nobs,1) anselin(:,2:3)];
W = wmat;
vnames = strvcat('crime','const','income','house value');
res1 = moran(y,x,W);
prt(res1);
res2 = lmerror(y,x,W);
prt(res2);
res3 = lratios(y,x,W);
prt(res3);
res4 = walds(y,x,W);
prt(res4);
res5 = lmsar(y,x,W,W);
prt(res5);
res = sem(y,x,W);% do 1st-order spatial autoregression
prt(res,vnames); % print the output
Note that we have provided code in the prt function to provide a formatted printout of the test results from our spatial correlation testing functions. From the results printed below, we see that the least-squares residuals exhibit spatial correlation. We infer this from the small marginal probabilities that indicate significance at the 99% level of confidence. With regard to the LM error test for spatial correlation in the residuals of the SAR model implemented in the function lmsar, we see from the marginal probability of 0.565 that here we can reject any spatial dependence in the residuals from this model.
Moran I-test for spatial correlation in residuals Moran I 0.23610178 Moran I-statistic 2.95890622 Marginal Probability 0.00500909 mean -0.03329718 standard deviation 0.09104680 LM error tests for spatial correlation in residuals LM value 5.74566426 Marginal Probability 0.01652940 chi(1) .01 value 17.61100000 LR tests for spatial correlation in residuals LR value 8.01911539 Marginal Probability 0.00462862 chi-squared(1) value 6.63500000 Wald test for spatial correlation in residuals Wald value 14.72873758 Marginal Probability 0.00012414 chi(1) .01 value 6.63500000 LM error tests for spatial correlation in SAR model residuals LM value 0.33002340 Marginal Probability 0.56564531 chi(1) .01 value 6.63500000 Spatial error Model Estimates Dependent Variable = crime R-squared = 0.6515 Rbar-squared = 0.6364 sigma^2 = 95.5675 log-likelihood = -166.40057 Nobs, Nvars = 49, 3 # iterations = 12 min and max lam = -1.5362, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability const 59.878750 11.157027 0.000000 income -0.940247 -2.845229 0.006605 house value -0.302236 -3.340320 0.001667 lambda 0.562233 4.351068 0.000075
As an example of estimating an SEM model on a large data set, we use the Pace and Barry data set with the same model used to demonstrate the SAR estimation procedure.
% ----- Example 2.8 Using the sem() function with a very large data set
load elect.dat; % load data on votes in 3,107 counties
y = (elect(:,7)./elect(:,8)); % convert to per capita variables
x1 = log(elect(:,9)./elect(:,8)); % education
x2 = log(elect(:,10)./elect(:,8));% homeownership
x3 = log(elect(:,11)./elect(:,8));% income
n = length(y); x = [ones(n,1) x1 x2 x3];
clear x1; clear x2; clear x3;
clear elect; % conserve on RAM memory
load ford.dat; % 1st order contiguity matrix stored in sparse matrix form
ii = ford(:,1); jj = ford(:,2); ss = ford(:,3);
n = 3107;
clear ford; % clear ford matrix to save RAM memory
W = sparse(ii,jj,ss,n,n);
clear ii; clear jj; clear ss; % conserve on RAM memory
vnames = strvcat('voters','const','educ','homeowners','income');
to = clock;
res = sem(y,x,W);
etime(clock,to)
prt(res,vnames);
We computed estimates using both the iterative procedure implemented in the function sem and the optimization procedure implemented in the function semo. The time required for the optimization procedure was 338 seconds, which compared to 311 seconds for the iterative procedure. The optimization approach required only 5 function evaluations whereas the iterative procedure required 11 function evaluations. Both of these functions are part of the spatial econometrics library, as it may be the case that the optimization approach would produce estimates in less time than the iterative approach in some applications. This would likely be the case if very good initial estimates were available as starting values. We present the estimates from both approaches to demonstrate that they produce estimates that are identical to 3 decimal places.
% estimates from iterative approach using sem() function Spatial error Model Estimates Dependent Variable = voters R-squared = 0.6606 Rbar-squared = 0.6603 sigma^2 = 0.0133 log-likelihood = 3202.7211 Nobs, Nvars = 3107, 4 # iterations = 11 min and max lam = -1.0710, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability const 0.543129 8.769040 0.000000 educ 0.293303 12.065152 0.000000 homeowners 0.571474 36.435109 0.000000 income -0.152842 -6.827930 0.000000 lambda 0.650523 41.011556 0.000000 % estimates from optimization approach using semo() function Spatial error Model Estimates Dependent Variable = voters R-squared = 0.6606 Rbar-squared = 0.6603 sigma^2 = 0.0133 log-likelihood = 3202.7208 Nobs, Nvars = 3107, 4 # iterations = 5 *************************************************************** Variable Coefficient t-statistic t-probability const 0.543175 8.770178 0.000000 educ 0.293231 12.061955 0.000000 homeowners 0.571494 36.436805 0.000000 income -0.152815 -6.826670 0.000000 lambda 0.650574 41.019490 0.000000
The estimates from this model indicate that after taking into account the influence of the explanatory variables, we still have spatial correlation in the residuals of the model that can be modeled successfully with the SEM model. As a confirmation of this, consider that the LR test implemented with the function lratios produced the results shown below:
LR tests for spatial correlation in residuals LR value 1163.01773404 Marginal Probability 0.00000000 chi-squared(1) value 6.63500000
Recall that this is a test of spatial autocorrelation in the residuals from a least-squares model, and the test results provide a strong indication of spatial dependence in the least-squares residuals. Note also that this is the only test that can be implemented successfully with large data sets.
A reasonable alternative would be to simply estimate a FAR model using the least-squares residuals to test for the presence of spatial dependence in the errors. We illustrate this approach in Section 2.5.
A general version of the spatial model includes both the spatial lagged term as well as a spatially correlated error structure as shown in (2.23).
One point to note about this model is that W1 can equal W2, but there
may be identification problems in this case. The log likelihood for this model
can be maximized using our general optimization algorithm on a concentrated
version of the likelihood function. The parameters
and
are
concentrated out of the likelihood function, leaving the parameters
and
.
This eliminates the ability to use the univariate simplex
optimization algorithm fmin that we used with the other spatial
autoregressive models.
We can still produce a sparse matrix algorithm for the log likelihood function
and proceed in a similar fashion to that used for the other spatial
autoregressive models. One difference is that we cannot easily impose
restrictions on the parameters
and
to force them to lie within
the ranges defined by the maximum and minimum eigenvalues from their associated
weight matrices W1 and W2.
When might one rely on this model? If there were evidence that spatial dependence existed in the error structure from a spatial autoregressive (SAR) model, the SAC model is an appropriate approach to modeling this type of dependence in the errors. Recall, we can use the LM-test implemented in the function lmsars to see if spatial dependence exists in the residuals of an SAR model.
Another place where one might rely on this model is a case where a second-order spatial contiguity matrix was used for W2 that corresponds to a first-order contiguity matrix W1. This type of model would express the belief that the disturbance structure involved higher-order spatial dependence, perhaps due to second-round effects of a spatial phenomenon being modeled.
A third example of using matrices W1 and W2 might be where W1
represented a first-order contiguity matrix and W2 was constructed as a
diagonal matrix measuring the distance from the central city. This type of
configuration of the spatial weight matrices would indicate a belief that
contiguity alone does not suffice to capture the spatial effects at work. The
distance from the central city might also represent an important factor in the
phenomenon we are modeling. This raises the identification issue, should we use
the distance weighting matrix in place of W1 and the first-order contiguity
matrix for W2, or rely on the opposite configuration?
Of course, comparing likelihood function values along with the
statistical significance of the parameters
and
from models estimated using both configurations
might point to a clear answer.
The log likelihood function for this model is:
We concentrate the function using the following expressions
for
and
:
Given the expressions in (2.25), we can evaluate the log likelihood given
values of
and
.
The values of the other parameters
and
can be calculated as a function of the
parameters
and the sample data in y,X.
Documentation for the MATLAB function sac that carries out the non-linear optimization of the log likelihood function for this model is shown below. There are a number of things to note about this function. First, we provide optimization options for the user in the form of a structure variable `info'. These options allow the user to control some aspects of the maxlik optimization algorithm and to print intermediate results while optimization is proceeding.
This is the first example of a function that uses the MATLAB structure variable as an input argument. This allows us to provide a large number of input arguments using a single structure variable. Note that you can name the structure variable used to input the options anything you want to -- it is the fieldnames that the function sac parses to find the options.
PURPOSE: computes general Spatial Model
model: y = p*W1*y + X*b + u, u = lam*W2*u + e
---------------------------------------------------
USAGE: results = sac(y,x,W1,W2)
where: y = dependent variable vector
x = independent variables matrix
W1 = spatial weight matrix (standardized)
W2 = spatial weight matrix
info = a structure variable with optimization options
info.parm = (optional) 2x1 starting values for rho, lambda
info.convg = (optional) convergence criterion (default = 1e-7)
info.maxit = (optional) maximum # of iterations (default = 500)
info.method = 'bfgs', 'dfp' (default bfgs)
info.pflag = flag for printing of intermediate results
---------------------------------------------------
RETURNS: a structure
results.meth = 'sac'
results.beta = bhat
results.rho = p (rho)
results.lam = L (lambda)
results.tstat = asymptotic t-stats (last 2 are rho,lam)
results.yhat = yhat
results.resid = residuals
results.sige = sige = e'(I-L*W)'*(I-L*W)*e/n
results.rsqr = rsquared
results.rbar = rbar-squared
results.lik = likelihood function value
results.nobs = nobs
results.nvar = nvars
results.y = y data vector
results.iter = # of iterations taken
--------------------------------------------------
We take the same approach to optimization failure as we did with the sem
function. A message is printed to warn the user that optimization failed, but
we let the function continue to process and return a results structure
consisting of failed parameter estimates. This decision was made to allow the
user to examine the failed estimates and attempt estimation based on
alternative optimization options. For example, the user might elect to attempt
a Davidson-Fletcher-Powell (`dfp') algorithm in place of the default
Broyden-Fletcher-Goldfarb-Smith (`bfgs') routine or supply starting
values for the parameters
and
.
With regard to optimization algorithm failures, it should be noted that the Econometrics Toolbox contains alternative optimization functions that can be used in place of maxlik. Any of these functions could be substituted for maxlik in the function sac. Chapter 10 in the Econometrics Toolbox illustrates the use of these functions as well as their documentation. The next section illustrates use of the estimation functions we have constructed for the general spatial autoregressive model.
Our first example illustrates the general spatial model with the Anselin Columbus neighborhood crime data set. We construct a spatial lag matrix W2 for use in the model. As discussed in Chapter 1, higher-order spatial lags require that we eliminate redundancies that arise. Anselin and Smirnov (1994) provide details regarding the procedures as well as a comparison of alternative algorithms and their relative performance.
A function slag can be used to produce higher order spatial lags. The documentation for the function is:
PURPOSE: compute spatial lags
---------------------------------------------
USAGE: Wp = slag(W,p)
where: W = input spatial weight matrix, sparse or full
(0,1 or standardized form)
p = lag order (an integer)
---------------------------------------------
RETURNS: Wp = W^p spatial lag matrix
in standardized form if W standardized was input
in 0,1 form in W non-standardized was input
---------------------------------------------
One point about slag is that it returns a standardized contiguity matrix even if a non-standardized matrix is used as an input. This seemed a useful approach to take. There is a function normw that standardizes spatial weight matrices so the row-sums are unity. It takes a single input argument containing the non-standardized weight matrix and returns a single argument containing the standardized matrix.
Example 2.9 uses the sac function to estimate three alternative models. Our example illustrates the point discussed earlier regarding model specification with respect to the use of W and W2 by producing estimates for three models based on alternative configurations of these two spatial weight matrices.
% ----- Example 2.9 Using the sac function
load Wmat.dat; % standardized 1st-order contiguity matrix
load anselin.dat; % load Anselin (1988) Columbus neighborhood crime data
y = anselin(:,1); nobs = length(y);
x = [ones(nobs,1) anselin(:,2:3)];
W = Wmat;
vnames = strvcat('crime','const','income','house value');
W2 = slag(W,2); % standardized W2 result from slag
subplot(2,1,1), spy(W);
xlabel('First-order contiguity structure');
subplot(2,1,2), spy(W2);
xlabel('Second-order contiguity structure');
pause;
res1 = sac(y,x,W2,W);% general spatial model W2,W
prt(res1,vnames); % print the output
res2 = sac(y,x,W,W2);% general spatial model W,W2
prt(res2,vnames); % print the output
res3 = sac(y,x,W,W); % general spatial model W,W
prt(res3,vnames); % print the output
plt(res3);
The estimation results are shown below for all three versions of the
model. The first two models produced estimates that suggest W2
is not significant, as the coefficients for both
and
are small and insignificant when this contiguity matrix is applied.
In contrast, the first-order contiguity matrix is always associated
with a significant
or
coefficient in the first two
models, indicating the importance of first-order effects.
The third model that uses the first-order W for both
and
produced insignificant coefficients for both of these
parameters.
General Spatial Model Estimates Dependent Variable = crime R-squared = 0.6527 Rbar-squared = 0.6376 sigma^2 = 95.2471 log-likelihood = -165.36509 Nobs, Nvars = 49, 3 # iterations = 5 *************************************************************** Variable Coefficient t-statistic t-probability const 45.421239 6.863214 0.000000 income -1.042733 -3.226112 0.002313 house value -0.268027 -2.935739 0.005180 rho -0.094359 -0.392131 0.696773 lambda 0.429926 6.340856 0.000000 General Spatial Model Estimates Dependent Variable = crime R-squared = 0.6520 Rbar-squared = 0.6369 sigma^2 = 95.4333 log-likelihood = -166.39931 Nobs, Nvars = 49, 3 # iterations = 5 *************************************************************** Variable Coefficient t-statistic t-probability const 60.243770 4.965791 0.000010 income -0.937802 -3.005658 0.004281 house value -0.302261 -3.406156 0.001377 rho 0.565853 4.942206 0.000011 lambda -0.010726 -0.151686 0.880098 General Spatial Model Estimates Dependent Variable = crime R-squared = 0.6514 Rbar-squared = 0.6362 sigma^2 = 95.6115 log-likelihood = -165.25612 Nobs, Nvars = 49, 3 # iterations = 7 *************************************************************** Variable Coefficient t-statistic t-probability const 47.770500 4.338687 0.000078 income -1.024966 -3.119166 0.003127 house value -0.281714 -3.109463 0.003213 rho 0.167197 0.497856 0.620957 lambda 0.368187 1.396173 0.169364
By way of summary, I would reject all three SAC model specifications,
thinking that the SAR or SEM models (presented below) appear
preferable. Note that the second SAC model specification collapses to an
SAR model by virtue of the fact that the parameter
is not
significant and the parameter
in this model is associated
with the first-order weight matrix W.
Spatial error Model Estimates Dependent Variable = crime R-squared = 0.6515 Rbar-squared = 0.6364 sigma^2 = 95.5675 log-likelihood = -166.40057 Nobs, Nvars = 49, 3 # iterations = 12 min and max lam = -1.5362, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability const 59.878750 11.157027 0.000000 income -0.940247 -2.845229 0.006605 house value -0.302236 -3.340320 0.001667 lambda 0.562233 4.351068 0.000075 Spatial autoregressive Model Estimates Dependent Variable = crime R-squared = 0.6518 Rbar-squared = 0.6366 sigma^2 = 95.5033 Nobs, Nvars = 49, 3 log-likelihood = -165.41269 # of iterations = 11 min and max rho = -1.5362, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability const 45.056482 6.186276 0.000000 income -1.030647 -3.369256 0.001533 house value -0.265970 -3.004718 0.004293 rho 0.431377 3.587351 0.000806
An LM error test for spatial correlation in the residuals of the SAR model confirms that there is no spatial dependence in the residuals of this model. The LM error test results are shown below and they would lead us to conclude that the SAR model adequately captures spatial dependence in this data set.
LM error tests for spatial correlation in SAR model residuals LM value 0.33002340 Marginal Probability 0.56564531 chi(1) .01 value 6.63500000
An important point regarding any non-linear optimization problem such as that involved in the SAC model is that the estimates may not reflect global solutions. A few different solutions of the optimization problem based on alternative starting values is usually undertaken to confirm that the estimates do indeed represent global solutions to the problem. The function sac allows the user to input alternative starting values, making this relatively easy to do.
A final example uses the large Pace and Barry data set to illustrate the sac function in operation on large problems. Example 2.10 turns on the printing flag so we can observe intermediate results from the optimization algorithm as it proceeds.
% ----- Example 2.10 Using sac() on a large data set
load elect.dat; % load data on votes in 3,107 counties
y = log(elect(:,7)./elect(:,8)); % convert to per capita variables
x1 = log(elect(:,9)./elect(:,8)); % education
x2 = log(elect(:,10)./elect(:,8));% homeownership
x3 = log(elect(:,11)./elect(:,8));% income
n = length(y); x = [ones(n,1) x1 x2 x3];
clear x1; clear x2; clear x3;
clear elect; % conserve on RAM memory
load ford.dat; % 1st order contiguity matrix stored in sparse matrix form
ii = ford(:,1); jj = ford(:,2); ss = ford(:,3);
n = 3107;
clear ford; % clear ford matrix to save RAM memory
W = sparse(ii,jj,ss,n,n); W2 = slag(W,2);
clear ii; clear jj; clear ss; % conserve on RAM memory
vnames = strvcat('voters','const','educ','homeowners','income');
to = clock; info.pflag = 1;
res = sac(y,x,W,W2,info);
etime(clock,to)
prt(res,vnames);
The results are shown below, including intermediate results that were displayed by setting `info.plag=1'. It took 535 seconds to solve this problem involving 5 iterations of the maxlik function. This function tends to be faster than the alternative optimization algorithms available in the Econometrics Toolbox.
==== Iteration ==== 2
log-likelihood bconvergence fconvergence
7635.7620 0.2573 0.0017
Parameter Estimates Gradient
Parameter 1 0.4210 -232.2699
Parameter 2 0.4504 -162.1438
==== Iteration ==== 3
log-likelihood bconvergence fconvergence
7635.6934 0.0304 0.0000
Parameter Estimates Gradient
Parameter 1 0.4163 -2.3017
Parameter 2 0.4591 13.0082
==== Iteration ==== 4
log-likelihood bconvergence fconvergence
7635.6920 0.0052 0.0000
Parameter Estimates Gradient
Parameter 1 0.4151 -1.8353
Parameter 2 0.4601 0.4541
==== Iteration ==== 5
log-likelihood bconvergence fconvergence
7635.6920 0.0001 0.0000
Parameter Estimates Gradient
Parameter 1 0.4150 -0.0772
Parameter 2 0.4601 -0.0637
General Spatial Model Estimates
Dependent Variable = voters
R-squared = 0.6653
Rbar-squared = 0.6650
sigma^2 = 0.0131
log-likelihood = 3303.143
Nobs, Nvars = 3107, 4
# iterations = 5
***************************************************************
Variable Coefficient t-statistic t-probability
const 0.683510 13.257563 0.000000
educ 0.247956 12.440953 0.000000
homeowners 0.555176 35.372538 0.000000
income -0.117151 -5.858600 0.000000
rho 0.415024 16.527947 0.000000
lambda 0.460054 17.827407 0.000000
From the estimation results we see evidence that a general spatial
model might be appropriate for this modeling problem. The parameters
and
are both statistically significant.
In addition, the fit of this model is slightly
better than that of the SAR and SEM models (see examples 2.4 and
2.8) as indicated by a slightly higher adjusted
R-squared value and the likelihood function value is also slightly higher.
In a well-known paper, Harrison and Rubinfeld (1978) used a housing data set for the Boston SMSA with 506 observations (one observation per census tract) containing 14 variables. They were interested in the housing demand and clean air issues. We will use this spatial data set to illustrate specification and testing for the spatial autoregressive models presented in this chapter. Our dependent variable is median housing prices for each of the 506 census tracts. The explanatory variables in the dataset are listed below. Some of the explanatory variables do not vary by town rather than census tract since data for these variables was only available at this level of geographical delineation.
The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978. Variables in order: CRIM per capita crime rate by town ZN proportion of residential land zoned for lots over 25,000 sq.ft. INDUS proportion of non-retail business acres per town CHAS Charles River dummy (= 1 if tract bounds river; 0 otherwise) NOX nitric oxides concentration (parts per 10 million) RM average number of rooms per dwelling AGE proportion of owner-occupied units built prior to 1940 DIS weighted distances to five Boston employment centers RAD index of accessibility to radial highways TAX full-value property-tax rate per $10,000 PTRATIO pupil-teacher ratio by town B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town LSTAT percent lower status of the population MEDV Median value of owner-occupied homes in $1000's
Belsley, Kuh, and Welch (1980) used the data to examine the effects of robust estimation and published the observations in an appendix on pages 244-261. It should be noted that they published data that included various transformations, so the data in their appendix does not match our data which are in the raw untransformed format. Pace (1993), Gilley and Pace (1996), and Pace and Gilley (1997) have used this data set with spatial econometric models, and they added longitude-latitude coordinates for census tracts to the dataset. Our regression model will simply relate the median house values to all of the explanatory variables, simplifying our specification task. We will focus on alternative spatial specifications and models.
The next task involves some scaling and standardization issues surrounding the data set. Belsley, Kuh and Welsch (1980) used this data set to illustrate numerically ill-conditioned data that contained outliers and influential observations. Poor scaling will adversely impact our numerical Hessian approach to determining the variance-covariance structure of the spatial autoregressive parameter estimates. Intuitively, the hessian function attempts to compute a numerical derivative by perturbing each parameter in turn and examining the impact on the likelihood function. If the parameters vary widely in terms of magnitudes because the data are poorly scaled, this task will be more difficult and we may calculate negative variances.
Example 2.11 demonstrates the nature of these scaling problems, carrying out a least-squares regression. We see that the coefficient estimates vary widely in magnitude, so we scale the variables in the model using a function studentize from the Econometric Toolbox that subtracts the means and divides by the standard deviations. Another least-squares regression is then carried out to illustrate the impact of scaling on the model coefficients.
% ----- Example 2.11 Least-squares on the Boston dataset
load boston.raw; % Harrison-Rubinfeld data
[n k] = size(boston);y = boston(:,k); % median house values
x = [ones(n,1) boston(:,1:k-1)]; % other variables
vnames = strvcat('hprice','constant','crime','zoning','industry', ...
'charlesr','noxsq','rooms2','houseage','distance', ...
'access','taxrate','pupil/teacher','blackpop','lowclass');
res = ols(y,x); prt(res,vnames);
ys = studentize(y); xs = studentize(x(:,2:k));
res2 = ols(ys,xs);
vnames2 = strvcat('hprice','crime','zoning','industry','charlesr', ...
'noxsq','rooms2','houseage','distance','access','taxrate', ...
'pupil/teacher','blackpop','lowclass');
prt(res2,vnames2);
% sort actual and predicted by housing values from low to high
yhat = res2.yhat; [ysort yi] = sort(ys); yhats = yhat(yi,1);
tt=1:n; % plot actual vs. predicted
plot(tt,ysort,'ok',tt,yhats,'+k');
ylabel('housing values');
xlabel('census tract observations');
The results indicate that the coefficient estimates based on the unscaled data vary widely in magnitude from 0.000692 to 36.459, whereas the scaled variables produce coefficients ranging from 0.0021 to -0.407.
Ordinary Least-squares Estimates (non-scaled variables) Dependent Variable = hprice R-squared = 0.7406 Rbar-squared = 0.7338 sigma^2 = 22.5179 Durbin-Watson = 1.2354 Nobs, Nvars = 506, 14 *************************************************************** Variable Coefficient t-statistic t-probability constant 36.459488 7.144074 0.000000 crime -0.108011 -3.286517 0.001087 zoning 0.046420 3.381576 0.000778 industry 0.020559 0.334310 0.738288 charlesr 2.686734 3.118381 0.001925 noxsq -17.766611 -4.651257 0.000004 rooms2 3.809865 9.116140 0.000000 houseage 0.000692 0.052402 0.958229 distance -1.475567 -7.398004 0.000000 access 0.306049 4.612900 0.000005 taxrate -0.012335 -3.280009 0.001112 pupil/teacher -0.952747 -7.282511 0.000000 blackpop 0.009312 3.466793 0.000573 lowclass -0.524758 -10.347146 0.000000 Ordinary Least-squares Estimates (scaled variables) Dependent Variable = hprice R-squared = 0.7406 Rbar-squared = 0.7343 sigma^2 = 0.2657 Durbin-Watson = 1.2354 Nobs, Nvars = 506, 13 *************************************************************** Variable Coefficient t-statistic t-probability crime -0.101017 -3.289855 0.001074 zoning 0.117715 3.385011 0.000769 industry 0.015335 0.334650 0.738032 charlesr 0.074199 3.121548 0.001905 noxsq -0.223848 -4.655982 0.000004 rooms2 0.291056 9.125400 0.000000 houseage 0.002119 0.052456 0.958187 distance -0.337836 -7.405518 0.000000 access 0.289749 4.617585 0.000005 taxrate -0.226032 -3.283341 0.001099 pupil/teacher -0.224271 -7.289908 0.000000 blackpop 0.092432 3.470314 0.000565 lowclass -0.407447 -10.357656 0.000000
The program in example 2.11 also produces a plot of the actual versus predicted values from the model sorted by housing values from low to high. From this plot (shown in Figure 2.4), we see large predicted errors for the highest housing values. This suggests a log transformation on the dependent variable y in the model would be appropriate. The figure also illustrates that housing values above $50,000 (the last 16 observations at the right of the graph) have been censored to a value of $50,000, a subject we take up in Chapter 5.
We adopt a model based on the scaled data and a log transformation for the dependent variable and carry out least-squares estimation again in example 2.12. As a test for spatial autocorrelation in the least-squares residuals, we employ a first-order spatial autoregressive model on the residuals. We also carry out a Moran's I test for spatial autocorrelation, which may or may not work depending on how much RAM memory you have in your computer. Note that we can directly use the prt function without first returning the results from moran to a results structure. We rely on our function xy2cont to generate a spatial contiguity matrix needed by far and moran to test for spatial autocorrelation.
% ----- Example 2.12 Testing for spatial correlation
load boston.raw; % Harrison-Rubinfeld data
load latitude.data; load longitude.data;
[W1 W W3] = xy2cont(latitude,longitude); % create W-matrix
[n k] = size(boston);y = boston(:,k); % median house values
x = boston(:,1:k-1); % other variables
vnames = strvcat('hprice','crime','zoning','industry','charlesr', ...
'noxsq','rooms2','houseage','distance','access','taxrate', ...
'pupil/teacher','blackpop','lowclass');
ys = studentize(log(y)); xs = studentize(x);
res = ols(ys,xs); prt(res,vnames);
resid = res.resid; % recover residuals
rmin = 0; rmax = 1;
res2 = far(resid,W,rmin,rmax); prt(res2);
prt(moran(ys,xs,W));
The results shown below indicate that we have strong evidence of spatial autocorrelation in the residuals from the least-squares model. Our FAR model produced a spatial correlation coefficient estimate of 0.647 with a large t-statistic and this indication of spatial autocorrelation in the residuals is confirmed by the Moran test results.
Ordinary Least-squares Estimates Dependent Variable = hprice R-squared = 0.7896 Rbar-squared = 0.7845 sigma^2 = 0.2155 Durbin-Watson = 1.0926 Nobs, Nvars = 506, 13 *************************************************************** Variable Coefficient t-statistic t-probability crime -0.216146 -7.816230 0.000000 zoning 0.066897 2.136015 0.033170 industry 0.041401 1.003186 0.316263 charlesr 0.062690 2.928450 0.003564 noxsq -0.220667 -5.096402 0.000000 rooms2 0.156134 5.435516 0.000000 houseage 0.014503 0.398725 0.690269 distance -0.252873 -6.154893 0.000000 access 0.303919 5.377976 0.000000 taxrate -0.258015 -4.161602 0.000037 pupil/teacher -0.202702 -7.316010 0.000000 blackpop 0.092369 3.850718 0.000133 lowclass -0.507256 -14.318127 0.000000 (Test for spatial autocorrelation using FAR model) First-order spatial autoregressive model Estimates R-squared = 0.3085 sigma^2 = 0.1452 Nobs, Nvars = 506, 1 log-likelihood = -911.39591 # of iterations = 9 min and max rho = 0.0000, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability rho 0.647624 14.377912 0.000000 Moran I-test for spatial correlation in residuals Moran I 0.35833634 Moran I-statistic 14.70009315 Marginal Probability 0.00000000 mean -0.01311412 standard deviation 0.02526858
Example 2.13 carries out a series of alternative spatial autoregressive models in an attempt to specify the most appropriate model.
% ----- Example 2.13 Spatial autoregressive model estimation
load boston.raw; % Harrison-Rubinfeld data
load latitude.data; load longitude.data;
[W1 W W3] = xy2cont(latitude,longitude); % create W-matrix
[n k] = size(boston);y = boston(:,k); % median house values
x = boston(:,1:k-1); % other variables
vnames = strvcat('hprice','crime','zoning','industry','charlesr', ...
'noxsq','rooms2','houseage','distance','access','taxrate', ...
'pupil/teacher','blackpop','lowclass');
ys = studentize(log(y)); xs = studentize(x);
rmin = 0; rmax = 1;
tic; res1 = sar(ys,xs,W,rmin,rmax); prt(res1,vnames); toc;
tic; res2 = sem(ys,xs,W,rmin,rmax); prt(res2,vnames); toc;
tic; res3 = sac(ys,xs,W,W); prt(res3,vnames); toc;
The results from example 2.13 are presented below. We see that all three models
produced estimates indicating significant spatial autocorrelation. For example,
the SAR model produced a coefficient estimate for
equal to 0.4508 with a
large t-statistic, and the SEM model produced an estimate for
of
0.7576 that was also significant. The SAC model produced estimates of
and
that were both significant at the 99% level.
Which model is best? The log-likelihood function values are much higher for the SEM and SAC models, so this would be evidence against the SAR model. A further test of the SAR model would be to use the function lmsar that tests for spatial autocorrelation in the residuals of the SAR model. If we find evidence of residual spatial autocorrelation, it suggests that the SAC model might be most appropriate. Note that the SAC model exhibits the best log-likelihood function value.
Spatial autoregressive Model Estimates Dependent Variable = hprice R-squared = 0.8421 Rbar-squared = 0.8383 sigma^2 = 0.1576 Nobs, Nvars = 506, 13 log-likelihood = -85.099051 # of iterations = 9 min and max rho = 0.0000, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability crime -0.165349 -6.888522 0.000000 zoning 0.080662 3.009110 0.002754 industry 0.044302 1.255260 0.209979 charlesr 0.017156 0.918665 0.358720 noxsq -0.129635 -3.433659 0.000646 rooms2 0.160858 6.547560 0.000000 houseage 0.018530 0.595675 0.551666 distance -0.215249 -6.103520 0.000000 access 0.272237 5.625288 0.000000 taxrate -0.221229 -4.165999 0.000037 pupil/teacher -0.102405 -4.088484 0.000051 blackpop 0.077511 3.772044 0.000182 lowclass -0.337633 -10.149809 0.000000 rho 0.450871 12.348363 0.000000 Spatial error Model Estimates Dependent Variable = hprice R-squared = 0.8708 Rbar-squared = 0.8676 sigma^2 = 0.1290 log-likelihood = -58.604971 Nobs, Nvars = 506, 13 # iterations = 10 min and max lam = 0.0000, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability crime -0.186710 -8.439402 0.000000 zoning 0.056418 1.820113 0.069348 industry -0.000172 -0.003579 0.997146 charlesr -0.014515 -0.678562 0.497734 noxsq -0.220228 -3.683553 0.000255 rooms2 0.198585 8.325187 0.000000 houseage -0.065056 -1.744224 0.081743 distance -0.224595 -3.421361 0.000675 access 0.352244 5.448380 0.000000 taxrate -0.257567 -4.527055 0.000008 pupil/teacher -0.122363 -3.839952 0.000139 blackpop 0.129036 4.802657 0.000002 lowclass -0.380295 -10.625978 0.000000 lambda 0.757669 19.133467 0.000000 General Spatial Model Estimates Dependent Variable = hprice R-squared = 0.8662 Rbar-squared = 0.8630 sigma^2 = 0.1335 log-likelihood = -55.200525 Nobs, Nvars = 506, 13 # iterations = 7 *************************************************************** Variable Coefficient t-statistic t-probability crime -0.198184 -8.766862 0.000000 zoning 0.086579 2.824768 0.004923 industry 0.026961 0.585884 0.558222 charlesr -0.004154 -0.194727 0.845687 noxsq -0.184557 -3.322769 0.000958 rooms2 0.208631 8.573808 0.000000 houseage -0.049980 -1.337513 0.181672 distance -0.283474 -5.147088 0.000000 access 0.335479 5.502331 0.000000 taxrate -0.257478 -4.533481 0.000007 pupil/teacher -0.120775 -3.974717 0.000081 blackpop 0.126116 4.768082 0.000002 lowclass -0.374514 -10.707764 0.000000 rho 0.625963 9.519920 0.000000 lambda 0.188257 3.059010 0.002342
The results from the lmsar test shown below indicate the presence of spatial autocorrelation in the residuals of the SAR model, suggesting that the SAC model would be appropriate.
LM error tests for spatial correlation in SAR model residuals LM value 60.37309581 Marginal Probability 0.00000000 chi(1) .01 value 6.63500000
We would conclude that the SAC model is most appropriate here. Regarding inferences, an interesting point is that the Charles River location dummy variable was statistically significant in the least-squares version of the model, but not in any of the three spatial autoregressive models. Intuitively, taking explicit account of the spatial nature of the data eliminates the need for this locational dummy variable. Other differences in the inferences that would be made from least-squares versus the SAC model center on the magnitudes of `pupil/teacher' ratio and `lower class' population variables. The least-squares estimates for these two variables are roughly twice the magnitude of those from the SAC model. Since the other two spatial autoregressive models produce similar estimates for these two variables, we would infer that the least-squares estimates for these coefficients are exhibiting upward bias.
To a lesser extent, we would draw a different inference regarding the magnitude of impact for the `rooms2' variable from the two spatial autoregressive models that we think most appropriate (SEM and SAC) than least-squares. The SEM and SAC models produce estimates around 0.2 compared to a value of 0.156 from least-squares. Interestingly, the SAR model estimate for this variable is 0.160, close to that from least-squares.
We used this applied data set to explore the accuracy of the numerical hessian versus information matrix approach to determining the variance-covariance matrix for the estimates. Since this is a reasonably large dataset with a large number of explanatory variables, it should provide a good indication of how well the numerical hessian approach does. The t-statistics from the SAR, SEM and SAC models estimated with both the information matrix approach and the numerical hessian are presented below.
11.2 sec. 46.1 sec. 79.7 sec. 58.8 sec. 170.1 sec. 114.6 sec.
SAR(info) SAR(hess) SEM(info) SEM(hess) SAC(info) SAC(hess)
-6.763498 -6.888522 -8.474183 -8.439402 -8.718561 -8.766862
3.007078 3.009110 1.820262 1.820113 3.266408 2.824768
1.255180 1.255260 -0.003584 -0.003579 0.750766 0.585884
0.923109 0.918665 -0.690453 -0.678562 -0.227615 -0.194727
-3.397977 -3.433659 -3.687834 -3.683553 -4.598758 -3.322769
6.521229 6.547560 8.345316 8.325187 8.941594 8.573808
0.593747 0.595675 -1.756358 -1.744224 -1.597997 -1.337513
-6.037894 -6.103520 -3.435146 -3.421361 -7.690499 -5.147088
5.577032 5.625288 5.478342 5.448380 6.906236 5.502331
-4.147236 -4.165999 -4.527097 -4.527055 -4.985237 -4.533481
-4.025972 -4.088484 -3.886484 -3.839952 -4.765931 -3.974717
3.744277 3.772044 4.808276 4.802657 5.971414 4.768082
-9.900943 -10.149809 -10.954441 -10.625978 -11.650367 -10.707764
10.351829 12.348363 20.890074 19.133467 24.724209 9.519920
3.747132 3.059010
The numerical approach appears to work very well, producing identical inferences
for all coefficients. The SAC model produced the greatest divergence between
the t-statistics and unfortunately a large discrepancy exists for the spatial
parameters
and
in this model. Nonetheless, we would draw the
same inferences regarding the significance of these parameters from both
approaches to computing measures of dispersion. Pace and Barry (1998) express
the belief that the information matrix approach (whether computed numerically or
analytically) may not work well for spatial autoregressive models because of the
asymmetry of the profile likelihood for these models that arises from the
restricted range of the parameter
.
Additionally, they argue that the
necessary ``smoothness'' needed to compute well-behaved second derivatives may
not always occur as Ripley (1988) documents for a particular case. Given this,
we should perhaps qualify our evaluation of success regarding the
variance-covariance calculations. On the other hand, for this particular data
set and model, the variance-covariance structure for the spatial autoregressive
models is remarkably similar to that from the least-squares model as indicated
by the similar magnitudes for the t-statistics. Further, for the single case
of the Charles River dummy variable where the least-squares and spatial
t-statistics differed, we have a plausible explanation for this difference.
The times (in seconds) required by both information matrix and numerical hessian approaches to estimating the variance-covariance structure of the parameters are reported, and we see that the information matrix approach was quite a bit faster for the SAR model but slower for the other two models. One point to consider regarding the comparative execution times is that the spatial contiguity weight matrix is not exceptionally sparse for this problem. Of the (506x506)=256,036 elements, there are 3,006 non-zero entries which is 1.17 percent of the elements. This would bias upward the times reported for the numerical hessian approach since the sparse algorithms can't work to their fullest capability.
By way of concluding this illustration, we might estimate an SAC model based on a second order spatial contiguity matrix in place of the first-order matrix used in example 2.13. We can also produce FAR model estimates using the residuals from the SAC model in example 2.13 as well as the new model based on a second-order spatial weight matrix to check for second order spatial effects in the residuals of our model. Example 2.14 implements these final checks.
% ----- Example 2.14 Final model checking
load boston.raw; % Harrison-Rubinfeld data
load latitude.data; load longitude.data;
[W1 W W3] = xy2cont(latitude,longitude); % create W-matrix
[n k] = size(boston);y = boston(:,k); % median house values
x = boston(:,1:k-1); % other variables
vnames = strvcat('hprice','crime','zoning','industry','charlesr', ...
'noxsq','rooms2','houseage','distance','access','taxrate', ...
'pupil/teacher','blackpop','lowclass');
ys = studentize(log(y)); xs = studentize(x);
rmin = 0; rmax = 1;
W2 = W*W;
res = sac(ys,xs,W,W2); prt(res,vnames);
res2 = sac(ys,xs,W,W);
resid = res2.resid; % recover SAC residuals
prt(far(resid,W2,rmin,rmax));
The results shown below indicate that the SAC model using a second-order spatial weight matrix produces a slightly lower likelihood function value than the SAC model in example 2.13, but estimates that are reasonably similar in all other regards.
General Spatial Model Estimates Dependent Variable = hprice R-squared = 0.8766 Rbar-squared = 0.8736 sigma^2 = 0.1231 log-likelihood = -56.71359 Nobs, Nvars = 506, 13 # iterations = 7 *************************************************************** Variable Coefficient t-statistic t-probability crime -0.195223 -8.939992 0.000000 zoning 0.085436 2.863017 0.004375 industry 0.018200 0.401430 0.688278 charlesr -0.008227 -0.399172 0.689939 noxsq -0.190621 -3.405129 0.000715 rooms2 0.204352 8.727857 0.000000 houseage -0.056388 -1.551836 0.121343 distance -0.287894 -5.007506 0.000001 access 0.332325 5.486948 0.000000 taxrate -0.245156 -4.439223 0.000011 pupil/teacher -0.109542 -3.609570 0.000338 blackpop 0.127546 4.896422 0.000001 lowclass -0.363506 -10.368125 0.000000 rho 0.684424 12.793448 0.000000 lambda 0.208597 2.343469 0.019502 First-order spatial autoregressive model Estimates R-squared = 0.0670 sigma^2 = 0.1245 Nobs, Nvars = 506, 1 log-likelihood = -827.5289 # of iterations = 9 min and max rho = 0.0000, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability rho 0.188278 1.420560 0.156062
The residuals from the SAC model in example 2.13 show no significant second-order spatial autocorrelation, since the FAR model estimate is not statistically significant.
To further explore this model as an exercise, you might consider replacing the W2 weight matrix in example 2.14 with a matrix based on distances. See if this variant of the SAC model is successful. Another exercise would be to estimate a model using: sac(ys,xs,W2,W), and compare it to the model in example 2.14.
We have seen that spatial autoregressive models can be estimated using univariate and bivariate optimization algorithms to solve for estimates by maximizing the likelihood function. The sparse matrix routines in MATLAB allow us to write functions that evaluate the log likelihood function for large models rapidly and with a minimum of computer RAM memory. This approach was used to construct a library of estimation functions that were illustrated on a problem involving all 3,107 counties in the continental U.S. on an inexpensive desktop computer.
In addition to providing functions that estimate these models, the use of a general software design allowed us to provide both printed and graphical presentation of the estimation results.
Another place where we produced functions that can be used in spatial econometric analysis was in the area of testing for spatial dependence in the residuals from least-squares models and SAR models. Functions were devised to implement Moran's I-statistic as well as likelihood ratio and Lagrange multiplier tests for spatial autocorrelation in the residuals from least-squares and SAR models. These tests are a bit more hampered by large-scale data sets, but alternative approaches based on using the FAR model on residuals or likelihood ratio tests can be used.
This chapter discusses spatial autoregressive models from a Bayesian perspective. It is well-known that Bayesian regression methods implemented with diffuse prior information can replicate maximum likelihood estimation results. We demonstrate this type of application, but focus on some extensions that are available with the Bayesian approach. The maximum likelihood estimation methods set forth in the previous chapter are based on the presumption that the underlying disturbance process involved in generating the model is normally distributed.
Further, many of the formal tests for spatial dependence and heterogeneity, including those introduced in the previous chapter, rely on characteristics of quadratic forms for normal variates in order to derive the asymptotic distribution of the test statistics.
There is a history of Bayesian literature that deals with heteroscedastic and leptokurtic disturbances, treating these two phenomena in a similar fashion. Geweke (1993) points out that a non-Bayesian regression methodology introduced by Lange, Little and Taylor (1989), which assume an independent Student-t distribution for the regression disturbances, is identical to a Bayesian heteroscedastic linear regression model he introduces. Lange, Little and Taylor (1989) show that their regression model provides robust results in a wide range of applied data settings.
Geweke (1993) argues the same for his method and makes a connection to the Bayesian treatment of symmetric leptokurtic disturbance distributions through the use of scale mixtures of normal distributions. We adopt the approach of Geweke (1993) in order to extend the spatial autoregressive models introduced in Chapter 2.
The extended version of the model is:
Where the change made to the basic model is in the assumption regarding the
disturbances
.
We assume that they exhibit non-constant variance,
taking on different values for every observation. The magnitudes
represent parameters to be estimated. This assumption of
inherent spatial heterogeneity seems more appropriate than the traditional
Gauss-Markov assumption that the variance of the disturbance terms is constant
over space.
The first section of this chapter introduces a Bayesian heteroscedastic regression model and the topic of Gibbs sampling estimation without complications introduced by the spatial autoregressive model. The next section applies these ideas to the simple FAR model and implements a Gibbs sampling estimation procedure for this model. Following sections deal with the other spatial autoregressive models that we introduced in the previous chapter.
We consider the case of a heteroscedastic linear regression model with an informative prior that can be written as in (3.2).
Where y is an nx1 vector of dependent variables and X represents the
nxk matrix of explanatory variables. We assume that
is an
nx1 vector of normally distributed random variates with non-constant
variance. We place a normal prior on the parameters
and a diffuse prior
on
.
The relative variance terms
,
are assumed
fixed but unknown parameters that need to be estimated. The thought of
estimating n parameters,
,
in addition to the k+1
parameters,
using n data observations seems problematical.
Bayesian methods don't encounter the same degrees of freedom constraints,
because we can rely on an informative prior for these parameters. This prior
distribution for the vi terms will take the form of an independent
distribution. Recall that the
distribution is a single
parameter distribution, where we have represented this parameter as r. This
allows us to estimate the additional n vi parameters in the
model by adding the single
parameter r to our estimation procedure.
This type of prior has been used by Lindley (1971) for cell variances in an
analysis of variance problem, and Geweke (1993) in modeling heteroscedasticity
and outliers. The specifics regarding the prior assigned to the vi terms
can be motivated by considering that the prior mean equals unity and the
variance of the prior is 2/r. This implies that as r becomes very large,
the terms vi will all approach unity, resulting in V=In, the
traditional Gauss-Markov assumption. We will see that the role of
is to protect against outliers and observations containing large variances by
placing less weight on these observations. Large r values are associated with a prior
belief that outliers and non-constant variances do not exist.
Now consider the posterior distribution from which we would derive our
estimates. Following the usual Bayesian methodology, we would combine the
likelihood function for our simple model with the prior distributions for
,
and V to arrive at the posterior. There is little use in
doing this as we produce a complicated function that is not amenable to
analysis. As an alternative, consider the conditional
distributions for the parameters
and V. These distributions
are those that
would arise from assuming each of the other parameters were known. For example,
the conditional distribution for
assuming that we knew
and V would looks as follows:
Note that this is quite analogous to a generalized least-squares (GLS) version of the Theil and Goldberger (1961) estimation formulas, known as the ``mixed estimator''. Consider also that this would be fast and easy to compute.
Next consider the conditional distribution for the parameter
assuming that we knew the parameters
and
V in the problem. This distribution would be:
Where we let
.
This result parallels the simple
regression case where we know that the residuals are distributed as
.
A difference from the standard case is that we adjust the ei using the
relative variance terms vi.
Finally, the conditional distribution for the parameters Vrepresent a
distribution with r+1 degrees of freedom
(see Geweke, 1993):
The Gibbs sampling approach that we will use throughout this chapter to estimate Bayesian variants of the spatial autoregressive models is based on this simple idea. We specify the conditional distributions for all of the parameters in the model and proceed to carry out random draws from these distributions until we collect a large sample of parameter draws. Gelfand and Smith (1990) demonstrate that Gibbs sampling from the sequence of complete conditional distributions for all parameters in the model produces a set of draws that converge in the limit to the true (joint) posterior distribution of the parameters. That is, despite the use of conditional distributions in our sampling scheme, a large sample of the draws can be used to produce valid posterior inferences about the mean and moments of the multivariate posterior parameter distribution.
The method is most easily described by developing and implementing a Gibbs sampler for our heteroscedastic Bayesian regression model. Given the three conditional posterior densities in (3.3) through (3.5), we can formulate a Gibbs sampler for this model using the following steps:
These steps constitute a single pass of the Gibbs sampler. We wish to make a
large number of passes to build up a sample
of j
values from which we can approximate the posterior distributions for our
parameters.
To illustrate this approach in practice, consider example 3.1 which shows the MATLAB code for carry out estimation using the Gibbs sampler set forth above. In this example, we generate a regression model data set that contains a heteroscedastic set of disturbances based on a time trend variable. Only the last 50 observations in the generated data sample contain non-constant variances. This allows us to see if the estimated vi parameters detect this pattern of non-constant variance over the last half of the sample.
The generated data set used values of unity for
,
the intercept
term, and the two slope parameters,
and
.
The prior means
for the
parameters were set to the true values of unity with prior
variances of unity, reflecting a fair amount of uncertainty. The following
MATLAB program implements the Gibbs sampler for this model. Note how easy
it is to implement the mathematical equations in MATLAB code.
% ----- Example 3.1 Heteroscedastic Gibbs sampler
n=100; k=3; % set number of observations and variables
x = randn(n,k); b = ones(k,1); % generate data set
tt = ones(n,1); tt(51:100,1) = [1:50]';
y = x*b + randn(n,1).*sqrt(tt); % heteroscedastic disturbances
ndraw = 1100; nomit = 100; % set the number of draws
bsave = zeros(ndraw,k); % allocate storage for results
ssave = zeros(ndraw,1);
vsave = zeros(ndraw,n);
c = [1.0 1.0 1.0]'; % prior b means
R = eye(k); T = eye(k); % prior b variance
Q = chol(inv(T)); q = Q*c;
b0 = x\y; % use ols starting values
sige = (y-x*b0)'*(y-x*b0)/(n-k);
V = ones(n,1); in = ones(n,1); % initial value for V
rval = 4; % initial value for rval
qpq = Q'*Q; qpv = Q'*q; % calculate Q'Q, Q'q only once
tic; % start timing
for i=1:ndraw; % Start the sampling
ys = y.*sqrt(V); xs = matmul(x,sqrt(V));
xpxi = inv(xs'*xs + sige*qpq);
b = xpxi*(xs'*ys + sige*qpv); % update b
b = norm_rnd(sige*xpxi) + b; % draw MV normal mean(b), var(b)
bsave(i,:) = b'; % save b draws
e = ys - xs*b; ssr = e'*e; % update sige
chi = chis_rnd(1,n); % do chisquared(n) draw
sige = ssr/chi; ssave(i,1) = sige; % save sige draws
chiv = chis_rnd(n,rval+1); % update vi
vi = ((e.*e./sige) + in*rval)./chiv;
V = in./vi; vsave(i,:) = vi'; % save the draw
end; % End the sampling
toc; % stop timing
bhat = mean(bsave(nomit+1:ndraw,:)); % calculate means and std deviations
bstd = std(bsave(nomit+1:ndraw,:)); tstat = bhat./bstd;
smean = mean(ssave(nomit+1:ndraw,1));
vmean = mean(vsave(nomit+1:ndraw,:));
tout = tdis_prb(tstat',n); % compute t-stat significance levels
% set up for printing results
in.cnames = strvcat('Coefficient','t-statistic','t-probability');
in.rnames = strvcat('Variable','variable 1','variable 2','variable 3');
in.fmt = '%16.6f'; tmp = [bhat' tstat' tout];
fprintf(1,'Gibbs estimates \n'); % print results
mprint(tmp,in);
fprintf(1,'Sigma estimate = %16.8f \n',smean);
result = theil(y,x,c,R,T); % compare to Theil-Goldberger estimates
prt(result); plot(vmean); % plot vi-estimates
title('mean of vi-estimates');
We rely on MATLAB functions norm_rnd and chis_rnd to provide the multivariate normal and chi-squared random draws. These functions are part of the Econometrics Toolbox and are discussed in Chapter 8 of the manual. Note also, we omit the first 100 draws at start-up to allow the Gibbs sampler to achieve a steady state before we begin sampling for the parameter distributions.
The results are shown below, where we find that it took only 11.5 seconds to
carry out the 1100 draws and produce a sample of 1000 draws on which we can base
our posterior inferences regarding the parameters
and
.
For
comparison purposes, we produced estimates using the theil function from
the Econometrics Toolbox that implements mixed estimation. These
estimates are similar, but the t-statistics are smaller because they
suffer from the heteroscedasticity. Our Gibbs sampled estimates
take this into account increasing the precision of the estimates.
elapsed_time =
11.5534
Gibbs estimates
Variable Coefficient t-statistic t-probability
variable 1 1.071951 2.784286 0.006417
variable 2 1.238357 3.319429 0.001259
variable 3 1.254292 3.770474 0.000276
Sigma estimate = 10.58238737
Theil-Goldberger Regression Estimates
R-squared = 0.2316
Rbar-squared = 0.2158
sigma^2 = 14.3266
Durbin-Watson = 1.7493
Nobs, Nvars = 100, 3
***************************************************************
Variable Prior Mean Std Deviation
variable 1 1.000000 1.000000
variable 2 1.000000 1.000000
variable 3 1.000000 1.000000
***************************************************************
Posterior Estimates
Variable Coefficient t-statistic t-probability
variable 1 1.056404 0.696458 0.487808
variable 2 1.325063 0.944376 0.347324
variable 3 1.293844 0.928027 0.355697
Figure 3.1 shows the mean of the 1,000 draws for the parameters vi plotted for the 100 observation sample. Recall that the last 50 observations contained a time-trend generated pattern of non-constant variance. This pattern was detected quite accurately by the estimated vi terms.
One point that should be noted about Gibbs sampling estimation is that convergence of the sampler needs to be diagnosed by the user. The Econometrics Toolbox provides a set of convergence diagnostic functions along with illustrations of their use in Chapter 5 of the manual. Fortunately, for simple regression models (and spatial autoregressive models) convergence of the sampler is usually certain, and convergence occurs quite rapidly. A simple approach to testing for convergence is to run the sampler once to carry out a small number of draws, say 300 to 500, and a second time to carry out a larger number of draws, say 1000 to 2000. If the means and variances for the posterior estimates are similar from both runs, convergence seems assured.
In this section we turn attention to a Gibbs sampling approach for the FAR model that can accommodate heteroscedastic disturbances and outliers. Note that the presence of a few spatial outliers due to enclave effects or other aberrations in the spatial sample will produce a violation of normality in small samples. The distribution of disturbances will take on a fat-tailed or leptokurtic shape. This is precisely the type of problem that the heteroscedastic modeling approach of Geweke (1993) based on Gibbs sampling estimation was designed to address.
The Bayesian extension of the FAR model takes the form:
where as in Chapter 2, the spatial contiguity matrix W has been
standardized to have row sums of unity and the variable vector y is expressed
in deviations from the means to eliminate the constant term in the model.
We allow for an informative prior on the spatial autoregressive parameter
,
the heteroscedastic control parameter r and the disturbance variance
.
This is the most general Bayesian model, but practitioners would
probably implement diffuse priors for
and
.
A diffuse prior for
would be implemented by setting the prior mean c to
zero and using a large prior variance for T, say 1e+12. To implement a diffuse
prior for
we would set
.
The prior for r is
based on a
distribution which has a mean equal to m/k and a
variance equal to m/k2. Recall our discussion of the role of the prior
hyperparameter r in allowing the vi estimates to deviate from their prior
means of unity. Small values for r around 2 to 7 allow for non-constant
variance and are associated with a prior belief that outliers or non-constant
variances exist. Large values such as r=20 or r=50 would produce vi
estimates that are all close to unity, forcing the model to take on a
homoscedastic character and producing estimates equivalent to those from the
maximum likelihood FAR model discussed in Chapter 2. This would
make little sense -- if we wished to produce maximum likelihood estimates, it
would be much quicker to use the far function from Chapter 2.
In the heteroscedastic regression model demonstrated in example 3.1, we set
r=4 to allow ample opportunity for the vi parameters to deviate from
unity. Note that in example 3.1, the vi estimates for the first 50
observations were all close to unity despite our prior setting of r=4. We
will provide examples that suggest an optimal strategy for setting r is to
use small values in the range from 2 to 7. If the sample data exhibit
homoscedastic disturbances that are free from outliers, the vi estimates
will reflect this fact. On the other hand, if there is evidence of
heterogeneity in the errors, these settings for the hyperparameter r will
allow the vi estimates to deviate substantially from unity. Estimates for
the vi parameters that deviate from unity are needed to produce an
adjustment in the estimated
and
that take
non-constant variance into account and protect our estimates in the
face of outliers.
Econometric estimation problems amenable to Gibbs sampling can take one of two forms. The simplest case is where all of the conditional distributions are from well-known distributions allowing us to sample random deviates using standard computational algorithms. This was the case with our heteroscedastic Bayesian regression model.
A second more complicated case that one sometimes encounters in Gibbs sampling is where one or more of the conditional distributions can be expressed mathematically, but they take an unknown form. It is still possible to implement a Gibbs sampler for these models using a host of alternative methods that are available to produce draws from distributions taking non-standard forms.
One of the more commonly used ways to deal with this situation is known as the
`Metropolis algorithm'. It turns out that the FAR model falls into this latter
category requiring us to rely on what is known as a Metropolis-within-Gibbs
sampler. To see how this problem arises, consider the conditional distributions
for the FAR model parameters where we rely on diffuse priors,
and
for the parameters
shown in (3.7).
These priors can be combined with the likelihood for this model producing a
joint posterior distribution for the parameters,
.
If we treat
as known, the kernel for the conditional posterior (that part of the
distribution that ignores inessential constants) for
given
takes the form:
where
.
It is important to note that by
conditioning on
(treating it as known) we can subsume the determinant,
,
as part of the constant of proportionality, leaving us with
one of the standard distributional forms. From (3.9) we conclude that
.
Unfortunately, the conditional distribution of
given
takes the
following non-standard form:
To sample from (3.10) we rely on a method called `Metropolis sampling'. Because this takes place within the Gibbs sampling sequence, it is often labeled `Metropolis-within-Gibbs'.
Metroplis sampling is described here for the case of a symmetric normal
candidate generating density. This should work well for the conditional
distribution of
because, as Figure 3.2 shows, a
conditional distribution of
is similar to a normal
distribution with the same mean value.
The figure also shows a t-distribution with 3 degrees
of freedom, which would also work well in this application.
To describe Metropolis sampling in general, suppose we are interested in sampling from a density f() and x0 denotes the current draw from f. Let the candidate value be generated by y=x0+cZ, where Z is a draw from a standard normal distribution and c is a known constant. (If we wished to rely on a t-distribution, we could simply replace Z with a random draw from the t-distribution.)
An acceptance
probability is computed using:
.
We then draw a
uniform random deviate we label U, and if U < p, the next draw from f is
given by x1=y. If on the other hand,
,
the draw is taken to be
the current value,
x1=x0.
A MATLAB program to implement this approach for the case of the homoscedastic first-order spatial autoregressive (FAR) model is shown in example 3.2. We use this simple case as an introduction before turning to the heteroscedastic case. An implementation issue is that we need to impose the restriction:
where
and
are the minimum and maximum eigenvalues of the
standardized spatial weight matrix W. We impose this restriction using an approach that
has been labeled `rejection sampling'. Restrictions such as this, as well as non-linear
restrictions, can be imposed on the parameters during Gibbs sampling by simply rejecting
values that do not meet the restrictions (see Gelfand, Hills, Racine-Poon and Smith,
1990).
% ----- Example 3.2 Metropolis within Gibbs sampling FAR model
n=49; ndraw = 1100; nomit = 100; nadj = ndraw-nomit;
% generate data based on a given W-matrix
load wmat.dat; W = wmat; IN = eye(n); in = ones(n,1); weig = eig(W);
lmin = 1/min(weig); lmax = 1/max(weig); % bounds on rho
rho = 0.7; % true value of rho
y = inv(IN-rho*W)*randn(n,1); ydev = y - mean(y); Wy = W*ydev;
% set starting values
rho = 0.5; % starting value for the sampler
sige = 10.0; % starting value for the sampler
c = 0.5; % for the Metropolis step (adjusted during sampling)
rsave = zeros(nadj,1); % storage for results
ssave = zeros(nadj,1); rtmp = zeros(nomit,1);
iter = 1; cnt = 0;
while (iter <= ndraw); % start sampling;
e = ydev - rho*Wy; ssr = (e'*e); % update sige;
chi = chis_rnd(1,n); sige = (ssr/chi);
% metropolis step to get rho update
rhox = c_rho(rho,sige,ydev,W); % c_rho evaluates conditional
rho2 = rho + c*randn(1); accept = 0;
while accept == 0; % rejection bounds on rho
if ((rho2 > lmin) & (rho2 < lmax)); accept = 1; end;
rho2 = rho + c*randn(1); cnt = cnt+1;
end; % end of rejection for rho
rhoy = c_rho(rho2,sige,ydev,W); % c_rho evaluates conditional
ru = unif_rnd(1,0,1); ratio = rhoy/rhox; p = min(1,ratio);
if (ru < p)
rho = rho2;
end;
rtmp(iter,1) = rho;
if (iter >= nomit);
if iter == nomit % update c based on initial draws
c = 2*std(rtmp(1:nomit,1));
end;
ssave(iter-nomit+1,1) = sige; rsave(iter-nomit+1,1) = rho;
end; % end of if iter > nomit
iter = iter+1;
end; % end of sampling loop
% print-out results
fprintf(1,'hit rate = %6.4f \n',ndraw/cnt);
fprintf(1,'mean and std of rho %6.3f %6.3f \n',mean(rsave),std(rsave));
fprintf(1,'mean and std of sig %6.3f %6.3f \n',mean(ssave),std(ssave));
% maximum likelihood estimation for comparison
res = far(ydev,W);
prt(res);
Rejection sampling is implemented in the example with the following code
fragment that examines the candidate draws in `rho2' to see if they are in
the feasible range. If `rho2' is not in the feasible range, another
candidate value `rho2' is drawn and we increment a counter variable
`cnt' to keep track of how many candidate values are found outside the feasible
range. The `while loop' continues to draw new candidate values and examine
whether they are in the feasible range until we find a candidate value within
the limits. Finding this value terminates the `while loop'. This approach
ensures than any values of
that are ultimately accepted as draws will
meet the constraints.
% metropolis step to get rho update rho2 = rho + c*randn(1); accept = 0; while accept == 0; % rejection bounds on rho if ((rho2 > lmin) & (rho2 < lmax)); accept = 1; end; rho2 = rho + c*randn(1); cnt = cnt+1; end; % end of rejection for rho
Another point to note about the example is that we adjust the parameter `c' used to produce the random normal candidate values. This is done by using the initial nomit=100 values of `rho' to compute a new value for `c' based on two standard deviations of the initial draws. The following code fragment carries this out, where the initial `rho' draws have been stored in a vector `rtmp'.
if iter == nomit % update c based on initial draws
c = 2*std(rtmp(1:nomit,1));
end;
Consider also, that we delay collecting our sample of draws for the parameters
and
until we have executed `nomit' burn-in draws, which is 100
in this case. This allows the sampler to settle into a steady state, which
might be required if poor values of
and
were used to initialize
the sampler. In theory, any arbitrary values can be used, but a choice of good
values will speed up convergence of the sampler. A plot of the first 100 values
drawn from this example is shown in Figure 3.3. We used
and
as starting values despite our knowledge that the
true values were
and
.
The plots of the first 100
values indicate that even if we start with very poor values, far from the true
values used to generate the data, only a few iterations are required to reach a
steady state. This is usually true for regression-based Gibbs samplers.
The function c_rho evaluates the conditional distribution for
given
at any value of
.
Of course, we could use sparse
matrix algorithms in this function to handle large data sample problems,
which is the way we approach this task when constructing
our function far_g for the spatial econometrics library.
function yout = c_rho(rho,sige,y,W) % evaluates conditional distribution of rho % given sige for the spatial autoregressive model n = length(y); IN = eye(n); B = IN - rho*W; detval = log(det(B)); epe = y*B'*B*y; yout = (n/2)*log(sige) + (n/2)*log(epe) - detval;
Finally, we present results from executing the code shown in example 3.2, where both Gibbs
estimates based on the mean of the 1,000 draws for
and
as well as the
standard deviations are shown. For contrast, we present maximum likelihood estimates,
which for the case of the homoscedastic Gibbs sampler implemented here with a diffuse
prior on
and
should produce similar estimates.
Gibbs sampling estimates hit rate = 0.3561 mean and std of rho 0.649 0.128 mean and std of sig 1.348 0.312 First-order spatial autoregressive model Estimates R-squared = 0.4067 sigma^2 = 1.2575 Nobs, Nvars = 49, 1 log-likelihood = -137.94653 # of iterations = 13 min and max rho = -1.5362, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability rho 0.672436 4.298918 0.000084
The time needed to generate 1,100 draws was around 10 seconds, which represents 100 draws per second. We will see that similar speed can be achieved even for large data samples.
From the results we see that the mean and standard deviations from the Gibbs sampler
produce estimates very close to the maximum likelihood estimates, and to the true values
used to generate the model data. The mean estimate for
divided by the
standard deviation of 0.128 implies a t-statistic of 5.070, which is very close to the
maximum likelihood t-statistic. The estimate of
based on the mean
from 1,000 draws is also very close to the true value of unity used to generate the model.
The reader should keep in mind that we do not advocate using the Gibbs sampler in place of maximum likelihood estimation. That is, we don't really wish to implement a homoscedastic version of the FAR model Gibbs sampler that relies on diffuse priors. We turn attention to the more general heteroscedastic case that allows for either diffuse or informative priors in the next section.
We discuss some of the implementation details concerned with constructing a MATLAB function far_g to produce estimates for the Bayesian FAR model. This function will rely on a sparse matrix algorithm approach to handle problems involving large data samples. It will also allow for diffuse or informative priors and handle the case of heterogeneity in the disturbance variance.
The first thing we need to consider is that to produce a large number of draws,
say 1,000, we would need to evaluate the conditional distribution of
2,000 times. (Note that we called this function twice in example 3.2). Each evaluation
would require that we compute the determinant of the matrix
,
which we
have already seen is a non-trivial task for large data samples. To avoid this, we rely on
the Pace and Barry (1997) approach discussed in the previous chapter. Recall that they
suggested evaluating this determinant over a grid of values in the feasible range of
once at the outset. Given that we have carried out this evaluation and stored
the determinant values along with associated
values, we can simply ``look-up'' the
appropriate determinant in our function that evaluates the conditional distribution. That
is, the call to the conditional distribution function will provide a value of
for
which we need to evaluate the conditional distribution. If we already know the
determinant for a grid of all feasible
values, we can simply look up the
determinant value closest to the
value and use it during evaluation of the
conditional distribution. This saves us the time involved in computing the determinant
twice for each draw of
.
Since we need to carry out a large number of draws, this approach works better than computing determinants for every draw. Note that in the case of maximum likelihood estimation from Chapter 2, the opposite was true. There we only needed 10 to 20 evaluations of the likelihood function, making the initial grid calculation approach of Pace and Barry much slower.
Now we turn attention to the function far_g that implements the Gibbs
sampler for the FAR model. The documentation for the function is shown below.
You can of course examine the code in the function, but it essentially carries
out the approach set forth in example 3.2, with modifications to allow for the
relative variance parameters vi and informative priors for
and r in this extended version of the model.
PURPOSE: Gibbs sampling estimates of the 1st-order Spatial
model: y = rho*W*y + e, e = N(0,sige*V),
V = diag(v1,v2,...vn), r/vi = ID chi(r)/r, r = Gamma(m,k)
rho = N(c,T), sige = gamma(nu,d0)
----------------------------------------------------------------
USAGE: result = far_g(y,W,ndraw,nomit,prior,start)
where: y = nobs x 1 independent variable vector
W = nobs x nobs 1st-order contiguity matrix (standardized)
ndraw = # of draws
nomit = # of initial draws omitted for burn-in
prior = a structure variable for prior information input
prior.rho, prior mean for rho, c above, default = 0 (diffuse)
prior.rcov, prior rho variance, T above, default = 1e+12 (diffuse)
prior.nu, informative Gamma(nu,d0) prior on sige
prior.d0 default: nu=0,d0=0 (diffuse prior)
prior.rval, r prior hyperparameter, default=4
prior.m, informative Gamma(m,k) prior on r
prior.k, default: not used
prior.rmin, (optional) min value of rho to use in sampling
prior.rmax, (optional) max value of rho to use in sampling
start = (optional) (2x1) vector of rho, sige starting values
(defaults, rho = 0.5, sige = 1.0)
---------------------------------------------------------------
RETURNS: a structure:
results.meth = 'far_g'
results.pdraw = rho draws (ndraw-nomit x 1)
results.sdraw = sige draws (ndraw-nomit x 1)
results.vmean = mean of vi draws (1 x nobs)
results.yhat = predicted values of y
results.rdraw = r-value draws (ndraw-nomit x 1)
results.pmean = rho prior mean (if prior input)
results.pstd = rho prior std dev (if prior input)
results.nu = prior nu-value for sige (if prior input)
results.d0 = prior d0-value for sige (if prior input)
results.r = value of hyperparameter r (if input)
results.m = m prior parameter (if input)
results.k = k prior parameter (if input)
results.nobs = # of observations
results.ndraw = # of draws
results.nomit = # of initial draws omitted
results.y = actual observations
results.yhat = predicted values for y
results.time = time taken for sampling
results.accept = acceptance rate
results.pflag = 1 for prior, 0 for no prior
results.rmax = 1/max eigenvalue of W (or rmax if input)
results.rmin = 1/min eigenvalue of W (or rmin if input)
----------------------------------------------------------------
NOTE: use either improper prior.rval
or informative Gamma prior.m, prior.k, not both of them
----------------------------------------------------------------
As the documentation makes clear, there are a number of user options to
facilitate implementation of different models. Example 3.3 illustrates using the function with
various input options. We generate a FAR model vector y based on the
standardized W weight matrix from the Columbus neighborhood crime data set.
The program then produces maximum likelihood estimates for comparison to our
Gibbs sampled estimates. The first set of Gibbs estimates are produced with a
homoscedastic prior based on r=30 and diffuse priors for
and
.
Diffuse priors for
and
are the defaults used by far_g, and the default for r equals 4.
My experience indicates this represents a good rule-of-thumb value.
After producing the first estimates, we add two outliers to the data set at observations 10 and 39. We then compare maximum likelihood estimates to the Gibbs sampled estimates based on a heteroscedastic prior with r=4.
% ----- Example 3.3 Using the far_g function
load wmat.dat; % standardized 1st-order spatial weight matrix
W = wmat; % from the Columbus neighborhood data set
[n junk] = size(W); IN = eye(n);
rho = 0.75; % true value of rho
y = inv(IN-rho*W)*randn(n,1)*5; % generate data
ydev = y - mean(y);
vnames = strvcat('y-simulated','y-spatial lag');
rmin = 0; rmax = 1;
resml = far(ydev,W,rmin,rmax); % do maximum likelihood for comparison
prt(resml,vnames);
ndraw = 1100; nomit = 100;
prior.rval = 30; % homoscedastic prior diffuse rho,sigma (the default)
prior.rmin = 0; prior.rmax = 1;
result = far_g(ydev,W,ndraw,nomit,prior); % call Gibbs sampling function
prt(result,vnames);
% add outliers to the generated data
ydev(20,1) = ydev(20,1)*10; ydev(39,1) = ydev(39,1)*10;
prior.rval = 4; % heteroscedastic model, diffuse rho,sigma (the default)
resml2 = far(ydev,W); % do maximum likelihood for comparison
prt(resml2,vnames);
result2 = far_g(ydev,W,ndraw,nomit,prior); % call Gibbs sampling function
prt(result2,vnames);
% plot the mean of the vi-draws, which represent vi-estimates
plot(result2.vmean);
The program produced the following output. Note that our printing function computes means and standard deviations using the draws returned in the results structure of far_g. We also compute t-statistics and evaluate the marginal probabilities. This allows us to provide printed output in the form of a traditional regression model.
% homoscedastic models First-order spatial autoregressive model Estimates Dependent Variable = y-simulated R-squared = 0.5908 sigma^2 = 24.9531 Nobs, Nvars = 49, 1 log-likelihood = -285.86289 # of iterations = 9 min and max rho = 0.0000, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability rho 0.771980 6.319186 0.000000 Gibbs sampling First-order spatial autoregressive model Dependent Variable = y-simulated R-squared = 0.5805 sigma^2 = 25.2766 r-value = 30 Nobs, Nvars = 49, 1 ndraws,nomit = 1100, 100 acceptance rate = 0.8886 time in secs = 13.7984 min and max rho = 0.0000, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability rho 0.739867 8.211548 0.000000 % outlier models First-order spatial autoregressive model Estimates Dependent Variable = y-simulated R-squared = 0.0999 sigma^2 = 267.3453 Nobs, Nvars = 49, 1 log-likelihood = -398.06025 # of iterations = 14 min and max rho = -1.5362, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability rho 0.368404 1.591690 0.118019 Gibbs sampling First-order spatial autoregressive model Dependent Variable = y-simulated R-squared = 0.1190 sigma^2 = 107.8131 r-value = 3 Nobs, Nvars = 49, 1 ndraws,nomit = 1100, 100 acceptance rate = 0.8568 time in secs = 8.2693 min and max rho = 0.0000, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability rho 0.503992 3.190913 0.002501
The first two sets of output illustrate the point we made regarding Bayesian analysis implemented with a diffuse prior. The results from the Gibbs sampling approach are very close to those from maximum likelihood estimation. Note that the printed output shows the time required to carry out 1,100 draws along with the acceptance rate. It took only 13.7 seconds to produce 1,100 draws.
After introducing two outliers, we see that the maximum likelihood estimates
produce a poor fit to the data as well as an inflated estimate of
.
The coefficient estimate for
is also affected adversely, deviating from
the true value of 0.75, and the precision of the estimate is degraded. In
contrast, the Gibbs sampled estimate of
was closer to the true value
and exhibits greater precision as indicated by
the larger t-statistic. The estimate for
is much
smaller than that from maximum likelihood. Robust estimates will
generally exhibit a smaller R2 statistic as the estimates
place less weight on outliers rather than try to fit these data observations.
We also produced a plot of the mean of the vi draws which serve as an estimate of these relative variance terms. This graph is shown in Figure 3.4, where we see that the two outliers were identified.
Example 3.4 illustrates the use of the far_g function on the large Pace
and Barry data set. We set an r value of 4 which will capture heterogeneity
if it exists. We rely on the input options
to set a minimum and maximum value of
between
0 and 1 over which to search, to speed up computation.
This avoids computation of the eigenvalues for the large
matrix W which would provide the range over which to
search. If you find an estimate for
near zero, this
restriction to the (0,1) interval is unwise.
% ----- Example 3.4 Using far_g with a large data set
load elect.dat; % load data on votes in 3,107 counties
y = (elect(:,7)./elect(:,8)); % convert to per capita variables
ydev = y - mean(y);
clear elect; % conserve on RAM memory
load ford.dat; % 1st order contiguity matrix stored in sparse matrix form
ii = ford(:,1); jj = ford(:,2); ss = ford(:,3);
n = 3107;
clear ford; % clear ford matrix to save RAM memory
W = sparse(ii,jj,ss,n,n);
clear ii; clear jj; clear ss; % conserve on RAM memory
prior.rval = 4; prior.rmin = 0; prior.rmax = 1;
ndraw = 1100; nomit = 100;
res = far_g(ydev,W,ndraw,nomit,prior);
prt(res);
plot(res.vmean);
xlabel('Observations');
ylabel('V_i estimates');
pause;
pltdens(res.pdraw,0.1,0,1);
We present maximum likelihood results for comparison with the Gibbs sampling results. If there is no substantial heterogeneity in the disturbance, the two sets of estimates should be similar, as we saw from example 3.3.
% Maximum likelihood results First-order spatial autoregressive model Estimates R-squared = 0.5375 sigma^2 = 0.0054 Nobs, Nvars = 3107, 1 log-likelihood = 3506.3203 # of iterations = 13 min and max rho = -1.0710, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability rho 0.721474 59.567710 0.000000 % Gibbs sampling estimates Gibbs sampling First-order spatial autoregressive model R-squared = 0.5337 sigma^2 = 0.0052 r-value = 4 Nobs, Nvars = 3107, 1 ndraws,nomit = 1100, 100 acceptance rate = 0.7131 time in secs = 262.4728 min and max rho = 0.0000, 1.0000 *************************************************************** Variable Coefficient t-statistic t-probability rho 0.706526 47.180554 0.000000
From the results we see that the maximum likelihood and Bayesian robust estimates are very similar, suggesting a lack of heterogeneity. We can further explore this issue by examining a plot of the mean vi draws, which serve as estimates for these parameters in the model. Provided we use a small value of r, the presence of heterogeneity and outliers will be indicated by large vi estimates that deviate substantially from unity. Figure 3.5 shows a plot of the mean of the vi draws, confirming that a handful of large vi values exist. Close inspection reveals that only 58 vi values greater than 3 exist in a sample of 3107 observations. Apparently this amount of heterogeneity does not affect the estimates for this model.
An advantage of Gibbs sampling is that valid estimates of dispersion
are available for the
parameters as well as the entire posterior distribution associated with the
estimated parameters. Recall that this presents a problem for
large data sets estimated using maximum likelihood methods, which we solved
using a numerical hessian calculation. In the presence of outliers or
non-constant variance the numerical hessian approach may not be valid because
normality in the disturbance generating process might be violated. In the case
of Gibbs sampling, the law of large numbers suggests that we can compute valid
means and measures of dispersion from the sample of draws. As an illustration,
we use a function pltdens from the Econometrics Toolbox to produce a
non-parametric density estimate of the posterior distribution for
.
Figure 3.6 shows the posterior density that is plotted using the
command:
pltdens(res.pdraw,0.1,0,1);
In the figure we see what is known as a jittered plot showing the location of observations used to construct the density estimate. An optional argument to the function allows a kernel smoothing parameter to be input as indicated in the documentation for the function pltdens shown below. The default kernel bandwidth produces fairly non-smooth densities that tend to overfit the data, so we supply our own value.
PURPOSE: Draw a nonparametric density estimate.
---------------------------------------------------
USAGE: [h f y] = pltdens(x,h,p,kernel)
or pltdens(x) which uses gaussian kernel default
where:
x is a vector
h is the kernel bandwidth
default=1.06 * std(x) * n^(-1/5); Silverman page 45
p is 1 if the density is 0 for negative values
k is the kernel type:
=1 Gaussian (default)
=2 Epanechnikov
=3 Biweight
=4 Triangular
A jittered plot of the
observations is shown below the density.
---------------------------------------------------
RETURNS:
h = the interval used
f = the density
y = the domain of support
plot(y,f) will produce a plot of the density
--------------------------------------------------
The disadvantage of the Gibbs sampling estimation approach is the time required. This is reported in the printed output which indicates that it took 262 seconds to produce 1,100 draws. This is relatively competitive with the maximum likelihood estimation method that took around 100 seconds to produce estimates.
It should perhaps be clear that implementation of Gibbs samplers for the other spatial autoregressive models is quite straightforward. We need simply to determine the complete sequence of conditional distributions for the parameters in the model and code a loop to carry out the draws. LeSage (1997) sets forth an alternative approach to the Metropolis within Gibbs sampling. There are many ways to generate samples from an unknown conditional distribution and the ``ratio of uniforms'' approach set forth in LeSage (1997) is another approach. My experience has convinced me that the Metropolis approach set forth here is superior as it requires far less time.
All of the spatial autoregressive models will have in common the need to produce
a Metropolis-within Gibbs estimate for
based on a conditional
distribution involving the determinant
.
In the case of the
SAC model, we need two determinants, one for
and another for
.
Of course we will carry this out
initially over a grid of values and store the results. These will be passed to
the functions that perform the conditional distribution calculations.
There are functions sar_g, sem_g and sac_g that implement Gibbs sampling estimation for the Bayesian variants of the spatial autoregressive models. The documentation for sem_g (which is similar to that for the other models) is shown below:
PURPOSE: Gibbs sampling estimates of the heteroscedastic
spatial error model:
y = XB + u, u = lam*W + e
e is N(0,sige*V)
V = diag(v1,v2,...vn), r/vi = ID chi(r)/r, r = Gamma(m,k)
B = N(c,T), sige = gamma(nu,d0), lam = diffuse prior
---------------------------------------------------
USAGE: results = sem_g(y,x,W,ndraw,nomit,prior,start)
where: y = dependent variable vector (nobs x 1)
x = independent variables matrix (nobs x nvar)
W = 1st order contiguity matrix (standardized, row-sums = 1)
prior = a structure for: B = N(c,T), sige = gamma(nu,d0)
prior.beta, prior means for beta, c above (default 0)
prior.bcov, prior beta covariance , T above (default 1e+12)
prior.rval, r prior hyperparameter, default=4
prior.m, informative Gamma(m,k) prior on r
prior.k, (default: not used)
prior.nu, a prior parameter for sige
prior.d0, (default: diffuse prior for sige)
prior.lmin, (optional) min value of lambda to use in sampling
prior.lmax, (optional) max value of lambda to use in sampling
ndraw = # of draws
nomit = # of initial draws omitted for burn-in
start = (optional) structure containing starting values:
defaults: beta=ones(k,1),sige=1,rho=0.5, V=ones(n,1)
start.b = beta starting values (nvar x 1)
start.lam = lam starting value (scalar)
start.sig = sige starting value (scalar)
start.V = V starting values (n x 1)
---------------------------------------------------
RETURNS: a structure:
results.meth = 'sem_g'
results.bdraw = bhat draws (ndraw-nomit x nvar)
results.pdraw = lam draws (ndraw-nomit x 1)
results.sdraw = sige draws (ndraw-nomit x 1)
results.vmean = mean of vi draws (1 x nobs)
results.rdraw = r draws (ndraw-nomit x 1) (if m,k input)
results.bmean = b prior means, prior.beta from input
results.bstd = b prior std deviations sqrt(diag(prior.bcov))
results.r = value of hyperparameter r (if input)
results.nobs = # of observations
results.nvar = # of variables in x-matrix
results.ndraw = # of draws
results.nomit = # of initial draws omitted
results.y = actual observations (nobs x 1)
results.yhat = predicted values
results.nu = nu prior parameter
results.d0 = d0 prior parameter
results.time = time taken for sampling
results.accept= acceptance rate
results.lmax = 1/max eigenvalue of W (or lmax if input)
results.lmin = 1/min eigenvalue of W (or lmin if input)
As the other functions are quite similar, we leave it to the reader to examine
the documentation and demonstration files for these functions. One point to
note regarding use of these functions is that two options exist for specifying a
value for the hyperparameter r. The default is to rely on an improper prior
based on r=4. The other option allows a proper
(m,k) prior to be
assigned for r.
The first option has the virtue that convergence will be quicker and less draws
are required to produce estimates. The drawback is that the estimates are
conditional on the single value of r set for the hyperparameter. The second
approach produces draws from a
(m,k) distribution for r on each pass
through the sampler. This produces estimates that average over alternative r
values, in essence integrating over this parameter, resulting in unconditional
estimates.
My experience is that estimates produced with a
(8,2) prior having a mean
of r=4 and variance of 2 are quite similar to those based on an improper prior
with r=4. Use of the
prior tends to require a larger number of draws
based on convergence diagnostics routines implemented by the function coda
from the Econometrics Toolbox described in Chapter 5 of the manual.
We turn attention to some applications involving the use of these models. First
we present example 3.5 that generates SEM models for a set of
parameters ranging from 0.1 to 0.9, based on the spatial weight matrix from the
Columbus neighborhood crime data set. Both maximum likelihood and Gibbs
estimates are produced by the program and a table is printed out to compare the
estimation results. The hyperparameter r was set to 30 in this example,
which should produce estimates similar to the maximum likelihood results.
During the loop over alternative data sets, we recover the estimates and other information we are interested in from the results structures. These are stored in a matrix that we print using the mprint function to add column and row labels.
% ----- Example 3.5 Using the sem_g function
load wmat.dat; % standardized 1st-order contiguity matrix
load anselin.dat; % load Anselin (1988) Columbus neighborhood crime data
y = anselin(:,1); n = length(y);
x = [ones(n,1) anselin(:,2:3)];
W = wmat; IN = eye(n);
vnames = strvcat('crime','const','income','house value');
tt = ones(n,1); tt(25:n,1) = [1:25]';
rvec = 0.1:.1:.9; b = ones(3,1);
nr = length(rvec); results = zeros(nr,6);
ndraw = 1100; nomit = 100; prior.rval = 30; bsave = zeros(nr,6);
for i=1:nr, rho = rvec(i);
u = (inv(IN-rho*W))*randn(n,1);
y = x*b + u;
% do maximum likelihood for comparison
resml = sem(y,x,W); prt(resml);
results(i,1) = resml.lam;
results(i,2) = resml.tstat(4,1);
bsave(i,1:3) = resml.beta';
% call Gibbs sampling function
result = sem_g(y,x,W,ndraw,nomit,prior); prt(result);
results(i,3) = mean(result.pdraw);
results(i,4) = results(i,3)/std(result.pdraw);
results(i,5) = result.time;
results(i,6) = result.accept;
bsave(i,4:6) = mean(result.bdraw);
end;
in.rnames = strvcat('True lam','0.1','0.2','0.3', ...
'0.4','0.5','0.6','0.7','0.8','0.9');
in.cnames = strvcat('ML lam','lam t','Gibbs lam','lam t', ...
'time','accept');
mprint(results,in);
in2.cnames = strvcat('b1 ML','b2 ML','b3 ML',...
'b1 Gibbs','b2 Gibbs','b3 Gibbs');
mprint(bsave,in2);
From the results we see that 1100 draws took around 13 seconds. The
acceptance rate falls slightly for values of
near 0.9, which
we would expect. Since this is close to the upper limit of unity, we
will see an increase in rejections of candidate values for
that lie outside the feasible range. The estimates are reasonably
similar to the maximum likelihood results -- even for the relatively
small number of draws used. In addition to presenting estimates for
,
we also provide estimates for the parameters
in the
problem.
% sem model demonstration
True lam ML lam lam t Gibbs lam lam t time accept
0.1 -0.6357 -3.1334 -0.5049 -2.5119 12.9968 0.4475
0.2 0.0598 0.3057 0.0881 0.4296 12.8967 0.4898
0.3 0.3195 1.8975 0.3108 1.9403 13.0212 0.4855
0.4 0.2691 1.5397 0.2091 1.1509 12.9347 0.4827
0.5 0.5399 4.0460 0.5141 3.6204 13.1345 0.4770
0.6 0.7914 10.2477 0.7466 7.4634 13.3044 0.4616
0.7 0.5471 4.1417 0.5303 3.8507 13.2014 0.4827
0.8 0.7457 8.3707 0.7093 6.7513 13.5251 0.4609
0.9 0.8829 17.5062 0.8539 14.6300 13.7529 0.4349
b1 ML b2 ML b3 ML b1 Gibbs b2 Gibbs b3 Gibbs
1.0570 1.0085 0.9988 1.0645 1.0112 0.9980
1.2913 1.0100 1.0010 1.2684 1.0121 1.0005
0.7910 1.0298 0.9948 0.7876 1.0310 0.9936
1.5863 0.9343 1.0141 1.5941 0.9283 1.0157
1.2081 0.9966 1.0065 1.2250 0.9980 1.0058
0.3893 1.0005 1.0155 0.3529 1.0005 1.0171
1.2487 0.9584 1.0021 1.3191 0.9544 1.0029
1.8094 1.0319 0.9920 1.8366 1.0348 0.9918
-0.9454 1.0164 0.9925 -0.9783 1.0158 0.9923
As another example, we use a generated data set into which we insert 2 outliers. The program generates a vector y based on the Columbus neighborhood crime data set and then adjusts two of the generated y values for observations 10 and 40 to create outliers.
Example 3.6 produces maximum likelihood and two sets of Bayesian SAR model estimates. One Bayesian model uses a homoscedastic prior and the other sets r=4, creating a a heteroscedastic prior. This is to illustrate that the differences in the parameter estimates are due to their robust nature, not the Gibbs sampling approach to estimation. A point to consider is that maximum likelihood estimates of precision based on the information matrix rely on normality, which is violated by the existence of outliers. These create a disturbance distribution that contains `fatter tails' than the normal distribution, not unlike the t-distribution. In fact, this is the motivation for Geweke's approach to robustifying against outliers. Gibbs estimates based on a heteroscedastic prior don't rely on normality. If you find a difference between the estimates of precision from maximum likelihood estimates and Bayesian Gibbs estimates, it is a good indication that outliers may exist.
% ----- Example 3.6 An outlier example
load anselin.dat; load wmat.dat;
load anselin.dat; % load Anselin (1988) Columbus neighborhood crime data
x = [anselin(:,2:3)]; [n k] = size(x); x = [ones(n,1) x];
W = wmat; IN = eye(n);
rho = 0.5; % true value of rho
b = ones(k+1,1); % true value of beta
Winv = inv(IN-rho*W);
y = Winv*x*b + Winv*randn(n,1);
vnames = strvcat('y-simulated','constant','income','house value');
% insert outliers
y(10,1) = y(10,1)*2; y(40,1) = y(40,1)*2;
% do maximum likelihood for comparison
resml = sar(y,x,W); prt(resml,vnames);
ndraw = 1100; nomit = 100;
prior.rval = 100; % homoscedastic model,
resg = sar_g(y,x,W,ndraw,nomit,prior);
prt(resg,vnames);
prior.rval = 4; % heteroscedastic model,
resg2 = sar_g(y,x,W,ndraw,nomit,prior);
prt(resg2,vnames);
% plot the vi-estimates
plot(resg2.vmean);
xlabel('Observations');
ylabel('mean of V_i draws');
The maximum likelihood SAR estimates along with the two sets of Bayesian model
estimates are shown below. We see that the homoscedastic Gibbs estimates are
similar to the maximum likelihood estimates, demonstrating that the Gibbs
sampling estimation procedure is not responsible for the difference in estimates
we see between maximum likelihood and the heteroscedastic Bayesian model. For
the case of the heteroscedastic prior we see much better estimates for both
and
.
Note that the R-squared statistic is lower
for the robust estimates which will be the case because robustification
requires that we not attempt to `fit' the outlying observations.
This will generally lead to a worse fit for models that produce
robust estimates.
Spatial autoregressive Model Estimates
Dependent Variable = y-simulated
R-squared = 0.6779
Rbar-squared = 0.6639
sigma^2 = 680.8004
Nobs, Nvars = 49, 3
log-likelihood = -212.59468
# of iterations = 14
min and max rho = -1.5362, 1.0000
***************************************************************
Variable Coefficient t-statistic t-probability
constant 19.062669 1.219474 0.228880
income -0.279572 -0.364364 0.717256
house value 1.966962 8.214406 0.000000
rho 0.200210 1.595514 0.117446
Gibbs sampling spatial autoregressive model
Dependent Variable = y-simulated
R-squared = 0.6770
sigma^2 = 1077.9188
r-value = 100
Nobs, Nvars = 49, 3
ndraws,nomit = 1100, 100
acceptance rate = 0.9982
time in secs = 28.9612
min and max rho = -1.5362, 1.0000
***************************************************************
Variable Prior Mean Std Deviation
constant 0.000000 1000000.000000
income 0.000000 1000000.000000
house value 0.000000 1000000.000000
***************************************************************
Posterior Estimates
Variable Coefficient t-statistic t-probability
constant 18.496993 1.079346 0.286060
income -0.123720 -0.142982 0.886929
house value 1.853332 10.940066 0.000000
rho 0.219656 1.589213 0.118863
Gibbs sampling spatial autoregressive model
Dependent Variable = y-simulated
R-squared = 0.6050
sigma^2 = 1374.8421
r-value = 3
Nobs, Nvars = 49, 3
ndraws,nomit = 1100, 100
acceptance rate = 0.9735
time in secs = 17.2292
min and max rho = -1.5362, 1.0000
***************************************************************
Variable Prior Mean Std Deviation
constant 0.000000 1000000.000000
income 0.000000 1000000.000000
house value 0.000000 1000000.000000
***************************************************************
Posterior Estimates
Variable Coefficient t-statistic t-probability
constant 13.673728 0.778067 0.440513
income 0.988163 0.846761 0.401512
house value 1.133790 3.976679 0.000245
rho 0.388923 2.844965 0.006610
One point that may be of practical importance is that using large values for the hyperparameter r slows down the Gibbs sampling process. This is because the chi-squared random draws take longer for large r values. This shows up in this example where the 1100 draws for the homoscedastic prior based on r=100 took close to 29 seconds and those for the model based on r=4 took only 17.2 seconds. This suggests that a good operational strategy would be not to rely on values of r greater than 30 or 40. These values may produce vi estimates that deviate from unity somewhat, but should in most cases replicate the maximum likelihood estimates when there are no outliers.
A better strategy is to always rely on a small r value between 2 and 8, in which case a divergence between the maximum likelihood estimates and those from the Bayesian model reflect the existence of non-constant variance or outliers.
We applied the series of spatial autoregressive models from Section 2.5 as well as corresponding Bayesian spatial autoregressive models to the Boston data set. Recall that Belsley, Kuh and Welsch (1980) used this data set to illustrate the impact of outliers and influential observations on least-squares estimation results. Here we have an opportunity to see the how the Bayesian spatial autoregressive models deal with the outliers.
Example 3.7 shows the program code needed to implement both maximum likelihood and Bayesian models for this data set.
% ----- Example 3.7 Robust Boston model estimation
load boston.raw; % Harrison-Rubinfeld data
load latitude.data; load longitude.data;
[W1 W W3] = xy2cont(latitude,longitude); % create W-matrix
[n k] = size(boston);y = boston(:,k); % median house values
x = boston(:,1:k-1); % other variables
vnames = strvcat('hprice','crime','zoning','industry','charlesr', ...
'noxsq','rooms2','houseage','distance','access','taxrate', ...
'pupil/teacher','blackpop','lowclass');
ys = studentize(log(y)); xs = studentize(x);
rmin = 0; rmax = 1;
tic; res1 = sar(ys,xs,W,rmin,rmax); prt(res1,vnames); toc;
prior.rmin = 0; prior.rmax = 1;
prior.rval = 4;
ndraw = 1100; nomit=100;
tic; resg1 = sar_g(ys,xs,W,ndraw,nomit,prior);
prt(resg1,vnames); toc;
tic; res2 = sem(ys,xs,W,rmin,rmax); prt(res2,vnames); toc;
tic; resg2 = sem_g(ys,xs,W,ndraw,nomit,prior);
prt(resg2,vnames); toc;
tic; res3 = sac(ys,xs,W,W); prt(res3,vnames); toc;
tic; resg3 = sac_g(ys,xs,W,W,ndraw,nomit,prior);
prt(resg3,vnames); toc;
An interesting aspect is the timing results which were produced using the MATLAB `tic' and `toc' commands. Maximum likelihood estimation of the SAR model took 44 seconds while Gibbs sampling using 1100 draws and omitting the first 100 took 124 seconds. For the SEM model the corresponding times were 59 and 164 seconds and for the SAC model 114 and 265 seconds. These times seem quite reasonable for this moderately sized problem.
The results are shown below. (We eliminated the printed output showing the
prior means and standard deviations because all of the Bayesian models were
implemented with diffuse priors for the parameters
in the model.) A
prior value of r=4 was used to produce robustification against outliers and
non-constant variance.
What do we learn from this exercise? First, the parameters
and
for the SAR and SEM models from maximum likelihood and Bayesian models are in
agreement regarding both their magnitude and statistical significance. For the SAC model, the
Bayesian estimate for
is in agreement with the maximum likelihood
estimate, but that for
is not. The Bayesian estimate is 0.107 versus
0.188 for maximum likelihood. The maximum likelihood estimate is significant
whereas the Bayesian estimate is not. This would impact our
decision regarding which model represents the best specification.
Most of the
estimates are remarkably similar with one notable exception,
that for the `noxsq' pollution variable. In all three Bayesian models this
estimate is smaller than the maximum likelihood estimate and insignificant at
the 95% level. All maximum likelihood estimates indicate significance. This
would represent an important policy difference in the inference made regarding
the impact of air pollution on housing values.
Spatial autoregressive Model Estimates (elapsed_time = 44.2218)
Dependent Variable = hprice
R-squared = 0.8421
Rbar-squared = 0.8383
sigma^2 = 0.1576
Nobs, Nvars = 506, 13
log-likelihood = -85.099051
# of iterations = 9
min and max rho = 0.0000, 1.0000
***************************************************************
Variable Coefficient t-statistic t-probability
crime -0.165349 -6.888522 0.000000
zoning 0.080662 3.009110 0.002754
industry 0.044302 1.255260 0.209979
charlesr 0.017156 0.918665 0.358720
noxsq -0.129635 -3.433659 0.000646
rooms2 0.160858 6.547560 0.000000
houseage 0.018530 0.595675 0.551666
distance -0.215249 -6.103520 0.000000
access 0.272237 5.625288 0.000000
taxrate -0.221229 -4.165999 0.000037
pupil/teacher -0.102405 -4.088484 0.000051
blackpop 0.077511 3.772044 0.000182
lowclass -0.337633 -10.149809 0.000000
rho 0.450871 12.348363 0.000000
Gibbs sampling spatial autoregressive model (elapsed_time = 126.4168)
Dependent Variable = hprice
R-squared = 0.8338
sigma^2 = 0.1812
r-value = 4
Nobs, Nvars = 506, 13
ndraws,nomit = 1100, 100
acceptance rate = 0.9910
time in secs = 110.2646
min and max rho = 0.0000, 1.0000
***************************************************************
Posterior Estimates
Variable Coefficient t-statistic t-probability
crime -0.127092 -3.308035 0.001008
zoning 0.057234 1.467007 0.143012
industry 0.045240 0.950932 0.342105
charlesr 0.006076 0.249110 0.803379
noxsq -0.071410 -1.512866 0.130954
rooms2 0.257551 5.794703 0.000000
houseage -0.031992 -0.748441 0.454551
distance -0.171671 -5.806800 0.000000
access 0.173901 2.600740 0.009582
taxrate -0.202977 -5.364128 0.000000
pupil/teacher -0.086710 -3.081886 0.002172
blackpop 0.094987 3.658802 0.000281
lowclass -0.257394 -5.944543 0.000000
rho 0.479126 5.697191 0.000000
Spatial error Model Estimates (elapsed_time = 59.0996)
Dependent Variable = hprice
R-squared = 0.8708
Rbar-squared = 0.8676
sigma^2 = 0.1290
log-likelihood = -58.604971
Nobs, Nvars = 506, 13
# iterations = 10
min and max lam = 0.0000, 1.0000
***************************************************************
Variable Coefficient t-statistic t-probability
crime -0.186710 -8.439402 0.000000
zoning 0.056418 1.820113 0.069348
industry -0.000172 -0.003579 0.997146
charlesr -0.014515 -0.678562 0.497734
noxsq -0.220228 -3.683553 0.000255
rooms2 0.198585 8.325187 0.000000
houseage -0.065056 -1.744224 0.081743
distance -0.224595 -3.421361 0.000675
access 0.352244 5.448380 0.000000
taxrate -0.257567 -4.527055 0.000008
pupil/teacher -0.122363 -3.839952 0.000139
blackpop 0.129036 4.802657 0.000002
lowclass -0.380295 -10.625978 0.000000
lambda 0.757669 19.133467 0.000000
Gibbs sampling spatial error model (elapsed_time = 164.8779)
Dependent Variable = hprice
R-squared = 0.7313
sigma^2 = 0.1442
r-value = 4
Nobs, Nvars = 506, 13
ndraws,nomit = 1100, 100
acceptance rate = 0.4715
time in secs = 116.2418
min and max lambda = -1.9826, 1.0000
***************************************************************
Posterior Estimates
Variable Coefficient t-statistic t-probability
crime -0.165360 -3.967705 0.000083
zoning 0.048894 1.226830 0.220472
industry -0.002985 -0.051465 0.958976
charlesr -0.014862 -0.538184 0.590693
noxsq -0.145616 -1.879196 0.060807
rooms2 0.339991 7.962844 0.000000
houseage -0.130692 -2.765320 0.005900
distance -0.175513 -2.398220 0.016846
access 0.276588 3.121642 0.001904
taxrate -0.234511 -4.791976 0.000002
pupil/teacher -0.085891 -2.899236 0.003908
blackpop 0.144119 4.773623 0.000002
lowclass -0.241751 -5.931583 0.000000
lambda 0.788149 19.750640 0.000000
General Spatial Model Estimates (elapsed_time = 114.5267)
Dependent Variable = hprice
R-squared = 0.8662
Rbar-squared = 0.8630
sigma^2 = 0.1335
log-likelihood = -55.200525
Nobs, Nvars = 506, 13
# iterations = 7
***************************************************************
Variable Coefficient t-statistic t-probability
crime -0.198184 -8.766862 0.000000
zoning 0.086579 2.824768 0.004923
industry 0.026961 0.585884 0.558222
charlesr -0.004154 -0.194727 0.845687
noxsq -0.184557 -3.322769 0.000958
rooms2 0.208631 8.573808 0.000000
houseage -0.049980 -1.337513 0.181672
distance -0.283474 -5.147088 0.000000
access 0.335479 5.502331 0.000000
taxrate -0.257478 -4.533481 0.000007
pupil/teacher -0.120775 -3.974717 0.000081
blackpop 0.126116 4.768082 0.000002
lowclass -0.374514 -10.707764 0.000000
rho 0.625963 9.519920 0.000000
lambda 0.188257 3.059010 0.002342
Gibbs sampling general spatial model (elapsed_time = 270.8657)
Dependent Variable = hprice
R-squared = 0.7836
sigma^2 = 0.1487
r-value = 4
Nobs, Nvars = 506, 13
ndraws,nomit = 1100, 100
accept rho rate = 0.8054
accept lam rate = 0.9985
time in secs = 205.6773
min and max rho = 0.0000, 1.0000
min and max lambda = -1.9826, 1.0000
***************************************************************
Posterior Estimates
Variable Coefficient t-statistic t-probability
crime -0.161673 -3.905439 0.000107
zoning 0.047727 1.274708 0.203013
industry 0.024687 0.433283 0.664999
charlesr 0.008255 0.319987 0.749114
noxsq -0.121782 -1.814226 0.070251
rooms2 0.333823 7.332684 0.000000
houseage -0.098357 -2.080260 0.038018
distance -0.193060 -3.798247 0.000164
access 0.227007 2.592018 0.009825
taxrate -0.231393 -4.713901 0.000003
pupil/teacher -0.110537 -3.852098 0.000133
blackpop 0.137065 4.555835 0.000007
lowclass -0.293952 -7.201166 0.000000
rho 0.689346 7.938580 0.000000
lambda 0.107479 1.188044 0.235388
One other variable where we would draw a different inference is the `houseage' variable in both the SEM and SAC models. The Bayesian estimates indicate significance for this variable whereas maximum likelihood estimates do not.
Another interesting difference between the Bayesian models and maximum likelihood is the lower R2 statistic for the Bayesian versus corresponding maximum likelihood estimates. The Bayesian SEM and SAC models both show a dramatic reduction in fit indicating the robust nature of these estimates. The Bayesian SAR model shows only a modest reduction in fit when compared to the maximum likelihood estimates. Recall that we rejected the SAR model in Chapter 2 for a number of reasons.
How do we decide between the maximum likelihood and Bayesian estimates? One issue we should explore is that of outliers and non-constant variance. A plot of the vi estimates based on the mean of the draws from the Bayesian SAC model is shown in Figure 3.7. All three Bayesian models produced very similar estimates for the vi terms as shown for the SAC model in Figure 3.7.
Given these estimates for the vi terms, it would be hard to maintain the hypothesis of a constant variance normally distributed disturbance process for this model and data set.
Choosing between the Bayesian SEM and SAC models is not necessary as a similar set of inferences regarding the significance of the `noxsq' air pollution variable and `houseage' are produced by both models. Note that these are different inferences than one would draw from maximum likelihood estimates for either the SEM or SAC models.
If we wished to pursue this point we might examine the posterior
distribution of
from the SAC model, which is shown
in Figure 3.8.
Suppose we felt comfortable imposing the restriction that
must be
positive? We can do this with the Bayesian SAC model, whereas this is not
possible (given our optimization procedures) for the maximum likelihood
estimation procedure. This can be accomplished using an input option to the
sac_g function as shown in example 3.8. Note that we
increased the number of draws to 2100 in example 3.8 to increase
the precision regarding this model's estimates.
% ----- Example 3.8 Imposing restrictions
load boston.raw; % Harrison-Rubinfeld data
load latitude.data; load longitude.data;
[W1 W W3] = xy2cont(latitude,longitude); % create W-matrix
[n k] = size(boston);y = boston(:,k); % median house values
x = boston(:,1:k-1); % other variables
vnames = strvcat('hprice','crime','zoning','industry','charlesr', ...
'noxsq','rooms2','houseage','distance','access','taxrate', ...
'pupil/teacher','blackpop','lowclass');
ys = studentize(log(y)); xs = studentize(x);
prior.rmin = 0; prior.rmax = 1;
prior.lmin = 0; prior.lmax = 1;
prior.rval = 4; ndraw = 2100; nomit=100;
resg3 = sac_g(ys,xs,W,W,ndraw,nomit,prior);
prt(resg3,vnames);
Carrying out estimation of this version of the model, we found the following
results. The estimate for the parameter
is now perhaps different from
zero as indicated by a t-statistic that is significant at the 0.10 level.
Gibbs sampling general spatial model
Dependent Variable = hprice
R-squared = 0.7891
sigma^2 = 0.1491
r-value = 4
Nobs, Nvars = 506, 13
ndraws,nomit = 2100, 100
accept rho rate = 0.9464
accept lam rate = 0.6243
time in secs = 403.2714
min and max rho = -1.9826, 1.0000
min and max lambda = 0.0000, 1.0000
***************************************************************
Posterior Estimates
Variable Coefficient t-statistic t-probability
crime -0.164682 -4.096886 0.000049
zoning 0.048255 1.252200 0.211091
industry 0.018722 0.334302 0.738294
charlesr 0.009160 0.360944 0.718296
noxsq -0.123344 -1.915048 0.056065
rooms2 0.328860 7.352177 0.000000
houseage -0.096837 -2.045083 0.041377
distance -0.192698 -3.829066 0.000145
access 0.232155 2.771251 0.005795
taxrate -0.233451 -5.075241 0.000001
pupil/teacher -0.109920 -3.730565 0.000213
blackpop 0.136174 4.462714 0.000010
lowclass -0.297580 -7.136230 0.000000
rho 0.671345 7.891604 0.000000
lambda 0.134279 1.700944 0.089584
A graphical examination of the posterior density for this parameter shown in
Figure 3.9 might make us a bit concerned about the nature of the
restriction we imposed on
.
A virtue of Gibbs sampling is that we can compute the probability of
by simply counting the Gibbs draws for this parameter that are less than zero
in the unrestricted model. This can be done easily as shown in example 3.9. To
improve the accuracy of our calculation, we increased the number of draws to
2100 in this program.
% ----- Example 3.9 The probability of negative lambda
load boston.raw; % Harrison-Rubinfeld data
load latitude.data; load longitude.data;
[W1 W W3] = xy2cont(latitude,longitude); % create W-matrix
[n k] = size(boston);y = boston(:,k); % median house values
x = boston(:,1:k-1); % other variables
vnames = strvcat('hprice','crime','zoning','industry','charlesr', ...
'noxsq','rooms2','houseage','distance','access','taxrate', ...
'pupil/teacher','blackpop','lowclass');
ys = studentize(log(y)); xs = studentize(x);
prior.rval = 4; ndraw = 2100; nomit=100;
resg3 = sac_g(ys,xs,W,W,ndraw,nomit,prior);
prt(resg3,vnames);
% find the number of lambda draws < 0
nlam = find(resg3.ldraw < 0); numl = length(nlam);
fprintf(1,'The # of negative lambda values is: %5d \n',length(nlam));
fprintf(1,'Probability lambda < 0 is: %6.2f \n',numl/(ndraw-nomit));
The results indicate a probability of 6%, which should
not make us nervous about imposing the restriction on
.
The # of negative lambda values is: 126 Probability lambda < 0 is: 0.06
Summarizing, I would use the Gibbs SAC model based on the restriction that
.
With the exception of
,
this
would produce the same inferences regarding all parameters
as the model without this restriction. It would also produce
the same inferences regarding the
parameters as the SEM model, which is
comforting.
We have seen that spatial autoregressive models can be extended to allow for Bayesian prior information as well as non-constant variances over space. These models require a Gibbs sampling estimation approach, making them take more time than the maximum likelihood estimation methods. The time is quite reasonable, even for large sample problems because we rely on the MATLAB sparse matrix algorithms and a grid-based approach to compute determinants that we use in the sampling process. In fact, as the sample size gets larger, the time difference between maximum likelihood and Gibbs sampling methods diminishes.
An advantage of these models is that they can serve as a check on the assumption of homogeneity that is inherent in the maximum likelihood models. It should be noted that some work has been done on accommodating non-constant variance in the case of maximum likelihood methods (see Anselin, 1988). Unfortunately, these approaches require that the investigator add a specification for the changing variance over space. This adds to the specification problems facing the practitioner, whereas the Bayesian approach set forth here requires no such specification. Outliers and non-constant variance are automatically detected during estimation and the estimates are adjusted for these problems.
We saw in an application from Section 3.5 an example where the
existence of outliers produced different inferences from maximum likelihood and
Bayesian robust estimates. These differences would have important policy
implications for the conclusions one were to draw regarding the impact of air
quality on housing values. These Bayesian models would produce different
inferences than maximum likelihood regarding two of the explanatory variables
`noxsq' and `houseage' and for the SAC model there is a difference
regarding the significance of the parameter
.
This chapter discusses in detail a set of estimation methods that attempt to accommodate spatial heterogeneity by allowing the parameters of the model to vary with the spatial location of the sample data. The first section deals with spatial and distance expansion models introduced by Casetti (1972, 1992). A more recent variant of this model presented in Casetti (1982) and Casetti and Can (1998) called a DARP model is the subject of Section 4.2.
Non-parametric locally linear regression models introduced in McMillen (1996), McMillen and McDonald (1997) and Brunsdon, Fotheringham and Charlton (1997) (sometimes labeled geographically-weighted regression) represent another way to deal with spatial heterogeneity. These models are covered in Section 4.3. Finally, a Bayesian approach to geographically-weighted regressions is presented in Section 4.4.
The first model of this type was introduced by Casetti (1972) and labeled a
spatial expansion model. The model is shown in (4.1), where y denotes
an nx1 dependent variable vector associated with spatial observations and X
is an nxnk matrix consisting of terms xi representing kx1 explanatory
variable vectors, as shown in (4.2). The locational information is
recorded in the matrix Z which has elements
,
that represent latitude and longitude coordinates of each observation as shown
in (4.2).
The model posits that the parameters vary as a function of the latitude and
longitude coordinates. The only parameters that need be estimated are the
parameters in
that we denote,
.
These
represent a set of 2k parameters. Recall our discussion about spatial
heterogeneity and the need to utilize a parsimonious specification for variation
over space. This represents one approach to this type of specification.
We note that the parameter vector
in (4.1) represents an nkx1
vector in this model that contains parameter estimates for all k explanatory
variables at every observation. The parameter vector
contains the
2k parameters to be estimated.
Where:
This model can be estimated using least-squares to produce estimates of the 2k
parameters
.
Given these estimates, the remaining
estimates for individual points in space can be derived using the second
equation in (4.1). This process is referred to as the ``expansion
process''. To see this, substitute the second equation in (4.1) into the
first, producing:
Here it is clear that X, Z and J represent available information or data
observations and only
represent parameters in the model that need be
estimated.
The model would capture spatial heterogeneity by allowing variation in the underlying relationship such that clusters of nearby or neighboring observations measured by latitude-longitude coordinates take on similar parameter values. As the location varies, the regression relationship changes to accommodate a locally linear fit through clusters of observations in close proximity to one another.
Another way to implement this model is to rely on a vector of distances rather than the latitude-longitude coordinates. This implementation defines the distance from a central observation,
| (4.4) |
Where
Zxc,Zyc denote the latitude-longitude coordinates of the
centrally located observation and
Zxi,Zyi denote the latitude-longitude
coordinates for observations
in the data sample.
This approach allows one to ascribe different weights to observations based on their distance from the central place origin. The formulation discussed above would result in a distance vector that increased with distance from the central observation. This would be suitable if one were modeling a phenomenon reflecting a ``hollowing out'' of the central city or a decay of influence with distance from the central point.
The distance expansion model can be written as:
Where
represents the distance of
each observation from the central place and
represents a kx1 vector
of parameters for the central place. The matrix J in (4.5) is an
nxk matrix,
.
Estimating this model is relatively straightforward as we can rely on least-squares. One issue is that there are a number of alternative expansion specifications. For example, one approach would be to construct a model that includes the base k explanatory variables in the matrix X estimated with fixed parameters, plus an additional 2k expansion variables based on the latitude-longitude expansion. Another approach would be to include the base k variables in the matrix X and only 2(k-1) variables in expansion form by excluding the constant term from the expansion process. Yet another approach would be to rely on a simple expansion of all variables as was illustrated in (4.1).
The second approach was taken in implementing the MATLAB function casetti that carries out spatial expansion estimation. This choice was made because it seems unwise to include the constant term in the expansion as one can overfit the sample data when the intercept is allowed to very over space. A motivation for not relying on a simple expansion of all variables is that we would like our model to partition the influence of explanatory variables into fixed plus spatial effects. A simple expansion assigns all influence to spatial effects and also falls prey to the overfitting problem by allowing the intercept term to vary.
The expansion implemented by our function casetti can be written as:
| (4.6) |
The function allows the user to specify an option for distance expansion based on a particular point in the spatial data sample or the latitude-longitude expansion. In the case of the distance expansion, the k explanatory variables in the matrix X are used as non-expansion variables estimated with fixed parameters and the k-1 variables excluding the constant are included as distance-expanded variables. This version of the model can be written as:
| (4.7) |
For the case of distance expansion, a distance vector is calculated as:
,
where
Zxc,Zyc
denote the latitude-longitude coordinates of the centrally located observation
and
Zxi,Zyi denote the coordinates for observation i in the data
sample. The distance of the central point is zero of course.
An optional input is provided to carry out isotropic normalization of the x-y coordinates which essentially puts the coordinates in deviations from the means form and then standardizes by dividing by the square root of the sum of the variances in the x-y directions. That is:
This normalization is carried out by a function normxy in the spatial econometrics library.
This normalization should make the center points xc,yc close to zero and produces a situation where the coefficients for the ``base model'' represent a central observation. The distance-expanded estimates provide information about variation in the model parameters with reference to the central point.
The documentation for casetti is:
PURPOSE: computes Casetti's spatial expansion regression
---------------------------------------------------
USAGE: results = casetti(y,x,xc,yc,option)
where: y = dependent variable vector
x = independent variables matrix
xc = latitude (or longitude) coordinate
yc = longitude (or latitude) coordinate
option = a structure variable containing options
option.exp = 0 for x-y expansion (default)
= 1 for distance from ctr expansion
option.ctr = central point observation # for distance expansion
option.norm = 1 for isotropic x-y normalization (default=0)
---------------------------------------------------
RETURNS:
results.meth = 'casetti'
results.b0 = bhat (underlying b0x, b0y)
results.t0 = t-stats (associated with b0x, b0y)
results.beta = spatially expanded estimates (nobs x nvar)
results.yhat = yhat
results.resid = residuals
results.sige = e'*e/(n-k)
results.rsqr = rsquared
results.rbar = rbar-squared
results.nobs = nobs
results.nvar = # of variables in x
results.y = y data vector
results.xc = xc
results.yc = yc
results.ctr = ctr (if input)
results.dist = distance vector (if ctr used)
results.exp = exp input option
results.norm = norm input option
--------------------------------------------------
NOTE: assumes x(:,1) contains a constant term
--------------------------------------------------
Given the exclusion of the constant term from the spatial expansion formulation, we need to impose that the user place the constant term vector in the first column of the explanatory variables matrix X used as an input argument to the function.
Of course, we have an associated function to print the results structure and another to provide a graphical presentation of the estimation results. Printing these estimation results is a bit challenging because of the large number of parameter estimates that we produce using this method. Graphical presentation may provide a clearer picture of the variation in coefficients over space. A call to plt using the `results' structure variable will produce plots of the coefficients in both the x- and y-directions for the case of the latitude-longitude expansion, where we sort the x-direction from left to right and the y-direction from left to right. This provides a visual picture of how the coefficients vary over space. If the x-coordinates are largest for the east and smallest for the west, the plot will show coefficient variation from west to east as in map space. Similarly, if the y-coordinates are smallest for the south and largest in the north, the plot will present coefficient variation from south to north. (Note that if you enter Western hemisphere latitude-longitude coordinates, the x-direction plots will be from east to west, but the y-direction plots will be south to north.)
For the case of distance expansion estimates, the plots present coefficients sorted by distance from the central point, provided by the user in the structure field `option.ctr'. The central observation (smallest distance) will be on the left of the graph and the largest distance on the right.
Another point to note regarding the graphical presentation of the estimates relates to the fact that we present the coefficients in terms of the individual variables' total impact on the dependent variable y. It was felt that users would usually be concerned with the total impact of a particular variable on the dependent variable as well as the decomposition of impacts into spatial and non-spatial effects. The printed output provides the coefficient estimates for the base model as well as the expansion coefficients that can be used to analyze the marginal effects from the spatial and non-spatial decomposition. To provide another view of the impact of the explanatory variables in the model on the dependent variable, the graphical presentation plots the coefficient estimates in a form representing their total impact on the dependent variable. That is we graph:
Where
are plotted for the x-y expansion
and
is graphed for the distance expansion.
This should provide a feel for the total impact of variable
i on the dependent variable since it takes into account
the non-spatial impact attributed to
,
as well
as the spatially varying impacts in the x-y direction
or with respect to distance.
An illustration in the next section will pursue
this point in more detail.
Example 4.1 illustrates use of the function casetti based on the Columbus neighborhood crime data set. Both types of expansion models are estimated by changing the structure variable `option' field `.exp'. For the case of distance expansion, we rely on a central observation number 20 which lies near the center of the spatial sample of neighborhoods. One point to note is that the x-coordinate in Anselin's data set represents the south-north direction and the y-coordinate reflects the west-east direction.
% ----- example 4.1 Using the casetti() function
% load Anselin (1988) Columbus neighborhood crime data
load anselin.dat; y = anselin(:,1); n = length(y);
x = [ones(n,1) anselin(:,2:3)];
% Anselin (1988) x-y coordinates
xc0 = anselin(:,4); yc0 = anselin(:,5);
vnames = strvcat('crime','const','income','hse value');
% do Casetti regression using x-y expansion (default)
res1 = casetti(y,x,xc0,yc0);
prt(res1,vnames); % print the output
plt(res1,vnames); % graph the output
pause;
% do Casetti regression using distance expansion
option.exp = 1; option.ctr = 20; % Obs # of a central neighborhood
res2 = casetti(y,x,xc0,yc0,option);
prt(res2,vnames); % print the output
plt(res2,vnames); % graph the output
The default option is to implement an x-y expansion, which produces the result structure variable `res1'. The next case relies on a structure variable `option' to select distance expansion. The printed output is shown below. Both the base estimates as well as the expansion estimates are presented in the printed output. If you are working with a large model containing numerous observations, you can rely on the printing option that places the output in a file. Recall from Section 1.5, we need simply open an output file and input the `file-id' as an option to the prt function.
Another point to note regarding the printed output is that in the case of a large number of explanatory variables, the printed estimates will `wrap'. A set of estimates that take up 80 columns will be printed for all observations, and remaining estimates will be printed below for all observations. This `wrapping' will continue until all of the parameter estimates are printed.
Casetti X-Y Spatial Expansion Estimates
Dependent Variable = crime
R-squared = 0.6330
Rbar-squared = 0.5806
sige = 117.4233
Nobs, Nvars = 49, 3
***************************************************************
Base x-y estimates
Variable Coefficient t-statistic t-probability
const 69.496160 15.105146 0.000000
income -4.085918 -1.951941 0.057048
hse value 0.403956 0.517966 0.606965
x-income -0.046062 -1.349658 0.183731
x-hse value 0.026732 2.027587 0.048419
y-income 0.121440 2.213107 0.031891
y-hse value -0.048606 -2.341896 0.023571
***************************************************************
Expansion estimates
Obs# x-income x-hse value y-income y-hse value
1 -1.6407 0.9522 5.1466 -2.0599
2 -1.6813 0.9757 4.9208 -1.9695
3 -1.6909 0.9813 4.7009 -1.8815
4 -1.5366 0.8918 4.6645 -1.8670
5 -1.7872 1.0372 5.3519 -2.1421
6 -1.8342 1.0645 5.0009 -2.0016
7 -1.8429 1.0695 4.6147 -1.8470
8 -2.0152 1.1695 4.7702 -1.9092
9 -1.8245 1.0588 4.2395 -1.6968
10 -2.1930 1.2727 4.4228 -1.7702
11 -2.2377 1.2986 4.1848 -1.6750
12 -2.2851 1.3262 3.9650 -1.5870
13 -2.3082 1.3395 3.6323 -1.4538
14 -2.3602 1.3697 3.3760 -1.3512
15 -2.3441 1.3604 3.0651 -1.2268
16 -2.2312 1.2949 3.3918 -1.3576
17 -2.1525 1.2492 3.8752 -1.5510
18 -2.0009 1.1612 4.3621 -1.7459
19 -1.9977 1.1594 4.0634 -1.6264
20 -1.8945 1.0995 4.0245 -1.6108
21 -2.0244 1.1749 3.8387 -1.5364
22 -2.0313 1.1789 3.6918 -1.4776
23 -2.0129 1.1682 3.5436 -1.4183
24 -1.8904 1.0971 3.4950 -1.3989
25 -1.9913 1.1556 3.3165 -1.3274
26 -1.9655 1.1406 3.0311 -1.2132
27 -1.8982 1.1016 3.1453 -1.2589
28 -1.8112 1.0511 3.1392 -1.2565
29 -1.8927 1.0984 3.3384 -1.3362
30 -1.7651 1.0244 3.4999 -1.4008
31 -1.9028 1.1043 3.7525 -1.5019
32 -1.8130 1.0522 3.9929 -1.5982
33 -1.8296 1.0618 3.7209 -1.4893
34 -1.7637 1.0236 3.6857 -1.4752
35 -1.6859 0.9784 3.8970 -1.5598
36 -1.7319 1.0051 4.1387 -1.6565
37 -1.7103 0.9926 4.3864 -1.7556
38 -1.7434 1.0118 4.4083 -1.7644
39 -1.6559 0.9610 4.4204 -1.7693
40 -1.6453 0.9549 4.3233 -1.7304
41 -1.6472 0.9559 4.2091 -1.6847
42 -1.6651 0.9664 4.1192 -1.6487
43 -1.5698 0.9110 3.6942 -1.4786
44 -1.3966 0.8105 3.4319 -1.3736
45 -1.2870 0.7469 3.6250 -1.4509
46 -1.2561 0.7290 3.4258 -1.3712
47 -1.1170 0.6482 3.2412 -1.2973
48 -1.1732 0.6809 3.1222 -1.2497
49 -1.3367 0.7758 3.2279 -1.2919
Casetti Distance Spatial Expansion Estimates
Dependent Variable = crime
R-squared = 0.6307
Rbar-squared = 0.5878
sige = 112.7770
Nobs, Nvars = 49, 3
central obs = 20
***************************************************************
Base centroid estimates
Variable Coefficient t-statistic t-probability
const 62.349645 12.794160 0.000000
income -0.855052 -1.048703 0.299794
hse value -0.138951 -0.520305 0.605346
d-income -0.048056 -0.613545 0.542538
d-hse value -0.013384 -0.473999 0.637743
***************************************************************
Expansion estimates
Obs# income hse value
1 -0.5170 -0.1440
2 -0.4187 -0.1166
3 -0.3417 -0.0952
4 -0.4512 -0.1257
5 -0.5371 -0.1496
6 -0.3915 -0.1090
7 -0.2397 -0.0668
8 -0.3208 -0.0893
9 -0.1121 -0.0312
10 -0.3490 -0.0972
11 -0.3636 -0.1013
12 -0.4082 -0.1137
13 -0.4586 -0.1277
14 -0.5495 -0.1530
15 -0.6034 -0.1681
16 -0.4314 -0.1201
17 -0.2755 -0.0767
18 -0.1737 -0.0484
19 -0.1087 -0.0303
20 -0.0000 -0.0000
21 -0.1542 -0.0429
22 -0.1942 -0.0541
23 -0.2269 -0.0632
24 -0.2096 -0.0584
25 -0.2978 -0.0829
26 -0.4000 -0.1114
27 -0.3479 -0.0969
28 -0.3610 -0.1005
29 -0.2715 -0.0756
30 -0.2477 -0.0690
31 -0.1080 -0.0301
32 -0.0860 -0.0239
33 -0.1379 -0.0384
34 -0.1913 -0.0533
35 -0.2235 -0.0622
36 -0.1755 -0.0489
37 -0.2397 -0.0668
38 -0.2189 -0.0610
39 -0.2941 -0.0819
40 -0.2856 -0.0795
41 -0.2682 -0.0747
42 -0.2422 -0.0675
43 -0.3631 -0.1011
44 -0.5700 -0.1587
45 -0.6533 -0.1820
46 -0.7069 -0.1969
47 -0.8684 -0.2419
48 -0.8330 -0.2320
49 -0.6619 -0.1843
We turn attention to interpreting the output from this example. For this purpose we compare the `base' spatial expansion estimates to those from least-squares which are presented below. The addition of the four x-y expansion variables increased the fit of the model slightly as indicated by the higher adjusted R2 statistic. We see that the intercept estimate is relatively unaffected by the inclusion of expansion variables, but the coefficients on income and house value take on very different values. The significance of the income variable falls as indicated by the lower t-statistic, and the house value variable becomes insignificant.
Three of the four x-y expansion variables are significant at the 0.05 level, providing evidence that the influence of these variables on neighborhood crime varies over space. Keep in mind that depending on the amount of independent variation in the x-y coordinates, we may introduce a substantial amount of collinearity into the model when we add expansion variables. These are likely highly correlated with the base variables in the model, and this may account for the lack of significance of the house value variable in the `base model'. A statistical interpretation of these `base' estimates would be that income expansion in the x-direction (south-north) is not significant, whereas it is in the y-direction (west-east).
Ordinary Least-squares Estimates Dependent Variable = crime R-squared = 0.5521 Rbar-squared = 0.5327 sigma^2 = 130.8386 Durbin-Watson = 1.1934 Nobs, Nvars = 49, 3 *************************************************************** Variable Coefficient t-statistic t-probability const 68.609759 14.484270 0.000000 income -1.596072 -4.776038 0.000019 house value -0.274079 -2.655006 0.010858 Casetti X-Y Spatial Expansion Estimates Dependent Variable = crime R-squared = 0.6330 Rbar-squared = 0.5806 sige = 117.4233 Nobs, Nvars = 49, 3 *************************************************************** Base x-y estimates Variable Coefficient t-statistic t-probability const 69.496160 15.105146 0.000000 income -4.085918 -1.951941 0.057048 hse value 0.403956 0.517966 0.606965 x-income -0.046062 -1.349658 0.183731 x-hse value 0.026732 2.027587 0.048419 y-income 0.121440 2.213107 0.031891 y-hse value -0.048606 -2.341896 0.023571 Casetti Distance Spatial Expansion Estimates Dependent Variable = crime R-squared = 0.6307 Rbar-squared = 0.5878 sige = 112.7770 Nobs, Nvars = 49, 3 central obs = 20 *************************************************************** Base centroid estimates Variable Coefficient t-statistic t-probability const 62.349645 12.794160 0.000000 income -0.855052 -1.048703 0.299794 hse value -0.138951 -0.520305 0.605346 d-income -0.048056 -0.613545 0.542538 d-hse value -0.013384 -0.473999 0.637743
In order to interpret the x-y expansion estimates, we need to keep in mind that the x-direction reflects the south-north direction with larger values of xc indicating northward movement. Similarly, larger values for yc reflect west-east movement. Using these interpretations, the base model estimates indicate that income exerts an insignificant negative influence on crime as we move from south to north. Considering the y-direction representing west-east movement, we find that income exerts a positive influence as we move in the easterly direction. One problem with interpreting the expansion estimates is that the base model coefficient is -4.085, indicating that income exerts a negative influence on crime. It is difficult to assess the total impact on neighborhood crime from both the base model coefficient representing non-spatial impact plus the small 0.121 value for the expanded coefficient reflecting the impact of spatial variation.
If we simply plotted the expansion coefficients, they would suggest that income in the y-direction has a positive influence on crime, a counterintuitive result. This is shown in the plot of the expanded coefficients sorted by the south-north and east-west directions shown in Figure 4.1. We are viewing a positive coefficient on the y-income variable in the graph.
Figure 4.2 shows a graph of the total impact of income on the dependent variable crime that takes into account both the base model non-spatial impact plus the spatial impact indicated by the expansion coefficient. Here we see that the total impact of income on crime is negative, except for neighborhoods in the extreme east at the right of the graph in the lower left-hand corner of Figure 4.2. The coefficient graphs produced using the plt function on the results structure from casetti are identical to those shown in Figure 4.2. You can of course recover the spatial expansion estimates from the results structure returned by casetti, sort the estimates in the x-y directions and produce your own plots of just the expansion estimates if this is of interest. As an example, the following code would produce this type of graph, where we are assuming the existence of a structure `result' returned by casetti.
[xcs xci] = sort(result.xc); [ycs yci] = sort(result.yc); beta = result.beta; [nobs nvar] = size(beta); nvar = nvar/2; betax = beta(xci,1:nvar); % sort estimates betay = beta(yci,nvar+1:2*nvar); tt=1:nobs; for j=1:nvar plot(tt,betax(:,j)); pause; end; for j=1:nvar plot(tt,betay(:,j)); pause; end;
The distance expansion method produced a similar fit to the data as indicated by the adjusted R2 statistic. This is true despite the fact that only the constant term is statistically significant at conventional levels.
These estimates take on a value of zero for the central observation since the distance is zero at that point. This makes them somewhat easier to interpret than the estimates for the case of x-y coordinates. We know that the expansion coefficients take on values of zero at the central point, producing estimates based on the non-spatial base coefficients for this point. As we move outward from the center, the expansion estimates take over and adjust the constant coefficients to account for variation over space. The printed distance expansion estimates reported for the base model reflect values near the distance-weighted average of all points in space. Given this, we would interpret the coefficients for the model at the central point to be (62.349, -0.855, -0.138 ) for the intercept, income and house value variables respectively. In Figure 4.3 we see the total impact of income and house values on neighborhood crime as we move away from the central point. Both income and house values have a negative effect on neighborhood crime as we move away from the central city. Note that this is quite different from the pattern shown for the x-y expansion. Anselin (1988) in analyzing the x-y model shows that heteroscedastic disturbances produce problems that plague inferences for the model. Adjusting for the heteroscedastic disturbances dramatically alters the inferences. We turn attention to this issue when we discuss the DARP version of this model in Section 4.2.
The plots of the coefficient estimates provide an important source of information about the nature of coefficient variation over space, but you should keep in mind that they do not indicate levels of significance, simply point estimates.
Our plt wrapper function works to call the appropriate function plt_cas that provides individual graphs of each coefficient in the model as well as a two-part graph showing actual versus predicted and residuals. Figure 4.4 shows the actual versus predicted and residuals from the distance expansion model. This plot is produced by the plt function when given a results structure from the casetti function.
A problem with the spatial expansion model is that heteroscedasticity is inherent in the way the model is constructed. To see this, consider the slightly altered version of the distance expansion model shown in (4.10), where we have added a stochastic term u to reflect some error in the expansion relationship.
Now consider substituting the second equation from (4.10) into the first, producing:
It should be clear that the new composite disturbance term X u + e will reflect heteroscedasticity unless the expansion relationship is exact and u=0.
Casetti (1982) and Casetti and Can (1998) propose a model they label DARP, an acronym for Drift Analysis of Regression Parameters, that aims at solving this problem. This model case be viewed as an extended expansion model taking the form:
Where
represents the expansion relationship based on a function f,
variables Z and parameters
.
Estimation of this model attempts to take
into account that the expanded model will have a heteroscedastic error as shown
in (4.11).
To keep our discussion concrete, we will rely on
,
the distance expansion relationship in discussing this model. In order to take
account of the spatial heteroscedasticity an explicit model of the composite
disturbance term:
,
is incorporated during estimation.
This disturbance is assumed to have a variance structure that can be represented
in alternative ways shown below. We will rely on these alternative scalar and
matrix representations in the mathematical development and explanation of the
model.
Where di denotes the squared distance between the ith observation and the
central point, and
are parameters to be
estimated. Of course, a more general statement of the model would be that
,
indicating that any functional form g
involving some known variable hi and associated parameters
could
be employed to specify the non-constant variance over space.
Note that the alternative specifications in (4.13) imply:
,
the constant scalar component
of the variance structure, and
reflects the non-constant component which is modeled as a function
of distance from the central point.
An intuitive motivation for this type of variance structure is based on
considering the nature of the composite disturbance: X u + e. The constant
scalar
reflects the constant component e, while the role of the
scalar parameter
associated with distance is to measure the impact
of Xu, the non-constant variance component on average. Somewhat loosely,
consider a linear regression involving the residuals on a constant term plus a
vector of distances from the central place. The constant term estimate would
reflect
,
while the distance coefficient is
intended to capture the influence of the non-constant Xu component:
.
If
,
we have
,
a constant scalar
value across all observations in space. This would indicate a situation
where u, the error made in the expansion specification is small. This
homoscedastic case would indicate that a simple deterministic spatial
expansion specification for spatial coefficient variation is performing well.
On the other hand, if
,
moving away from the
central point produces a positive increase in the variance. This is interpreted
as evidence that `parameter drift' is present in the relationship being modeled.
The motivation is that increasing variance would indicate larger errors
(u) in the expansion relationship as we move away from the central point.
Note that one can assume a value for
rather than estimate this
parameter. If you impose a positive value, you are assuming a DARP model that
will generate locally linear estimates since movement in the parameters will
increase with movement away from the central point. This is because allowing
increasing variance in the stochastic component of the expansion relation brings
about more rapid change or adjustment in the parameters. Another way to view
this is that the change in parameters need not adhere as strictly to the
deterministic expansion specification. We will argue in Section 4.4
that a Bayesian approach to this type of specification is more intuitively
appealing.
Negative values for
suggest that the errors made
by the deterministic expansion specification are smaller as we move away from
the central point. This indicates that the expansion relation works well for
points farther from the center, but not as well for the central area
observations. Casetti and Can (1998) interpret this as suggesting
``differential performance'' of the base model with movement over space, and they
label this phenomenon as `performance drift'. The intuition here is most likely
based on the fact that the expansion relationship is of a locally linear nature.
Given this, better performance with distance is indicative of a need to change
the deterministic expansion relationship to improve performance. Again,
I will argue that a Bayesian model represents a more intuitively appealing
way to deal with these issues in Section 4.4.
Estimation of the parameters of the model require either feasible generalized
least squares (FGLS) or maximum likelihood (ML) methods. Feasible generalized
least squares obtains a consistent estimate of the unknown parameter
and then proceeds to estimate the remaining parameters in the model conditional
on this estimate.
As an example, consider using least-squares to estimate the expansion model and
associated residuals,
.
We could then carry out a
regression of the form:
| (4.14) |
Casetti and Can (1998) argue that the estimate
from this procedure
would be consistent. Given this estimate
and our knowledge of the distances in the vector d, we construct
an estimate of
that we label
.
Using
,
generalized least-squares produces:
| = | |||
| = |
Of course, the usual GLS variance-covariance matrix for the estimates applies:
Casetti and Can (1998) also suggest using a statistic:
which is chi-squared distributed with one degree of freedom
to test the null hypothesis that
.
Maximum likelihood estimation involves using optimization routines to solve for a minimum of the negative of the log-likelihood function. We have already seen how to solve optimization problems in the context of spatial autoregressive models in Chapter 2. We will take the same approach for this model. The log-likelihood is shown in (4.16) where we use a generic . to represents arguments on which this likelihood is conditioned.
As in Chapter 2 we can construct a MATLAB function to evaluate the
negative of this log-likelihood function and rely on our function maxlik.
The asymptotic variance-covariance matrix for the estimates
is equal to
that for the FGLS estimates shown in (4.15). The asymptotic
variance-covariance matrix for the parameters
is given by:
In the case of maximum likelihood estimates, a Wald statistic based on
that has a chi-squared distribution with one
degree of freedom can be used to test the null hypothesis that
.
Note that the maximum likelihood estimate of
is more efficient than
the FGLS estimate. This can be seen by comparing the ML estimate's asymptotic
variance of
,
to that for the FGLS which equals
.
Bear in mind, tests regarding the parameter
are quite
often the focus of this methodology as it provides exploratory evidence
regarding `performance drift' versus `parameter drift', so increased precision
regarding this parameter may be important.
The function darp implements this estimation procedure using the maxlik function from the Econometrics Toolbox to find estimates via maximum likelihood. The documentation for darp is:
PURPOSE: computes Casetti's DARP model
---------------------------------------------------
USAGE: results = darp(y,x,xc,yc,option)
where: y = dependent variable vector
x = independent variables matrix
xc = latitude (or longitude) coordinate
yc = longitude (or latitude) coordinate
option = a structure variable containing options
option.exp = 0 for x-y expansion (default)
= 1 for distance from ctr expansion
option.ctr = central point observation # for distance expansion
option.iter = # of iterations for maximum likelihood routine
option.norm = 1 for isotropic x-y normalization (default=0)
---------------------------------------------------
RETURNS:
results.meth = 'darp'
results.b0 = bhat (underlying b0x, b0y)
results.t0 = t-stats (associated with b0x, b0y)
results.beta = spatially expanded estimates (nobs x nvar)
results.yhat = yhat
results.resid = residuals
results.sige = e'*e/(n-k)
results.rsqr = rsquared
results.rbar = rbar-squared
results.nobs = nobs
results.nvar = # of variables in x
results.y = y data vector
results.xc = xc
results.yc = yc
results.ctr = ctr (if input)
results.dist = distance vector
results.exp = exp input option
results.norm = norm input option
results.iter = # of maximum likelihood iterations
--------------------------------------------------
NOTE: assumes x(:,1) contains a constant term
--------------------------------------------------
Because we are relying on maximum likelihood estimation which may not converge, we provide FGLS estimates as output in the event of failure. A message is printed to the MATLAB command window indicating that this has occurred. We rely on the FGLS estimates to provide starting values for the maxlik routine, which speeds up the optimization process.
The DARP model can be invoked with either the x-y or distance expansion as in the case of the spatial expansion model. Specifically, for x-y expansion the variance specification is based on:
| (4.18) |
This generalizes the distance expansion approach presented previously.
Of course, we have an accompanying prt and plt function to provide printed and graphical presentation of the estimation results.
Example 4.2 shows how to use the function darp for both x-y and distance expansion using the Columbus neighborhood crime data set.
% ----- example 4.2 Using the darp() function
% load Anselin (1988) Columbus neighborhood crime data
load anselin.dat; y = anselin(:,1); n = length(y);
x = [ones(n,1) anselin(:,2:3)];
xc = anselin(:,4); yc = anselin(:,5); % Anselin x-y coordinates
vnames = strvcat('crime','const','income','hse value');
% do Casetti darp using x-y expansion
res1 = darp(y,x,xc,yc);
prt(res1,vnames); % print the output
plt(res1,vnames); % plot the output
pause;
% do Casetti darp using distance expansion from observation #20
option.exp = 1; option.ctr = 20;
res2 = darp(y,x,xc,yc,option);
prt(res2,vnames); % print the output
plt(res2,vnames); % plot the output
The printed results are shown below, where we report not only the estimates for
and the expansion estimates, but estimates for the parameters
as well. A chi-squared statistic to test the null hypothesis that
is provided as well as a marginal probability level. For the
case of the x-y expansion, we see that
parameter is negative and
significant by virtue of the large chi-squared statistic and associated marginal
probability level of 0.0121. The inference we would draw is that performance
drift occurs in the south-north direction. For the
parameter, we find
a positive value that is not significantly different from zero because of the
marginal probability level of 0.8974. This indicates that the simple
deterministic expansion relationship is working well in the west-east direction.
Note that these results conform to those found with the spatial expansion model,
where we indicated that parameter variation in the west-east direction was
significant, but not for the south-north.
DARP X-Y Spatial Expansion Estimates
Dependent Variable = crime
R-squared = 0.6180
Rbar-squared = 0.5634
sige = 122.2255
gamma1,gamma2 = -0.0807, 0.0046
gamma1, prob = 6.2924, 0.0121
gamma2, prob = 0.0166, 0.8974
# of iterations = 16
log-likelihood = -181.3901
Nobs, Nvars = 49, 3
***************************************************************
Base x-y estimates
Variable Coefficient t-statistic t-probability
const 66.783527 6.024676 0.000000
income -2.639184 -0.399136 0.691640
hse value 0.249214 0.095822 0.924078
x-income -0.048337 -0.537889 0.593247
x-hse value 0.021506 0.640820 0.524819
y-income 0.084877 0.564810 0.574947
y-hse value -0.037460 -0.619817 0.538436
***************************************************************
Expansion estimates
Obs# x-income x-hse value y-income y-hse value
1 -1.3454 0.6595 4.0747 -1.6828
2 -1.3786 0.6758 3.8959 -1.6089
3 -1.3865 0.6796 3.7218 -1.5370
4 -1.2600 0.6176 3.6930 -1.5251
5 -1.4655 0.7183 4.2372 -1.7499
6 -1.5040 0.7372 3.9593 -1.6351
7 -1.5112 0.7407 3.6536 -1.5088
8 -1.6524 0.8100 3.7766 -1.5597
9 -1.4961 0.7333 3.3565 -1.3862
10 -1.7982 0.8814 3.5017 -1.4461
11 -1.8349 0.8994 3.3132 -1.3683
12 -1.8738 0.9185 3.1392 -1.2964
13 -1.8926 0.9277 2.8757 -1.1876
14 -1.9353 0.9487 2.6729 -1.1038
15 -1.9221 0.9422 2.4267 -1.0022
16 -1.8296 0.8968 2.6854 -1.1090
17 -1.7650 0.8652 3.0680 -1.2670
18 -1.6407 0.8042 3.4536 -1.4263
19 -1.6381 0.8029 3.2171 -1.3286
20 -1.5535 0.7615 3.1863 -1.3159
21 -1.6600 0.8137 3.0392 -1.2551
22 -1.6657 0.8165 2.9229 -1.2071
23 -1.6505 0.8091 2.8056 -1.1586
24 -1.5501 0.7598 2.7671 -1.1428
25 -1.6328 0.8004 2.6258 -1.0844
26 -1.6116 0.7900 2.3998 -0.9911
27 -1.5565 0.7630 2.4902 -1.0284
28 -1.4851 0.7280 2.4854 -1.0264
29 -1.5520 0.7607 2.6431 -1.0915
30 -1.4473 0.7095 2.7709 -1.1443
31 -1.5603 0.7648 2.9709 -1.2269
32 -1.4866 0.7287 3.1613 -1.3055
33 -1.5002 0.7354 2.9459 -1.2166
34 -1.4462 0.7089 2.9181 -1.2051
35 -1.3824 0.6776 3.0853 -1.2742
36 -1.4201 0.6961 3.2767 -1.3532
37 -1.4024 0.6874 3.4728 -1.4342
38 -1.4296 0.7008 3.4901 -1.4413
39 -1.3578 0.6656 3.4997 -1.4453
40 -1.3491 0.6613 3.4228 -1.4136
41 -1.3507 0.6621 3.3324 -1.3762
42 -1.3654 0.6693 3.2613 -1.3468
43 -1.2872 0.6310 2.9248 -1.2079
44 -1.1452 0.5613 2.7171 -1.1221
45 -1.0553 0.5173 2.8700 -1.1852
46 -1.0300 0.5049 2.7123 -1.1201
47 -0.9159 0.4490 2.5662 -1.0598
48 -0.9620 0.4715 2.4719 -1.0209
49 -1.0961 0.5373 2.5556 -1.0554
DARP Distance Expansion Estimates
Dependent Variable = crime
R-squared = 0.6083
Rbar-squared = 0.5628
sige = 119.6188
gamma = -0.0053
gamma, prob = 0.0467, 0.8289
# of iterations = 10
log-likelihood = -138.64471
Nobs, Nvars = 49, 3
central obs = 20
***************************************************************
Base centroid estimates
Variable Coefficient t-statistic t-probability
const 62.323508 5.926825 0.000000
income -0.889528 -0.925670 0.359448
hse value -0.312813 -1.015500 0.315179
d-income -0.004348 -0.790909 0.433056
d-hse value 0.000659 0.349488 0.728318
***************************************************************
Expansion estimates
Obs# income hse value
1 -0.4032 0.0055
2 -0.2644 0.0036
3 -0.1761 0.0024
4 -0.3070 0.0042
5 -0.4350 0.0060
6 -0.2311 0.0032
7 -0.0866 0.0012
8 -0.1552 0.0021
9 -0.0190 0.0003
10 -0.1837 0.0025
11 -0.1994 0.0027
12 -0.2513 0.0034
13 -0.3172 0.0043
14 -0.4554 0.0062
15 -0.5492 0.0075
16 -0.2807 0.0038
17 -0.1145 0.0016
18 -0.0455 0.0006
19 -0.0178 0.0002
20 -0.0000 0.0000
21 -0.0359 0.0005
22 -0.0569 0.0008
23 -0.0776 0.0011
24 -0.0662 0.0009
25 -0.1338 0.0018
26 -0.2413 0.0033
27 -0.1826 0.0025
28 -0.1965 0.0027
29 -0.1112 0.0015
30 -0.0925 0.0013
31 -0.0176 0.0002
32 -0.0111 0.0002
33 -0.0287 0.0004
34 -0.0552 0.0008
35 -0.0753 0.0010
36 -0.0465 0.0006
37 -0.0867 0.0012
38 -0.0723 0.0010
39 -0.1305 0.0018
40 -0.1230 0.0017
41 -0.1085 0.0015
42 -0.0885 0.0012
43 -0.1989 0.0027
44 -0.4900 0.0067
45 -0.6437 0.0088
46 -0.7538 0.0103
47 -1.1374 0.0156
48 -1.0465 0.0143
49 -0.6607 0.0090
For the case of the distance expansion we find a single
parameter that
is negative but not significant. This would be interpreted to mean that the
deterministic expansion relationship is performing adequately over
space.
A comparison of the base model estimates from the x-y darp model versus those from casetti show relatively similar coefficient estimates as we would expect. In the case of the x-y expansion all of the signs are the same for spatial expansion and darp models. The distance expansion version of the model exhibits a sign change for the coefficient on income, which goes from positive in the expansion model to negative in the darp model. Correcting for the heteroscedastic character of the estimation problem produces dramatic changes in the statistical significance found for the base model estimates. They all become insignificant, a finding consistent with results reported by Anselin (1988) based on a jacknife approach to correcting for heteroscedasticity in this model. Consistent with our results, Anselin finds that the income coefficient is only marginally significant after the correction.
One approach to using this model is to expand around every point in
space and examine the parameters
for evidence indicating
where the model is suffering from performance or parameter drift.
Example 4.3 shows how this might be accomplished using a `for loop'
over all observations. For this purpose, we wish only to recover the
estimated values for the parameter
along with the marginal
probability level.
% ----- example 4.3 Using darp() over space
% load Anselin (1988) Columbus neighborhood crime data
load anselin.dat; y = anselin(:,1); n = length(y);
x = [ones(n,1) anselin(:,2:3)];
xc = anselin(:,4); yc = anselin(:,5); % Anselin x-y coordinates
vnames = strvcat('crime','const','income','hse value');
% do Casetti darp using distance expansion from all
% observations in the data sample
option.exp = 1;
output = zeros(n,2);
tic;
for i=1:n % loop over all observations
option.ctr = i;
res = darp(y,x,xc,yc,option);
output(i,1) = res.gamma(1);
output(i,2) = res.cprob(1);
end;
toc;
in.cnames = strvcat('gamma estimate','marginal probability');
in.rflag = 1;
mprint(output,in)
We use the MATLAB `tic' and `toc' commands to time the operation of producing these maximum likelihood estimates across the entire sample. The results are shown below, where we find that it took around 70 seconds to solve the maximum likelihood estimation problem, calculate expansion estimates and produce all of the ancillary statistics 49 times, once for each observation in the data set.
elapsed_time = 69.2673
Obs# gamma estimate marginal probability
1 -0.2198 0.0714
2 -0.2718 0.0494
3 -0.3449 0.0255
4 -0.4091 0.0033
5 -0.2223 0.0532
6 -0.3040 0.0266
7 -0.4154 0.0126
8 -0.2071 0.1477
9 -0.5773 0.0030
10 0.1788 0.1843
11 0.1896 0.1526
12 0.1765 0.1621
13 0.1544 0.1999
14 0.1334 0.2214
15 0.1147 0.2708
16 0.1429 0.2615
17 0.1924 0.2023
18 -0.1720 0.3112
19 0.1589 0.3825
20 -0.3471 0.0810
21 0.2020 0.2546
22 0.1862 0.2809
23 0.1645 0.3334
24 0.0904 0.6219
25 0.1341 0.4026
26 0.1142 0.4264
27 0.1150 0.4618
28 0.0925 0.5584
29 0.1070 0.5329
30 -0.2765 0.1349
31 0.0453 0.8168
32 -0.6580 0.0012
33 -0.3293 0.0987
34 -0.5949 0.0024
35 -0.8133 0.0000
36 -0.5931 0.0023
37 -0.4853 0.0066
38 -0.4523 0.0121
39 -0.5355 0.0016
40 -0.6050 0.0005
41 -0.6804 0.0001
42 -0.7257 0.0001
43 -0.7701 0.0000
44 -0.5150 0.0001
45 -0.3997 0.0005
46 -0.3923 0.0003
47 -0.3214 0.0004
48 -0.3586 0.0001
49 -0.4668 0.0000
From the estimated values of
and the associated marginal probabilities,
we infer that the model suffers from performance drift over the initial 9
observations and sample observations from 32 to 49. We draw this inference from
the negative
estimates that are statistically significant for these
observations. (Note that observation #8 is only marginally significant.) Over
the middle range of the sample, from observations 10 to 31 we find that the
deterministic distance expansion relationship works well. This inference arises
from the fact that none of the estimated
parameters are significantly
different from zero.
In Section 4.4 we provide evidence that observations 2 and 4 represent outliers that impact on estimates for neighboring observations 1 to 9. We also show that this is true of observation 34, which influences observations 31 to 44. This suggests that the DARP model is working correctly to spot places where the model encounters problems.
These models represent an attempt to draw on the flexibility and tractability of non-parametric estimators. Note that the use of the spatial expansion and DARP methods in the previous section did not involve matrix manipulations or inversion of large sparse matrices. The models presented in this section share that advantage over the spatial autoregressive models.
Locally linear regression methods introduced in McMillen (1996,1997) and labeled geographically-weighted regressions (GWR) in Brunsdon, Fotheringham and Charlton (1996) (BFC hereafter) are discussed in this section. The main contribution of the GWR methodology is the use of distance-weighted sub-samples of the data to produce locally linear regression estimates for every point in space. Each set of parameter estimates is based on a distance-weighted sub-sample of ``neighboring observations'', which has a great deal of intuitive appeal in spatial econometrics. While this approach has a definite appeal, it also presents some problems that are discussed in Section 4.4, where a Bayesian approach is used to overcome the problems.
The distance-based weights used by BFC for data at observation i take the form of a vector Wi which is determined based on a vector of distances di between observation i and all other observations in the sample. This distance vector along with a distance decay parameter are used to construct a weighting function that places relatively more weight on sample observations from neighboring observations in the spatial data sample.
A host of alternative approaches have been suggested for constructing the weight function. One approach suggested by BFC is:
The parameter
is a decay parameter that BFC label ``bandwidth''.
Changing the bandwidth results in a different exponential decay profile, which
in turn produces estimates that vary more or less rapidly over space.
Another weighting scheme is the tri-cube function proposed
by McMillen (1998):
Where qi represents the distance of the qth nearest neighbor
to observation i and
is an indicator function
that equals one when the condition is true and zero
otherwise. Still another approach is to rely on a Gaussian function
:
Where
denotes the standard normal density and
represents the standard deviation of the distance vector di.
The distance vector is calculated for observation i as:
| (4.22) |
Where
Zxj,Zyj denote the
latitude-longitude coordinates of the observations
.
Note that the notation used here may be confusing as we usually rely on subscripted variables to denote scalar elements of a vector. In the notation used here, the subscripted variable di represents a vector of distances between observation i and all other sample data observations. Similarly, the Wi is a vector of distance-based weights associated with observation i.
BFC use a single value of
,
the bandwidth parameter for all observations
determined using a cross-validation procedure that is often used in locally
linear regression methods. A score function taking the form:
is used, where
denotes the fitted value of yi with
the observations for point i omitted from the calibration process. A value of
that minimizes this score function is used as the distance-weighting
bandwidth to produce GWR estimates.
The non-parametric GWR model relies on a sequence of locally linear regressions
to produce estimates for every point in space by using a sub-sample of data
information from nearby observations. Let y denote an nx1 vector of
dependent variable observations collected at n points in space, X an nxk
matrix of explanatory variables, and
an nx1 vector of normally
distributed, constant variance disturbances. Letting Wi represent an
nxn diagonal matrix containing distance-based weights for observation i
that reflect the distance between observation i and all other observations,
we can write the GWR model as:
The subscript i on
indicates that this kx1 parameter
vector is associated with observation i. The GWR model produces n
such vectors of parameter estimates, one for each observation. These
estimates are produced using:
Keep in mind the notation, Wi y denotes an n-vector of
distance-weighted observations used to produce estimates for observation i.
Note also, that Wi X represents a distance-weighted data matrix, not a
single observation and
represents an n-vector.
Note that these GWR estimates for
are conditional on the parameter
.
That is, changing
,
the distance decay parameter will produce
a different set of GWR estimates.
The best way to understand this approach to dealing with spatial heterogeneity is to apply the method, a subject to which we turn in the next section.
We have an optimization problem to solve regarding minimizing the score function
to find the cross-validation bandwidth parameter
.
We first construct a
MATLAB function to compute the scores associated with different bandwidths.
This univariate function of the scalar bandwidth parameter can then be minimized
using the simplex algorithm fmin.
Given the optimal bandwidth, estimation of the GWR parameters
and
associated statistics can proceed via generalized least-squares. A function
gwr whose documentation is shown below implements the estimation
procedure.
PURPOSE: compute geographically weighted regression
----------------------------------------------------
USAGE: results = gwr(y,x,east,north,info)
where: y = dependent variable vector
x = explanatory variable matrix
east = x-coordinates in space
north = y-coordinates in space
info = a structure variable with fields:
info.bwidth = scalar bandwidth to use or zero
for cross-validation estimation (default)
info.dtype = 'gaussian' for Gaussian weighting (default)
= 'exponential' for exponential weighting
NOTE: res = gwr(y,x,east,north) does CV estimation of bandwidth
---------------------------------------------------
RETURNS: a results structure
results.meth = 'gwr'
results.beta = bhat matrix (nobs x nvar)
results.tstat = t-stats matrix (nobs x nvar)
results.yhat = yhat
results.resid = residuals
results.sige = e'e/(n-dof) (nobs x 1)
results.nobs = nobs
results.nvar = nvars
results.bwidth = bandwidth
results.dtype = input string for Gaussian, exponential weights
results.iter = # of simplex iterations for cv
results.north = north (y-coordinates)
results.east = east (x-coordinates)
results.y = y data vector
---------------------------------------------------
NOTES: uses auxiliary function scoref for cross-validation
---------------------------------------------------
The following program illustrates using the gwr function on the Anselin (1988) neighborhood crime data set to produce estimates based on both Gaussian and exponential weighting functions. Figure 4.5 shows a graph of these two sets of estimates, indicating that they are not very different.
% ----- example 4.4 Using the gwr() function
% load the Anselin data set
load anselin.dat; y = anselin(:,1); nobs = length(y);
x = [ones(nobs,1) anselin(:,2:3)]; tt=1:nobs;
north = anselin(:,4); east = anselin(:,5);
info.dtype = `gaussian'; % Gaussian weighting function
res1 = gwr(y,x,east,north,info);
info.dtype = `exponential'; % Exponential weighting function
res2 = gwr(y,x,east,north,info);
subplot(3,1,1), plot(tt,res1.beta(:,1),tt,res2.beta(:,1),'--');
legend('Gaussian','Exponential'); ylabel('Constant term');
subplot(3,1,2), plot(tt,res1.beta(:,2),tt,res2.beta(:,2),'--');
legend('Gaussian','Exponential'); ylabel('Household income');
subplot(3,1,3), plot(tt,res1.beta(:,3),tt,res2.beta(:,3),'--');
legend('Gaussian','Exponential'); ylabel('House value');
The printed output for these models is voluminous as illustrated below, where we only print estimates associated with two observations.
Geometrically weighted regression estimates Dependent Variable = crime R-squared = 0.9418 Rbar-squared = 0.9393 Bandwidth = 0.6518 Decay type = gaussian # iterations = 17 Nobs, Nvars = 49, 3 *************************************** Obs = 1, x-coordinate= 42.3800, y-coordinate= 35.6200, sige= 1.1144 Variable Coefficient t-statistic t-probability const 51.198618 16.121809 0.000000 income -0.461074 -2.938009 0.005024 hse value -0.434240 -6.463775 0.000000 Obs = 2, x-coordinate= 40.5200, y-coordinate= 36.5000, sige= 2.7690 Variable Coefficient t-statistic t-probability const 63.563830 15.583144 0.000000 income -0.369869 -1.551568 0.127201 hse value -0.683562 -7.288304 0.000000
A Bayesian treatment of locally linear geographically weighted regressions is set forth in this section. While the use of locally linear regression seems appealing, it is plagued by some problems. A Bayesian treatment can resolve these problems and has some advantages over the non-parametric approach discussed in Section 4.3.
One problem with the non-parametric approach is that valid inferences cannot be drawn for the GWR regression parameters. To see this, consider that the locally linear estimates use the same sample data observations (with different weights) to produce a sequence of estimates for all points in space. Given a lack of independence between estimates for each location, conventional measures of dispersion for the estimates will likely be incorrect. These (invalid) conventional measures are what we report in the results structure from gwr, as this is the approach taken by Brunsdon, Fotheringham and Charlton (1996).
Another problem is that non-constant variance over space, aberrant observations due to spatial enclave effects, or shifts in regime can exert undue influence on locally linear estimates. Consider that all nearby observations in a sub-sequence of the series of locally linear estimates may be ``contaminated'' by an outlier at a single point in space. The Bayesian approach introduced here solves this problem by robustifying against aberrant observations. Aberrant observations are automatically detected and downweighted to lessen their influence on the estimates. The non-parametric implementation of the GWR model assumed no outliers.
A third problem is that the non-parametric estimates may suffer from ``weak data'' problems because they are based on a distance weighted sub-sample of observations. The effective number of observations used to produce estimates for some points in space may be very small. This problem can be solved with the Bayesian approach by incorporating subjective prior information during estimation. Use of subjective prior information is a well-known approach for overcoming ``weak data'' problems.
In addition to overcoming these problems, the Bayesian formulation introduced here specifies the relationship that is used to smooth parameters over space. This allows us to subsume the non-parametric GWR method as part of a much broader class of spatial econometric models. For example, the Bayesian GWR can be implemented with parameter smoothing relationships that result in: 1) a locally linear variant of the spatial expansion methods discussed in section 4.1, 2) a parameter smoothing relation appropriate for monocentric city models where parameters vary systematically with distance from the center of the city, 3) a parameter smoothing scheme based on contiguity relationships, and 4) a parameter smoothing scheme based on distance decay.
The Bayesian approach, which we label BGWR is best described using matrix expressions shown in (4.26) and (4.27). First, note that (4.26) is the same as the non-parametric GWR relationship, but the addition of (4.27) provides an explicit statement of the parameter smoothing that takes place across space. The parameter smoothing involves a locally linear combination of neighboring areas, where neighbors are defined in terms of the distance weighting function that decays over space.
To complete our model specification, we add distributions for the
terms
and ui:
The distribution for ui in the parameter smoothing
relationship is normally distributed with mean zero and a variance based on
Zellner's (1971) g-prior. This prior variance is proportional to the
parameter variance-covariance matrix,
with
acting as the scale factor. The use of this prior
specification allows individual parameters
to vary by different
amounts depending on their magnitude.
The parameter
acts as a scale factor to impose tight or loose
adherence to the parameter smoothing specification. Consider a case where
is very small, then the smoothing restriction would force
to look like a distance-weighted linear combination of other
from neighboring observations. On the other
hand, as
(and
Vi=In) we produce the
non-parametric GWR estimates. To see this, we rewrite the BGWR
model in a more compact form:
| = | Wi1/2 y | ||
| = | Wi1/2 X | ||
| Ji | = | ||
| = | ![]() |
As indicated earlier, the notation is somewhat confusing in that
denotes an n-vector, not a scalar magnitude. Similarly,
is
an n-vector and
is an n by k matrix. Note that (4.30)
can be written in the form of a Theil and Goldberger (1961) estimation problem as
shown in (4.32).
Assuming
Vi=In, the estimates
take the
form:
| = | |||
| R | = |
As
approaches
,
the terms associated with the Theil and
Goldberger ``stochastic restriction'',
and
become zero,
and we have the GWR estimates:
| (4.32) |
In practice, we can use a diffuse prior for
which allows the
amount of parameter smoothing to be estimated from sample data
information, rather than by subjective prior information.
Details concerning estimation of the parameters in the BGWR model are taken up in the next section. Before turning to these issues, we consider some alternative spatial parameter smoothing relationships that might be used in lieu of (4.27) in the BGWR model.
One alternative smoothing specification would be the ``monocentric city smoothing'' set forth in (4.34). This relation assumes that the data observations have been ordered by distance from the center of the spatial sample.
Given that the observations are ordered by distance from the center,
the smoothing relation indicates that
should be similar to
the coefficient
from a neighboring concentric ring.
Note that we rely on the same GWR distance-weighted data sub-samples,
created by transforming the data using:
Wiy, WiX. This means
that the estimates still have a ``locally linear'' interpretation as
in the GWR. We rely on the same distributional assumption for the term
ui from the BGWR which allows us to estimate the
parameters from this model by making minor changes to the approach
used for the BGWR.
Another alternative is a ``spatial expansion smoothing'' based on the ideas introduced by Casetti (1972). This is shown in (4.35), where Zxi, Zyi denote latitude-longitude coordinates associated with observation i.
This parameter smoothing relation creates a locally linear combination based on the latitude-longitude coordinates of each observation. As in the case of the monocentric city specification, we retain the same assumptions regarding the stochastic term ui, making this model simple to estimate with minor changes to the BGWR methodology.
Finally, we could adopt a ``contiguity smoothing'' relationship based on a first-order spatial contiguity matrix as shown in (4.36). The terms cij represent the ith row of a row-standardized first-order contiguity matrix. This creates a parameter smoothing relationship that averages over the parameters from observations neighboring observation i.