Modelling population density over time: how spatial distance matters

ABSTRACT Modelling population density over time: how spatial distance matters. Regional Studies. This study provides an empirical application of the Bayesian approach for modelling the evolution of population density distribution across time. It focuses on the case of Massachusetts by tracking changes in the importance of spatial distance from Boston concerning citizens’ choices of residence according to data for 1880–90 and 1930–2010. By adopting a Bayesian strategy, results show that Boston reinforced its attractiveness until the 1960s, when the city's accessibility no longer represented the unique determinant of population density distribution. Referring to selected historical evidence, a few possible interpretations are presented to endorse these results.


INTRODUCTION
Individual preferences for residential location rely on a combination of a few determinants ranging from individual-specific characteristics to predetermined neighbourhood characteristics and including subjective beliefs about the behaviour of other individuals (Durlauf, 2004).
In the literature (e.g., Parr, 1985a, 1985b), when individuals prefer one location over other options, their distribution of choices across space is not uniform; for instance, population density is not identical in urban and rural areas. The distribution of population density is empirically measured as the combination of the value of population density at a given location and its distance from a selected point, usually identified as the central business district (CBD), which typically hosts important sites for people's professional and leisure activities. As such, the concept of accessibility to a CBD turns out to be a key feature for modelling population density distribution (Helsley & Strange, 2007). In Quigley (1985), accessibility is identified by the presence of infrastructural elements that guarantee individual mobility. In the case of an efficient transportation infrastructure, people may be less interested in being located near the CBD, since proximity to the CBD often entails congestion effects that, if significant, can urge people to consider settling slightly farther away. On this point, local authorities consider that investing in the construction of effective transportation infrastructure can encourage people's relocation across space and, in turn, reduce the impact of congestion-related costs.
In this sense, people's propensity to settle close to a CBD can be read as their desire to be near the focal point of their activities. By contrast, their propensity to move away from it can indirectly indicate the ease of reaching the CBD in terms of transportation costs.
This study focuses on the evaluation over time of the importance of accessibility to Boston in shaping Massachusetts' population density distribution once Boston has been identified as the CBD. Rather than focusing on changes affecting Boston's urban structure, the study concerns how Boston as a pole of attraction has shaped population density distribution in Massachusetts. 1 The research retains the spirit of the traditional analysis of population density distribution for the specific case of Massachusetts, yet it also introduces two novelties with respect to the current literature. First, following Epifani and Nicolini (2013), the population density function is considered to be a random variable. Second, the analysis is performed by incorporating the time dimension.
As discussed in Nairn and O'Neill (1988), adopting a probabilistic density function is a generalization of the idea of population dispersion with respect to the distance from a centre of attraction (e.g., the CBD). The economic literature (Quigley, 1985; Durlauf, 2004; Topa & Zenou, 2015) identifies the physical distance from a selected site as a determinant of location choice, though not the only one. For instance, natural amenities, the quality of public services and/or the ethnic composition of the neighbourhood can matter as well. As such, working with a probabilistic distance function is a flexible device for combining these types of subjective priority. Previously, Epifani and Nicolini (2013) exploited a similar method of analysis to assess Boston's centrality in the location preferences of Bay Staters and to identify determinants of population density distribution in a probabilistic setting. By focusing on only the year 2000, these authors identified the physical distance from Boston as the relatively dominant determinant of population density distribution, followed by the ethnic composition of the territory and a group of other covariates, including education, age composition and presence of natural amenities. Building on this outcome, the present study aims to extend the framework of analysis by including the dimension of time.
This study focuses on the importance of accessibility (namely, the importance of distance) in shaping the location decisions of citizens across time. To this end, an original database was built by exploiting a set of comparable data extracted from different editions of the US Census. Furthermore, in the tradition of the regional economic literature (e.g., Parr, 1985b; Nairn & O'Neill, 1988), the study takes a monocentric distribution function as a reference to model the distribution of population density in Massachusetts, where Boston is assumed to be a persistently attractive pole across time. This selection is underpinned by previous studies and empirical evidence.
As documented in Glaeser (2005), people have desired to live in Boston since its foundation. Boston's comparative advantage lies principally in its human capital, with which the city has been able to reinvent itself and maintain its attractiveness despite having experienced cyclical waves.
Boston's attractiveness with respect to the rest of Massachusetts increased until 1920, when it began to decline during the mid-20th century, only to rebound in the 1980s. The reasons for the first period of growth are strongly associated with technological advances that prompted the subsequent urbanization of industrial activity via requirements imposed upon factories to reduce the space they occupied. Meanwhile, the implementation of a quality public transportation network made it possible to travel around Boston more cheaply than around other low-density communities. However, two principal causes of the decline in the city's attractiveness are rooted in the decline of manufacturing activity and in the general improvement of private transportation: the widespread use of automobiles to improve individual mobility, which favoured population displacement to the city's outskirts. Since the 1980s, Boston has regained its attractiveness, for the abundance of skilled labour in the population made the city highly attractive for skill-specialized activities. Consequently, very stringent urban housing regulations inflated the property (real estate) housing bubble. Glaeser and Ward (2009) provided a comprehensive interpretation of the causes and consequences of housing regulations in Boston in which the decline in the supply of new homes was not a consequence of a lack of land. In fact, the most effective measure was the definition of the minimum lot size, which in turn encouraged greater housing density. The authors showed that from a historical perspective, towns with more immigrants have had less stringent minimum lot-size regulations. In general, the practice of defining high minimum lot sizes was used by white natives to restrict home construction for blacks and foreigners.
To achieve the study's aims, the analysis exploits a Bayesian frailty estimation technique, a flexible framework that allows controlling for heterogeneity. Furthermore, the introduction of priors allows controlling for problems of weak data (LeSage & Pace, 2009). In particular, in the definition of the estimation strategy, the study relies upon the possibility of eliciting ad hoc priors in order to embed the dimension of time. Moreover, subjective priors are used to model individual expectations and create dependence across time in modelling population density distribution.
The study's sample selection is limited by the need to identify a sample consistently available over time that allows for comparison. Given the length of the study's time span, a list of 351 municipalities in Massachusetts has been the focus. These municipalities, mostly urban areas, are spread across all the state's territory. With this sample, the study tracks population density distribution across Massachusetts at the urban level from 1930 to 2010, plus two other isolated years, 1880 and 1890, which are considered as external benchmarks for discussing the results of estimations for potentially remote years.
Given the study's interest in analysing changes in the importance of accessibility (i.e., physical distance from Boston) in shaping people's location preferences, one would expect the improved efficiency of the transportation network during the period to reduce monotonically the relative impact of physical distance as a location determinant. Instead, the investigation reveals a bell-shaped dynamic. According to the study's estimations, the centripetal force of Boston as the principal location-attractor in Massachusetts was extremely strong at the end of the 19th century and continued to strengthen until the 1960s, after which it progressively declined. In this respect, the study's results align with those of Glaeser (2005). By contrast, ethnic composition acquired greater relative importance as a location determinant. The ethnic population composition (here measured as the proportion of white people in the total local population) does not have an important impact on location choices until the 1970s. The estimated magnitude of its coefficient fluctuates around zero, after which it immediately increases. The study therefore investigates the latter result from a geographical and historical perspective. In reference to the current literature, the study proposes some feasible interpretations. Empirical results also permit tracking the variation of some random spatial county effects as a further determinant in modelling uneven population density distribution across space. The study's findings confirm that the populations of towns belonging to remote counties far from Boston consolidated low attractiveness in relation to Boston as the CBD across time.
The paper is structured as follows. The second section describes the empirical strategy. The third section presents the Bayesian estimation, and the fourth section reports the estimation outcomes. The fifth and sixth sections offer a discussion and conclusions, respectively. Details about data selection are discussed in Appendix A in the supplemental data online, whereas Appendices B-D present additional statistics and econometric estimations as well as robustness checks.

A HIERARCHICAL GAMMA MODEL
Population density distribution is modelled by referring to a monocentric distribution around a CBD: in this case, Boston. Economic literature has identified Boston as the pole of major attraction in Massachusetts since its foundation (e.g., Glaeser, 2005). Recently, Epifani and Nicolini (2013) have provided quantitative estimations in which Boston also emerges as the most attractive urban area in Massachusetts. The current study's initial working hypothesis is thus that all residents of Massachusetts consider Boston to be the state's centre of interest, and as their point of attraction they are willing to relocate to be near it. However, identifying residents' preferences in terms of location choices needs to be modelled by accounting simultaneously for other factors identified with proxies for natural amenities and ethnic composition. The empirical strategy thus involves associating in a probabilistic manner the population density distribution in Massachusetts with a group of selected variables representing factors describing people's location preferences.
This analysis implements a parametric model. The choice of likelihood for the municipality's population density is based on careful observations of the nature and properties of the data (see Appendix B in the supplemental data online). In particular, following Epifani and Nicolini (2013), this study uses a hierarchical gamma model. A gamma function has already been proposed as an appropriate tool for modelling population density distribution across regional space by Song (1996). However, an alternative to gamma could be represented by a lognormal model, since both gamma and lognormal likelihood are very popular in modelling positive variables and characterized by two parameters, as well as assume a standard deviation increasing with the mean. In fact, it is often difficult to know whether one should assume a gamma or a lognormal distribution. McCullagh and Nelder (1989) have suggested a gamma assumption for working with data on an original scale instead of on a log scale (cf. Parr, 1985a). Such is this study's case because, for instance, if the linear scale is preserved, then the sum of the population densities of the county's municipalities retains the appropriate mathematical definition of the county's population density. In addition, Firth (1988) has argued that a gamma model performs slightly better than the lognormal under reciprocal misspecifications. As such, the lack of covariates to include in the model makes the study's setting more sensitive given the misspecification problem when modelling population preferences. For this reason, a hierarchical gamma model may provide a suitable simple framework for conducting a study of the determinants of population density distribution in Massachusetts. 
2 This study's Bayesian hierarchical gamma model with random county effects specifies: (a) a gamma likelihood for population densities describing the relationship among densities, as well as predictors and random county effects so that population density is gamma distributed with time-varying parameters each time; (b) a gamma distribution for the random county effects; (c) a prior distribution for the unknown parameters in the above two parts.
The discussion of the prior distribution in part (c) is deferred to the third section, which concerns the Bayesian estimation. Before parts (a) and (b) of the model can be stated formally, some preliminary definitions are needed. Considering the whole territory of Massachusetts, the population density and its determinants are defined as follows. For the jth municipality within the ith county, let $Y_{ij}^{(t)}$ be its population density at decade t; $\mathrm{Dist}_{ij}$ its spatial distance from Boston; 3 and $\mathrm{Mix}_{ij}^{(t)}$ its proportion of white people at decade t. In addition, $Z_i$ denotes the size of the local amenities, measured as the proportion of water areas in the territory of the ith county; 4 $w_i^{(t)}$ represents the random effect associated with county i at time t; and $n_i$ denotes the number of municipalities in county i, for $i = 1, \ldots, 14$. The decades involved in the analysis are $t = 1880, 1890, 1930, \ldots, 2010$. Lastly, the notation $X \sim \Gamma(a, b)$ means that a random variable X is gamma distributed with shape a and rate b. 5 Equation (1) formally defines part (a) of the model, i.e., the conditional distribution of the $Y_{ij}^{(t)}$'s given the random effects $w_i^{(t)}$, and part (b), i.e., the law of $w_i^{(t)}$.
The predictor $\mathrm{Dist}_{ij}$ captures the impact of the distance from Boston as a factor driving the distribution of population density. According to the hypothesis underpinning equation (1), $\mathrm{Dist}_{ij}$ embeds the attractiveness of the capital, so that lower values of population density should be observed when moving away from Boston. By contrast, $Z_i$ controls for the geographic characteristics of territory i, while $\mathrm{Mix}_{ij}^{(t)}$ controls for the ethnic composition at the municipality level. It should be noted that estimations are run separately by year and therefore use cross-sectional data. As such, municipality-level fixed effects cannot be incorporated to capture municipality-level heterogeneity.
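Parts (a) and (b) can be sketched in symbols as follows. This is a hedged reconstruction under assumptions, not necessarily the exact form of equation (1): the log-linear predictor $\eta_{ij}^{(t)}$ and the way the frailty enters the rate are assumed, chosen to agree with the shape-rate convention stated above and with the remark that frailties below 1 amplify the effect of the predictors. The text associates $\beta_3^{(t)}$ with Dist, $\beta_4^{(t)}$ with Mix and $\beta_5^{(t)}$ with Dist × Mix; which of $\beta_1^{(t)}$ and $\beta_2^{(t)}$ multiplies $Z_i$ and which multiplies Dist × Z is not stated, so that part is left open.

```latex
% Assumed form of parts (a) and (b); \eta_{ij}^{(t)} is a hypothetical symbol.
\begin{align*}
Y_{ij}^{(t)} \mid w_i^{(t)} &\sim \Gamma\!\left(\theta^{(t)},\;
    \theta^{(t)}\, w_i^{(t)}\, e^{-\eta_{ij}^{(t)}}\right),
    & j &= 1, \ldots, n_i, \\
w_i^{(t)} &\sim \Gamma\!\left(\alpha^{(t)},\, \alpha^{(t)}\right),
    & i &= 1, \ldots, 14, \\
\eta_{ij}^{(t)} &= \beta_0^{(t)} + \beta_3^{(t)}\,\mathrm{Dist}_{ij}
    + \beta_4^{(t)}\,\mathrm{Mix}_{ij}^{(t)}
    + \beta_5^{(t)}\,\mathrm{Dist}_{ij}\,\mathrm{Mix}_{ij}^{(t)}
    + \bigl(\text{terms in } Z_i \text{ and } \mathrm{Dist}_{ij} Z_i
      \text{ with } \beta_1^{(t)}, \beta_2^{(t)}\bigr).
\end{align*}
```

Under this assumed form the conditional mean is $e^{\eta_{ij}^{(t)}} / w_i^{(t)}$, so a frailty below 1 scales the covariate-driven density up and a frailty above 1 scales it down.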
The $\mathrm{Mix}_{ij}^{(t)}$ covariate plays an important role in partially controlling for heterogeneity across municipalities and embeds the attractiveness of the municipality for a specific ethnic group (namely whites), which can derive from different reasons, including specific policies implemented at the local level (e.g., zoning laws). The sign of its coefficient is expected to be negative when the white population aims at settling far from the CBD to live in individual dwellings and, for instance, enjoy good accessibility for reaching the CBD. Equation (1) also includes an interaction term $\mathrm{Dist} \times Z$ between the distance from Boston and the natural amenities, as well as an interaction $\mathrm{Dist} \times \mathrm{Mix}^{(t)}$ between the distance from Boston and the ethnic composition. While the former term ($\mathrm{Dist} \times Z$) aims to control for the trade-off between the preference for a high concentration of natural amenities (usually enjoyed far from Boston) and the desire to be near Boston, the latter ($\mathrm{Dist} \times \mathrm{Mix}^{(t)}$) controls for the interplay between the ethnic composition of a municipality and, again, the distance from Boston. As such, it captures the extent to which the effect of the ethnic composition of a territory (here, the proportion of whites at the municipality level) on location decisions is more or less exacerbated by the distance from Boston. The interaction term $\mathrm{Dist} \times \mathrm{Mix}^{(t)}$ is expected to affect location decisions when the quality of the infrastructure allows for comfortable commutes and makes the variable Dist (i.e., transport costs) less dominant. To assess the importance of Dist and $\mathrm{Mix}^{(t)}$, it is first assumed that $\beta_5^{(t)} = 0$, and later that $\beta_5^{(t)} \neq 0$. Finally, an unobserved frailty $w_i^{(t)}$ for each county i and decade t is also incorporated in equation (1) to capture (i) the degree of similarity of population habits (or similar regulation in civil matters) for people living in municipalities belonging to the same county as well as (ii) the degree of heterogeneity among the counties.
In one way or another, each county frailty $w_i^{(t)}$ summarizes all the predictors of the population density, both unobservable and observable, that are not explicitly accounted for by the remaining covariates in the ith county at time t. As such, the county frailties can be regarded as a sort of random spatial effect. Similar to Epifani and Nicolini (2013), this analysis assumes stochastic independence between the frailties $w_1^{(t)}, \ldots, w_{14}^{(t)}$, given the independence of each county's land organizational structure. At the same time, there is dependence within each county, meaning that municipalities belonging to the same county share some common features, which in the model used here are associated with the fixed effects embedded in the Z's and with the random effects $w_i^{(t)}$. Since the conditional expectation of $Y_{ij}^{(t)}$ given the county effect $w_i^{(t)}$ is inversely proportional to $w_i^{(t)}$, a gamma frailty $w_i^{(t)}$ significantly smaller than 1 amplifies the impact of the predictors on the population density distribution in county i at time t, whereas a gamma frailty $w_i^{(t)}$ far greater than 1 collapses them and, in a very extreme situation, $w_i^{(t)} \approx 1$ excludes the existence of territorial differentiation.
The values of $w_1^{(t)}, \ldots, w_{14}^{(t)}$ are probabilistically controlled by a single time-variant parameter $\alpha^{(t)}$ in such a way that the mean of every $w_i^{(t)}$ equals unity while its variance is $1/\alpha^{(t)}$. This implies that for large values of $\alpha^{(t)}$ all $w_1^{(t)}, \ldots, w_{14}^{(t)}$ are a priori concentrated around 1, resulting in a lack of territorial differentiation, save some possible morphological differences between county areas recorded in the natural amenities Z's. Instead, small values of $\alpha^{(t)}$ produce a skewed distribution of the $w_i^{(t)}$'s that allows both large and small values and thus represents strong heterogeneity among counties at time t. Not least, $\alpha^{(t)}$ is dimensionless and has the direct meaning of variability measured by the coefficient of variation. This probabilistic modelling of the frailties is similar to the prior distribution used, for example, by Geweke (1993) to model heteroskedasticity (and outliers) in Bayesian linear regression, and by LeSage and Pace (2009) in the context of Bayesian heteroskedastic spatial models. According to the latter model, all observations are split into n groups according to the n points in space where they have been collected, and their variances may vary across space. The prior distribution for the unknown variances takes the form of a set of n identically and independently distributed (iid) inverse chi-squared $\chi^2(r)/r$ distributions in which r represents the single parameter of the $\chi^2(r)$ distribution.
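The role of $\alpha^{(t)}$ can be illustrated numerically. The sketch below assumes only what the text states, namely that each frailty follows a gamma law with shape and rate both equal to $\alpha$; the function names and the three values of $\alpha$ are illustrative choices, not taken from the paper.

```python
import numpy as np

def frailty_moments(alpha):
    """Mean, variance and coefficient of variation of w ~ Gamma(shape=alpha, rate=alpha)."""
    mean = alpha / alpha            # shape / rate = 1
    var = alpha / alpha ** 2        # shape / rate^2 = 1 / alpha
    cv = np.sqrt(var) / mean        # 1 / sqrt(alpha)
    return mean, var, cv

def sample_frailties(alpha, n_counties=14, seed=0):
    """Draw one frailty per county; NumPy uses shape and scale, so scale = 1 / rate."""
    rng = np.random.default_rng(seed)
    return rng.gamma(shape=alpha, scale=1.0 / alpha, size=n_counties)

# Illustrative values of alpha (hypothetical, chosen to contrast the two regimes).
for alpha in (0.5, 5.0, 500.0):
    w = sample_frailties(alpha)
    print(alpha, frailty_moments(alpha), w.min(), w.max())
```

With a large $\alpha$ the simulated frailties cluster tightly around 1, whereas a small $\alpha$ yields a skewed sample with both very small and very large values.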
The parameters $\alpha^{(t)}$ and $\theta^{(t)}$ also measure the intensity of the relationship among population densities of different municipalities within the same county. It can be proven that the linear correlation coefficient between the densities $Y_{ij}^{(t)}$ and $Y_{ih}^{(t)}$ of two municipalities in the same county is determined by $\alpha^{(t)}$ and $\theta^{(t)}$, for all t, i and $j \neq h$, as shown in Epifani and Nicolini (2013). In particular, for large values of $\alpha^{(t)}$, the correlation of the municipality population densities within the same county approaches zero. By contrast, small values of $\alpha^{(t)}$ represent a strong positive relationship of densities among municipalities in the same county. Furthermore, for given values of $\alpha^{(t)}$, the larger $\theta^{(t)}$ is, the larger the dependence between the population densities of the municipalities within the same county.
The parameters $\alpha^{(t)}$ and $\theta^{(t)}$ also control the variability of the population density through the 'marginal' variance $\mathrm{Var}(Y_{ij}^{(t)})$, which exists for $\alpha^{(t)} > 2$. The larger $\alpha^{(t)}$ and $\theta^{(t)}$ are, the smaller the 'marginal' variances $\mathrm{Var}(Y_{ij}^{(t)})$ of the population densities and hence the more concentrated the $Y_{ij}^{(t)}$. As concerns the regression parameters, the coefficients $\beta_3^{(t)}$ associated with the distance are expected to be negative for all t, as is commonplace in the literature. Similarly, the $\beta_4^{(t)}$ associated with $\mathrm{Mix}^{(t)}$ will be negative whenever the ethnic composition of a territory is a true discriminating factor in location choices (as previously argued, as in Quigley, 1985).
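The roles of $\alpha^{(t)}$ and $\theta^{(t)}$ described above can be made explicit with a short derivation. This is a sketch under assumptions, not the authors' own derivation: it assumes $Y_{ij} \mid w_i \sim \Gamma(\theta, \theta w_i / \mu_{ij})$ and $w_i \sim \Gamma(\alpha, \alpha)$ in the shape-rate convention, with municipalities of the same county conditionally independent given $w_i$. The symbol $\mu_{ij}$ is hypothetical and stands for the covariate-driven part of the conditional mean; the superscript $(t)$ is dropped for readability.

```latex
% Moments under the assumed parameterization (shape, rate).
\begin{align*}
\mathbb{E}\bigl[Y_{ij} \mid w_i\bigr] &= \frac{\mu_{ij}}{w_i}, &
\operatorname{Var}\bigl(Y_{ij} \mid w_i\bigr) &= \frac{\mu_{ij}^{2}}{\theta\, w_i^{2}}, \\
\mathbb{E}\bigl[w_i^{-1}\bigr] &= \frac{\alpha}{\alpha-1} \quad (\alpha > 1), &
\mathbb{E}\bigl[w_i^{-2}\bigr] &= \frac{\alpha^{2}}{(\alpha-1)(\alpha-2)} \quad (\alpha > 2), \\
\operatorname{Var}(Y_{ij}) &=
  \frac{\mu_{ij}^{2}\,\alpha^{2}\,(\theta+\alpha-1)}{\theta\,(\alpha-1)^{2}(\alpha-2)}, &
\operatorname{Corr}(Y_{ij}, Y_{ih}) &= \frac{\theta}{\theta+\alpha-1} \quad (j \neq h).
\end{align*}
```

The variance follows from the law of total variance and the correlation from $\operatorname{Cov}(Y_{ij}, Y_{ih}) = \mu_{ij}\,\mu_{ih}\,\operatorname{Var}(w_i^{-1})$. Under these assumptions the correlation tends to zero as $\alpha$ grows and increases with $\theta$ for fixed $\alpha$, and the marginal variance is finite only for $\alpha > 2$, which agrees with the qualitative statements in the text.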
The unknown parameters of equation (1) are $\beta_0^{(t)}, \beta_1^{(t)}, \ldots, \beta_5^{(t)}$, $\alpha^{(t)}$ and $\theta^{(t)}$. Estimations are performed in three steps. The first step runs, for each year in the sample, a regression including only distance and frailties, with the coefficients of all other covariates in equation (1) set to zero. The purpose of this first exercise is to track the evolution of the distance from Boston as the unique determinant of population density distribution. This model is labelled the Baseline model (BM). Next, an alternative reduced version of equation (1) without the interaction $\mathrm{Dist} \times \mathrm{Mix}^{(t)}$ (namely $\beta_5^{(t)} = 0$) produces a model labelled the Model with Ethnic Composition (MEC). Ultimately, estimations are run for the complete model described in equation (1), from now on called the Model with Full Interactions (MFI). In this way, the significance and variation of the coefficients associated with the two expected principal covariates across time (i.e., distance from Boston and ethnic composition of the population) are first tracked by progressively including more controls. Secondly, the evolutionary trend of the coefficient of distance in the three alternative reductions of the hierarchical model in equation (1) is compared to capture variation in the magnitude of the coefficients of some selected covariates across time. In this sense, the analysis seeks to assess the magnitude of changes in the estimated coefficients associated with accessibility and neighbourhood characteristics that reflect changes in their relative importance in influencing location choices.
All the models were run from 1930 to 2010, but the lack of data on the ethnic composition restricted the estimations for 1950 to the BM only. To formulate some suggestions about the importance of the distance in the location choices at the beginning of census data availability, the BM was also run for the years 1880 and 1890. In this respect the statistics obtained referring to these two years are not fully comparable with those from 1930 onward.
Nevertheless, estimations of the most remote years may partly indicate how dynamics of the distribution of population density over time can be interpreted.

BAYESIAN ESTIMATIONS
This study's empirical analysis occurs within a Bayesian framework. Two motivations especially drive this choice. First, the Bayesian paradigm, in which all that is unknown is probabilistically modelled as randomness, provides a more natural framework for analysing hierarchical models with rich parameterization and random spatial effects. In the language of Bayesian inference, the county frailties $w_1^{(t)}, \ldots, w_{14}^{(t)}$ are simply unknown parameters with an assigned prior probabilistic law, as presented in the second line of equation (1), that takes the form of a set of 14 independent $\Gamma(\alpha^{(t)}, \alpha^{(t)})$ distributions, for each t. Hence, one can handle them like the other unknown parameters of the model, $\beta_0^{(t)}, \beta_1^{(t)}, \ldots, \beta_5^{(t)}$, $\alpha^{(t)}$ and $\theta^{(t)}$. In this way, controlling for heterogeneity between counties along with the term $Z_i$ becomes possible, which is of crucial concern, especially in replicating the same framework of analysis for different moments in time.
Second, only a limited number of data (351 for every time t) is available with respect to the number of parameters to be estimated (eight parameters and 14 county effects for every t). However, the Bayesian approach, which combines information from priors and the sample, still works; in the context of hierarchical models, a Bayesian statistician can even estimate more parameters than observations, though such is invalid in frequentist inference. 6 According to the Bayesian paradigm, the unknown parameters, for $t = 1930, \ldots, 2010$, are random variables from a prior joint distribution $\pi(\cdot)$, which one has to specify, and the statistical problem lies in updating $\pi(\cdot)$ by computing the posterior joint conditional probability $\pi(\cdot \mid \mathrm{Data})$ of the parameters, given the collection of $\mathrm{Data} = \{(Y_{ij}^{(t)}, \mathrm{Dist}_{ij}, \mathrm{Mix}_{ij}^{(t)}, Z_i): j = 1, \ldots, n_i;\ i = 1, \ldots, 14 \text{ and } t = 1930, 1940, \ldots, 2010\}$. The posterior distribution is obtained via Bayes' rule as $\pi(\cdot \mid \mathrm{Data}) \propto L(\mathrm{Data} \mid \cdot)\, \pi(\cdot)$, where L is the likelihood function. Then the posterior joint distribution is summarized in a simple way (typically by posterior means or medians), giving rise to point estimates of the unknown parameters. Moreover, the precision of the estimation of the unknown parameters is summarized by means of the associated posterior standard errors and some Bayesian credible intervals. 7 Neither the joint nor the marginal posterior distributions of the unknown parameters usually have a closed form, since they require solving integrals that do not admit any analytical solution. However, Markov chain Monte Carlo (MCMC) algorithms can be used to simulate them. A Gibbs sampling algorithm was coded in the JAGS (Just Another Gibbs Sampler) software package by Plummer (2013), which is designed to work closely with the R software. All statistical computations and graphics were performed in R (R Core Team, 2014).
Prior specifications for the $\beta_i^{(t)}$'s, $\theta^{(t)}$ and $\alpha^{(t)}$
In order to embed the time dimension, a novelty is introduced in the manner of dealing with the priors of the $\beta_i^{(t)}$'s, $\theta^{(t)}$ and $\alpha^{(t)}$: two different strategies are adopted, leading to diffuse and historical priors. The former strategy assumes that all the regression coefficients $\beta_0^{(t)}, \ldots, \beta_5^{(t)}$ are a priori independent normal random variables with zero mean and large variance 10,000: $\beta_i^{(t)} \sim N(0, 10{,}000)$ iid, for $i = 0, \ldots, 5$ and $t = 1930, 1940, \ldots, 2010$. Instead, all shapes $\alpha^{(t)}$ and rates $\theta^{(t)}$, for $t = 1930, 1940, \ldots, 2010$, are a priori independent exponential random variables with a rate equal to 0.2. Moreover, all $\alpha^{(t)}$ and $\theta^{(t)}$ are independent of each $\beta_i^{(t)}$. This choice centres the priors of $\alpha^{(t)}$ and $\theta^{(t)}$ on a mean of 5, with a large variance equal to 25. Because of the large variance, these priors are as vague as possible, so that the MCMC algorithm widely explores the parametric space yet still converges well with this study's data and model. This strategy is hereafter referred to as the diffuse prior.
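The moments quoted for the diffuse prior can be checked with a few lines of arithmetic. The sketch below uses only the stated hyper-parameters (variance 10,000 for the normal priors and rate 0.2 for the exponential priors); the function and constant names are illustrative. The conversion to a precision is included because the `dnorm` distribution in JAGS takes a precision rather than a variance.

```python
import math

BETA_PRIOR_VARIANCE = 10_000.0   # variance of the N(0, 10,000) prior on each coefficient
EXPONENTIAL_RATE = 0.2           # rate of the exponential priors on alpha^(t) and theta^(t)

def exponential_mean_and_variance(rate):
    """Mean 1/rate and variance 1/rate^2 of an exponential distribution."""
    return 1.0 / rate, 1.0 / rate ** 2

def normal_precision(variance):
    """Precision (1/variance), the second argument that dnorm expects in JAGS."""
    return 1.0 / variance

prior_mean, prior_variance = exponential_mean_and_variance(EXPONENTIAL_RATE)
beta_prior_sd = math.sqrt(BETA_PRIOR_VARIANCE)
beta_prior_precision = normal_precision(BETA_PRIOR_VARIANCE)
print(prior_mean, prior_variance, beta_prior_sd, beta_prior_precision)
```

The exponential prior therefore has mean 5 and variance 25, as stated, and the normal prior on each coefficient has a standard deviation of 100.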
By contrast, the latter strategy produces a light time dependence between parameter estimates. Specifically, the independence of the parameters and the form of the diffuse prior densities are both preserved, but at each decade t the values of the prior hyper-parameters are taken from the posterior estimates for the previous decade. For example, to estimate the parameters with data from the 1940 Census, the regression parameters $\beta_i^{(1940)}$ receive priors whose hyper-parameters are taken from the posterior estimates of the corresponding $\beta_i^{(1930)}$, whereas $\alpha^{(1940)}$ and $\theta^{(1940)}$ are a priori independent exponential random variables with means equal to the posterior means of $\alpha^{(1930)}$ and $\theta^{(1930)}$, respectively. This second strategy is referred to as the historical prior. Put differently, the historical prior is quite comparable with the canonical way of dealing with the classical idea of adaptive expectations. 8 Though the BM and MEC have been run under both diffuse and historical priors, the MFI has been estimated using only the diffuse one. At any rate, the results obtained under diffuse priors show that the MFI confirms the outcomes obtained in the MEC for the parameters of interest. Given this evidence, the estimations of the MFI can be regarded as a sort of robustness check.
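The chaining of decades under the historical prior can be sketched for the exponential parameters, where the text is explicit: the prior mean at decade t equals the posterior mean at the previous decade. The helper name and the 1930 posterior means below are hypothetical illustrations, not values from the paper, and the update for the regression coefficients is deliberately left out of the sketch.

```python
def historical_exponential_rates(prev_posterior_means):
    """Map posterior means from the previous decade to exponential prior rates.

    An exponential distribution with rate r has mean 1/r, so matching the
    prior mean to the previous posterior mean gives r = 1 / posterior_mean.
    """
    rates = {}
    for name, post_mean in prev_posterior_means.items():
        if post_mean <= 0:
            raise ValueError(f"posterior mean of {name} must be positive")
        rates[name] = 1.0 / post_mean
    return rates

# Hypothetical 1930 posterior means (illustrative numbers only).
posterior_1930 = {"alpha": 4.0, "theta": 2.5}
prior_rates_1940 = historical_exponential_rates(posterior_1930)
print(prior_rates_1940)
```

Repeating the call decade by decade, each time feeding in the latest posterior means, yields the light time dependence between estimates that the historical prior is meant to create.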
Comparing the estimation outcomes from the two alternative strategies allows testing of the potential effect of unexpected variations in the conditions underlying location choices. If the estimates obtained under the two kinds of priors turn out to be almost identical, such similarity would mean that no important events affected citizens' belief conditions for location choices. If the two types of estimates differ, then the difference would imply that citizens adopted a sort of adaptive behaviour in their decision-making process as a consequence of an important event that affected their decision making from a temporal perspective.

ESTIMATION RESULTS
To manage tractable values, the predictors Dist, Mix (t) and Z were standardized by subtracting their sample mean and dividing by their sample standard deviation.
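The standardization step can be sketched as follows. The helper name and the toy distances are illustrative, and using the unbiased sample standard deviation (`ddof=1`) is an assumption, since the text only says "sample standard deviation".

```python
import numpy as np

def standardize(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    x = np.asarray(values, dtype=float)
    # ddof=1 gives the unbiased sample standard deviation (an assumption here).
    return (x - x.mean()) / x.std(ddof=1)

# Toy distances from Boston in arbitrary units (illustrative, not the study's data).
toy_dist = [5.0, 12.0, 30.0, 48.0, 95.0]
z = standardize(toy_dist)
print(z)
```

After this transformation each predictor has zero mean and unit standard deviation, so the estimated coefficients are expressed per standard deviation of the predictor and can be compared across covariates.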
On the whole, 750,000 iterations for three chains were run to estimate the unknown parameters in each of the three models (BM, MEC and MFI) for each decade t, and the first 250,000 were discarded as burn-in. After the burn-in, one out of every 100 simulated values was kept for posterior analysis, for a total of 5000 simulations saved per chain for each time t. A final sample was selected among these three. Convergence diagnostics, such as those available in the R package CODA (Gelman, Geweke, Heidelberger and Welch stationarity test, interval half-width test), were computed for all parameters, indicating that convergence was achieved.
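The sample sizes quoted above follow from simple bookkeeping, sketched here under the assumption that the 750,000 iterations refer to each chain. The function name is illustrative.

```python
def retained_draws_per_chain(n_iterations, burn_in, thinning):
    """Number of draws kept per chain after discarding burn-in and thinning."""
    return (n_iterations - burn_in) // thinning

N_CHAINS = 3
per_chain = retained_draws_per_chain(750_000, 250_000, 100)
total_across_chains = N_CHAINS * per_chain
print(per_chain, total_across_chains)
```

Thinning by 100 reduces the autocorrelation between saved draws at the cost of discarding most of the simulated values, which is why such a long run is needed to retain 5000 draws per chain.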
The first concern was to test the goodness of fit of the empirical specifications. The best-fitting model for the distribution of population density was selected among BM, MEC and MFI according to the Bayesian deviance information criterion (DIC) 9 and the percentage of 'Bayesian outliers': 10 the model yielding the smallest DIC and/or the smallest percentage of outliers was chosen. According to the DIC, MEC clearly dominates BM from 1980 onward, while MFI dominates the two others from 1990 onward (see Table B1 in Appendix B in the supplemental data online).
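The selection rule based on the DIC can be sketched in a few lines. The sketch assumes the usual definition of the criterion, DIC = mean deviance + effective number of parameters, with the latter computed as the mean deviance minus the deviance at the posterior mean; the paper's own definition sits in a note that is not reproduced here, and all numbers below are made up for illustration.

```python
import numpy as np

def dic(deviance_draws, deviance_at_posterior_mean):
    """DIC = D_bar + p_D, with p_D = D_bar - D(posterior mean)."""
    d_bar = float(np.mean(deviance_draws))
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d

def best_model(dic_by_model):
    """Return the label of the model with the smallest DIC."""
    return min(dic_by_model, key=dic_by_model.get)

# Made-up deviance summaries for one decade (illustrative only).
dic_by_model = {
    "BM": dic([410.0, 412.0, 414.0], 409.0),
    "MEC": dic([400.0, 402.0, 404.0], 398.0),
    "MFI": dic([401.0, 403.0, 405.0], 398.5),
}
print(dic_by_model, best_model(dic_by_model))
```

A smaller DIC indicates a better trade-off between fit and complexity, so the comparison is repeated decade by decade and the label returned for each decade identifies the preferred specification.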
Referring to the literature, these results confirm that Boston was able to regenerate because of its capacity to attract people in the last decades and that the ethnic dimension turns out to be a key factor in explaining the distribution of population density (Glaeser, 2005).
As for the estimates, Tables 1(a) and (b) present some posterior summaries of $b_0^{(t)}, b_1^{(t)}, \ldots, b_5^{(t)}$, $\alpha^{(t)}$ and $\theta^{(t)}$ for BM and MEC under the diffuse priors (note 11). Meanwhile, Table 2 reports the results of the estimations for MFI under the diffuse prior.
The temporal evolution of the regression coefficients is depicted in Figure 1(a) in the case of diffuse priors and in Figure 1(b) in the case of historical priors.
From Tables 1 and 2 it can first be deduced that, when statistically significant, all estimated regression coefficients display the expected sign. These estimations were run by including random frailties that allow accounting for omitted variables that cannot be dealt with explicitly.
Results show that the distance from Boston is estimated to be negatively associated with population density, as is the Mix variable. Proximity to Boston and ethnic composition at the municipality level therefore matter in residential location decisions in the sample. In general, white people seem to appreciate locating in less dense areas and to show a marked preference for living in individual dwelling properties. The presence of natural amenities does not play any role in residential location decisions in the case of diffuse priors, though under historical priors it is estimated to have become significant and with a negative sign in the most recent decades, meaning that such amenities are not really an attractive asset for location choices. At the same time, the interaction term $\mathrm{Dist} \times Z$ appears to be significant only in the case of historical priors, while $\mathrm{Dist} \times \mathrm{Mix}^{(t)}$ is statistically significant only for the most recent decades, namely since 1990 (Table 2). When statistically significant, the negative sign of the interaction $\mathrm{Dist} \times \mathrm{Mix}^{(t)}$ reinforces the idea of the importance of the ethnic composition of the municipality in residential location choices and thus in evening out the relevance of the distance from Boston.
Concerning the temporal evolution of the distance effect, Figure 1(a) and Table 1(a) reveal that in the case of diffuse priors, the trend of the distance coefficient $b_3^{(t)}$ is not monotonic over time. The magnitude of $b_3^{(t)}$ varies by up to 50% throughout the period. The centripetal forces of Boston as Massachusetts' CBD were extremely strong at the end of the 19th century, and their reinforcement continued up to the 1960s. However, the coefficient of the population composition index (i.e., the Mix covariate) did not have an important impact before the 1970s; its magnitude fluctuates around zero as a statistically significant coefficient and becomes negative with a monotonically decreasing slope from 1970 onward.
In the 1970s, Boston was no longer a strongly attractive CBD for white people, who then preferred to settle far from it, though it was attractive to them until the 1950s. The evolution of the significance of the distance in MEC is especially interesting. The trend of the coefficient of the distance covariate in MEC follows an inverted 'U'-shaped curve from 1930 onward. Until 1970, there is no remarkable difference between the estimates of the distance coefficients in the BM and MEC. Yet, if the two estimations of the distance effect under diffuse priors in Figure 1(a) and under historical priors in Figure 1(b) are compared, it is clear that the trend is more or less identical. More precisely, it is smoother in the case of historical priors than in the case of diffuse ones (note 12), and the magnitude of the coefficient is larger in the case of historical priors than in the case of diffuse ones.
However, as soon as the Mix covariate becomes significant, the two estimates of $b_3^{(t)}$ under the BM and under MEC diverge. In the most recent years, the value of the distance effect in BM is basically constant, though it decreases markedly in the MEC. This result suggests an interesting interpretation: there is an important expectation component associated with the spatial distribution of ethnic groups that impacts residential location choices in the most recent decades, when this effect is more relevant.
The spatial dimension turns out to be a key factor; the estimation of the county frailties reinforces the interpretation of the previous results. The frailty terms play an important role, for they correct the magnitude of the decrease in population density while moving farther from Boston. An extensive discussion on the random frailties is presented in Appendix C in the supplemental data online.
Focusing on $\alpha^{(t)}$, it is observed that under the diffuse prior the $\alpha^{(t)}$ are all concentrated around 5 (with a large variance equal to 25). Instead, the posterior estimates of $\alpha^{(t)}$ in Table 1 (and also in Table B3 in Appendix B in the supplemental data online) show that a posteriori they become reduced, and far smaller posterior values are obtained around 1970, under both the BM and the MEC. This result means that until 1970, under both diffuse and historical priors and for both the BM and MEC, the dependence among observations via the county effect reinforces the heterogeneity among counties as well. By contrast, after 1980 the posterior means of $\alpha^{(t)}$ become strongly dependent on the specification of both priors and the model (note 13).
A reasoned composition of the different pieces of these estimations paints an interesting picture (note 14). Since this paper is working with a monocentric distribution setting, it is important to understand how the changes affecting the centre of the distribution (here, changes in Boston at the economic, social and/or urban levels) impact the shape of the distribution (i.e., spatial population distribution across Massachusetts). For one, the attractiveness of Boston as a CBD and Massachusetts' consequent population density distribution are not driven simply by the ease of access to Boston premises. As for the quality of the transportation infrastructure, the Boston area and Massachusetts in general are often considered by analysts as territories with substantial congestion problems. According to data published by IHS Global Inside (2012), congestion costs per auto-commuter in the Boston area have greatly increased since 1990. In 2010 US dollars, these costs were estimated at roughly US$465 in 1990 (in the middle of the US rankings), rose to US$938 in 2000, and reached US$980 in 2010, making the city the ninth most expensive in the United States in terms of congestion costs.
Moreover, the quality of the transportation infrastructure is relatively poor, despite notable investment over time; in 2006, estimates of state-level expenditure on public investment (as a percentage of gross domestic product (GDP)) assessed the state value at 1.6%, ranking Massachusetts 45th in the United States (Heintz, Pollin, & Garret-Peltier, 2009). According to Heintz et al. (2009), Massachusetts also ranked eighth, 20th, fifth, and first in the United States on this count in the periods 1966-75, 1976-85, 1986-95 and 1996-2006, respectively. In the light of this evidence, it is possible to consider that in recent decades the reduced importance of distance from Boston in shaping Massachusetts' population density distribution, as represented by a drop in the estimated coefficient of the spatial distance covariate, is not uniquely driven by improved transportation infrastructure.
Furthermore, people generally appreciate the existing transportation facilities, yet they are greatly concerned about other factors that qualify the quality of life in the neighbourhood in which they live (e.g., Durlauf, 2004). The economically oriented literature covers this issue quite extensively, and a major explanation for the phenomenon (i.e., the real estate market) may be that property in Boston and its surroundings is extremely expensive, which prompts people to prefer settling elsewhere. Gerardi, Rosen, and Willen (2010) present some interesting figures: the secondary market turns out to have been important since the 1980s (40% of new originations) before peaking at 73% in 2005. Until the 1960s, savings were the most important income source for investment in properties. In the 1960s, inflation and interest rates rose, which drove up the cost of funding for savings and loans. Again, between 1995 and 2005, the Boston area recorded another rise in housing prices, with a 15% peak growth in the year 2000 (note 15). Additionally, given the difficulty of accessing property then, as a CBD Boston could have a greater concentration of immigrants who may prefer to rent or have no other option than to do so. This difficulty de facto favours the creation of an ethnic ghetto that reinforces the tendency of the white population to settle far from the CBD. Along these lines, Boustan and Margo (2013) have documented that from 1940 to 1980 black home ownership in US urban centres increased. This tendency was supported by an increase in white home ownership in suburban areas, where the fall in real estate prices improved the owner-occupancy of residential properties.

DISCUSSION
This study's estimates show a decline in the importance of the covariate distance from CBD in citizens' residential location choices over time, while the ethnic component has become increasingly important. The interpretation of these results must be associated with an important structural transformation at the territorial level that occurred in the United States throughout the period studied here.
Clearly, racial discrimination in the US housing market has been constant over time. Cutler, Glaeser, and Vigdor (1999) document interesting trends:
• During 1890-1940, black people migrated from the rural south to the urban north. In this respect, cities developed with entirely black housing.
• During 1940-70, the migration of black people expanded across the United States.
• During the 1950s, some white people took action to exclude black people from their residential areas.
• Since 1970, there has been a reduction in racial segregation in residential areas, and black people have moved into once all-white areas.
Ross and Turner (2005) detect a persistent discriminatory component in the housing market against African-American and Latino families. These trends induced distortions, possibly to the effect that real estate agencies might have refused to show units in predominately white neighbourhoods to minorities in order to avoid offending future clients or lowering market activity in key markets. Ethnicity is thus used to signify some unobservable factors of economic returns from market interactions.
As for the mortgage market, there exist several contributions that this study's results support with new, structured quantitative evidence. Collins and Margo (2011) focus on the rate of home ownership for black and white households across the entire United States from 1870 to 2007. The racial gap narrowed by 25% from 1870 to 1910, and racial convergence was reinforced until the 1940s. The current study's results align with this evidence. In the presence of a poorly organized transportation system and with manufacturing jobs concentrated in urban areas, workers preferred to live near their place of employment. Indeed, most were renters. As the transportation system improved, workers gained mobility.
However, the mobility effect must be combined with limitations imposed by the mortgage market. In the 1930s and 1940s, reforms in the mortgage market pushed the consolidation of the practice of redlining black neighbourhoods and made it more difficult for black people to obtain mortgages. Land was cheaper on the urban periphery, and suburban housing was mostly composed of single-family, owner-occupied homes. Since black people generally lacked the financial resources to move there, residential segregation continued and peaked around 1970, though some rich African-Americans with high levels of education and income were able to achieve home ownership after 1940. These trends are collectively the most suitable candidate to act as a crucial factor of subjective preferences and individual adaptive expectations that this study's estimations identify as being extremely important since 1970. In 1968, the US Congress moved to create a secondary market for mortgages. In particular, race- and gender-based discrimination in the mortgage market emerged as a key issue in the early 1970s, which led to the passage of the Equal Credit Opportunity Act. The type of situation experienced in those years replicated the one studied by Kollman and Fishback (2011), who named mortgage practices in the 1920s and 1930s as being responsible for the spatial distribution of blacks and whites across the territory.
In this context, lower land prices far from CBDs have favoured the construction of larger, single-family detached dwellings (Case, 1986). As Boustan and Margo (2013) discuss in relation to the rest of the United States, Massachusetts experienced strong white suburbanization during the post-war period through increased home ownership rates for black people who remained in urban centres.

CONCLUSIONS
This study has proposed a Bayesian approach to track the importance of distance from Boston in shaping Massachusetts' population density distribution. By studying census data across more than 80 years, it is possible to identify an interesting evolutionary pattern. Distance is a key element in determining location preferences, though its impact on the distribution function does not decrease in tandem with the improvement of transportation infrastructure, namely across time, given the regular investment plans put in place by the state. Instead, the coefficient associated with distance reveals an inverted 'U'-shape whose magnitude varies according to changes in the importance of an area's racial composition among citizens' preferences for residential areas. Put differently, ease of access to Boston has not always been the dominant component of subjective preferences; an institutional setting may help to reinforce the ethnic dimension in the composition of citizens' residential preferences.
The idea of adopting a measure of subjective distance instead of a simple spatial distance variable underscores important empirical figures that may provide interesting suggestions for policy. Regional planning targeting the improvement of infrastructural quality to boost accessibility to CBDs and shape population density distribution may be totally ineffective, or at least lose effectiveness, if the location choices are mostly driven by other factors such as ethnic preferences. Instead, public authorities should consider the potential consequences of segregation patterns that an improved transportation system may induce. Of course, these results would be more precise if a longer data-series at the urban level were available. The technique exploited in this study would certainly allow for the identification and exploration of the creation of urban ghettos and for tracking their evolution over time.
Lastly, this study has assumed that population densities are independent over time. The only relationship between them is produced, and only slightly, by the historical prior. Work is in progress on a Bayesian dynamic space-time gamma model with dynamic gamma spatial frailties, in which the vector of regression coefficients is modelled a priori as a vector random walk.

DISCLOSURE STATEMENT
No potential conflict of interest was reported by the authors.

SUPPLEMENTAL DATA
Supplemental data for this article can be accessed at http://10.1080/00343404.2015.1110237

ORCiDs
Ilenia Epifani http://orcid.org/0000-0001-9005-1378
Rosella Nicolini http://orcid.org/0000-0002-3331-8926

NOTES
1. Boston could also be a pole of attraction for other neighbouring areas close to Massachusetts's borders. However, this study's analysis privileges the historical dimension and avoids any change at the institutional level (e.g., governance issues or differences in civil legislation across US states) that might distort the identification strategy.
By focusing only on Massachusetts, such changes are controlled for, while the Bayesian approach also delivers insights that can be of interest for Massachusetts's neighbouring areas. This issue is discussed in the fourth section.
2. As pointed out by a referee, more flexibility may be obtained in a non-parametric way by, for instance, a mixture of distributions. Nevertheless, this kind of specification is analytically much more demanding.
3. This study privileges this measure of distance over other measures such as travel time. The choice is driven by two reasons. First, the authors do not have data available on travel time for so long a period. Secondly, year-by-year estimations are being performed. In order to exploit the variability of the travel time (and, hence, the information delivered by this variable), it is necessary to deal with panel-style data.
4. Glaeser and Ward (2009) argue that water is an important amenity for creating recreational spaces and important laws have been passed to protect waterways and wetlands. For more information about water areas in Massachusetts, see Simcox (1992).
5. The gamma probability density $\Gamma(a, b)$, with shape a and rate b, has kernel $x^{a-1} e^{-bx}$ on the half-line $(0, \infty)$, and it is equal to zero otherwise. The expected value and variance of a random variable $X \sim \Gamma(a, b)$ are given by $E(X) = a/b$ and $\mathrm{Var}(X) = a/b^2$ (i.e., the ratio between the standard deviation and the mean is $\mathrm{Cv}(X) = 1/\sqrt{a}$).
6. For instance, see http://www.bayesian-inference.com/samplesize/. Certainly, a Bayesian hierarchical model overcomes the difficulty of dealing with the problem of over-parameterization, but it requires caution in the definition of the priors in order to make them truly representative and also to preserve the sensitivity of the model to capture changes due to the variations of the parameters over time.
Furthermore, the authors are conscious that a small sample size makes it difficult to single out a possible discrepancy between the prior and sample information because the empirical information is diluted over too large a parametric space.
7. In Bayesian statistics, a $\gamma \cdot 100\%$ credible interval for an unknown quantity $\eta$ is given by $(q_{(1-\gamma)/2}, q_{(1+\gamma)/2})$, where $q_{(1-\gamma)/2}$ and $q_{(1+\gamma)/2}$ are posterior quantiles of $\eta$.
8. The historical strategy includes only marginal information derived from the posterior distribution given the data of the previous decade. Including information from the past about the dependence structure in the prior distribution when non-normally distributed parameters are involved is a technical issue that is difficult to resolve. The authors leave it for further investigation.
9. The DIC is a generalization of Akaike's information criterion (AIC). It is given by the deviance (i.e., minus twice the log-likelihood) calculated at the posterior means of all parameters plus twice the effective number of parameters pD. Models with a small DIC are preferred over those with a large DIC.
10. Town j in county i has been classified as an outlier at time t if its real population density $Y_{ij}^{(t)}$ does not fall within the 95% posterior credible interval of the population density $Y_{ij}^{(t)}$ given Data = (Y, Dist, Mix, Z) (see Appendix D in the supplemental data online for more details on the Bayesian outliers and the 95% posterior credible intervals of the population densities).
11. Estimates for historical priors can be found in Table B3 in Appendix B in the supplemental data online. Appendix B also has additional statistics.
12. From a statistical viewpoint, this circumstance stems from the low variance of all historical priors, given their dependence on the efficient and concentrated estimates of the previous period.
13. Given the computational technique, in the case of historical priors, the estimates for $\alpha^{(t)}$ should not be considered fully reliable.
The algorithm performs poorly in updating the posterior values of $\alpha^{(t)}$, probably since these posteriors become increasingly concentrated around a few values from 1940 onwards, so that the algorithm may get stuck in the initial values.
14. For a complete discussion about the credible intervals for the estimations of the population density, see Appendix D in the supplemental data online.
15. More detailed data can be found at http://www.forecast-chart.com/.
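
The credible interval of note 7 and the outlier rule of note 10 can be sketched together as follows. This is only an illustration under stated assumptions: the draws are a deterministic placeholder rather than posterior predictive output from the fitted models, and the function names are hypothetical.

```python
import numpy as np

def credible_interval(draws, gamma=0.95):
    """Equal-tailed interval between the (1-gamma)/2 and (1+gamma)/2 quantiles."""
    lower, upper = np.quantile(draws, [(1.0 - gamma) / 2.0, (1.0 + gamma) / 2.0])
    return lower, upper

def is_outlier(observed, draws, gamma=0.95):
    """Flag an observation lying outside the gamma credible interval."""
    lower, upper = credible_interval(draws, gamma)
    return bool(observed < lower or observed > upper)

# Placeholder draws for one town's population density (assumption, not real output).
density_draws = np.arange(1.0, 101.0)
low, high = credible_interval(density_draws)
flag_small = is_outlier(0.5, density_draws)    # observed value below the interval
flag_typical = is_outlier(50.0, density_draws) # observed value inside the interval
```

The percentage of Bayesian outliers for a model is then the share of towns flagged in this way, computed separately for each decade.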