Contents lists available at ScienceDirect

Physica A

journal homepage: www.elsevier.com/locate/physa

Size distribution of U.S. lower tail cities

Stephen Devadoss a, Jeff Luckstead b,*

a University of Idaho, United States b University of Arkansas, United States

highlights

• A large number of small cities are in the lower tail.

• Application of reverse Pareto and reverse general Pareto to U.S. lower-tail cities.

• U.S. lower-tail cities follow reverse-Pareto distribution.


article info

abstract

Article history:

Received 13 May 2015

Received in revised form 9 September 2015

Available online 19 October 2015

Keywords: Lower tail cities Reverse Pareto Size distribution United States

Studies that analyzed the size distribution of U.S. cities have mainly focused on the upper tail and showed that these cities adhere to Zipf's law. However, even though a large number of cities are in the lower tail, very few studies have examined the distribution of these small cities because of data limitations. We apply reverse Pareto and reverse general Pareto distributions to analyze U.S. lower tail cities. Our results show the power law behavior of lower tail U.S. cities is accurately represented by both the reverse Pareto and general Pareto.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Studies that analyzed the size distribution of U.S. cities have mainly focused on the upper tail (135 largest metropolitan areas) and showed that these cities adhere to Zipf's law (see, for example, Refs. [1-4]).1,2 Almost all past studies have analyzed the distribution of large cities because of the high concentration of population in these cities. However, even though a large number of cities are in the lower tail, very few studies have examined the distribution of these small cities because of data limitations. Starting in 2000, however, the U.S. Census has provided a substantially expanded data set that includes all locations called "places", which cover all cities, towns, and villages. These new and expanded data have paved the way for several studies to analyze the size distribution of all U.S. cities [9-11]. However, these studies clearly identified the difficulty in examining both the upper and lower tails simultaneously, particularly using the standard rank-size plots on a log-log scale, due to heavy distortion of upper tail cities for descending rank and lower tail cities for ascending rank. Reed [12] emphasized the need to analyze these tails separately because lower tail cities follow reverse Pareto (defined below), whereas upper tail cities exhibit Pareto. Reed then applied reverse Pareto to study lower tail cities in California and West Virginia in the United States and Cantabria and Barcelona in Spain. A recent study by Devadoss et al. [13] found that lower tail cities in India do exhibit

* Corresponding author.

E-mail addresses: devadoss@uidaho.edu (S. Devadoss), jluckste@uark.edu (J. Luckstead).

1 Studies have also shown that the Zipf exponent depends on the sample size [5,6]. Furthermore, Rozenfeld et al. [7] presented evidence that Zipf's law is supported by geographic rather than administrative boundaries.

2 Stanley et al. [8] have also analyzed firm size distributions.

http://dx.doi.org/10.1016/j.physa.2015.09.077

0378-4371/© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

power law behavior.3,4 Because a large portion of cities are in the lower tail and these cities have received scant attention in the literature, the purpose of this paper is to analyze the size distribution of U.S. lower tail cities.

2. Methodology

The size of lower tail cities, $x_1 < x_2 < \dots < x_{n-1} < x_n$ with ranking 1 to $n$, can be represented by the reverse Pareto distribution, with PDF and CDF

$$g(x \mid \alpha, \mu) = \alpha \frac{x^{\alpha-1}}{\mu^{\alpha}}, \tag{1}$$

$$G(x \mid \alpha, \mu) = \left(\frac{x}{\mu}\right)^{\alpha}, \tag{2}$$

where $\mu = \max(x)$ is the location parameter and $\alpha$ is the shape parameter.

We apply the maximum likelihood method to estimate $\alpha$. For a sample of $n$ independently and identically distributed (i.i.d.) lower tail cities, the log-likelihood function is

$$L(\alpha \mid x_1, \dots, x_{n-1}) = (n-1)\log(\alpha) - \alpha (n-1)\log(\mu) + (\alpha - 1)\sum_{i=1}^{n-1}\log(x_i). \tag{3}$$

Maximize the above function with respect to $\alpha$ and solve the first-order condition to obtain

$$\hat{\alpha} = \frac{n-1}{(n-1)\log(\mu) - \sum_{i=1}^{n-1}\log(x_i)}. \tag{4}$$

To predict city sizes ($x_p$), we substitute $\hat{\alpha}$ into the reverse Pareto CDF, Eq. (2), and solve for $x_p = \mu\, G(\cdot)^{1/\hat{\alpha}}$. The log of actual and predicted values of $x$ can be plotted against the log rank to obtain the rank-size plot.
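The closed-form estimator and the inversion of the CDF can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the location parameter is taken as the sample maximum with the remaining n - 1 cities entering the sum, and the CDF is evaluated at the plotting position rank / n, which is an assumption because the text does not state which positions are used.

```python
import math

def fit_reverse_pareto(sizes):
    """Closed-form MLE of the reverse Pareto shape parameter.

    The location parameter mu is set to the sample maximum, and the
    remaining n - 1 cities enter the log sums.
    """
    xs = sorted(sizes)
    mu = xs[-1]                      # location parameter: mu = max(x)
    rest = xs[:-1]                   # the n - 1 cities below the maximum
    m = len(rest)
    denom = m * math.log(mu) - sum(math.log(x) for x in rest)
    return m / denom, mu

def predict_sizes(alpha, mu, n):
    """Invert the reverse Pareto CDF: x_p = mu * G^(1/alpha).

    Assumption: G is evaluated at rank / n for ascending ranks 1..n.
    """
    return [mu * (rank / n) ** (1.0 / alpha) for rank in range(1, n + 1)]

# Tiny hand-checkable example: alpha_hat = 2 / log(8)
alpha_hat, mu_hat = fit_reverse_pareto([25, 50, 100])
predicted = predict_sizes(2.0, 100.0, 4)
```

Plotting the logs of the sorted actual sizes and of `predicted` against the log of the ascending rank would give a rank-size plot of the kind described above.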

We also estimate the lower tail distribution using the more flexible reverse-general Pareto (GP).5 To our knowledge, our study is the first to apply reverse-general Pareto to estimate lower tail size distribution. The PDF and CDF for GP are

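$$f(x \mid \beta, \mu, \sigma) = \frac{\beta}{\sigma}\left(1 + \frac{x-\mu}{\sigma}\right)^{\beta-1}, \tag{5}$$

$$F(x \mid \beta, \mu, \sigma) = \left(\frac{x - \mu + \sigma}{\sigma}\right)^{\beta}, \tag{6}$$

where $\mu = \max(x)$ is the location parameter, $\sigma$ is the scale parameter, and $\beta$ is the shape parameter. With $\sigma = \mu$, reverse general Pareto becomes reverse Pareto; thus the former nests the latter.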

For n i.i.d. samples of lower tail cities, the log-likelihood function of reverse general Pareto is

$$L(\beta, \mu, \sigma \mid x_1, \dots, x_{n-1}) = (n-1)\log(\beta) - (n-1)\log(\sigma) + (\beta - 1)\sum_{i=1}^{n-1}\log\left(1 + \frac{x_i - \mu}{\sigma}\right).$$

Since the maximization of this function does not yield closed form solutions for $\beta$ and $\sigma$, we use nonlinear optimization to obtain $\hat{\beta}$ and $\hat{\sigma}$. Based on the restriction $\beta = \alpha$ and $\sigma = \mu = \max(x)$, we can test whether the reverse general Pareto nests reverse Pareto using the likelihood ratio test by calculating the values of the log-likelihood function for the unrestricted ($L_u$) and restricted ($L_r$) models and using the likelihood ratio (LR) test

$$LR = 2\left(\log L_u - \log L_r\right) \sim \chi^2(2).$$

Substituting $\hat{\beta}$ and $\hat{\sigma}$ into the reverse general Pareto CDF (6), we can solve for the predicted values: $x_{gp} = \hat{\sigma}\, F(\cdot)^{1/\hat{\beta}} + \mu - \hat{\sigma}$. We can graph the log of actual and predicted city sizes against the log of the rank to generate the rank-size plot.
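The estimation and nesting test described above can be illustrated with a small, dependency-free Python sketch. It is not the authors' procedure: the paper only says "nonlinear optimization", and here a crude grid search over the scale parameter on the profile log-likelihood stands in for it (for a fixed scale, the first-order condition in the shape parameter has a closed form). The function names, the grid size, and the upper search bound `span * mu` are all hypothetical choices. The p-value uses the fact that the survival function of a chi-square variable with 2 degrees of freedom is exp(-LR/2).

```python
import math

def gp_loglik(beta, sigma, mu, xs):
    """Reverse general Pareto log-likelihood over the observations xs."""
    m = len(xs)
    s = sum(math.log(1.0 + (x - mu) / sigma) for x in xs)
    return m * math.log(beta) - m * math.log(sigma) + (beta - 1.0) * s

def beta_given_sigma(sigma, mu, xs):
    """Closed-form maximizer in beta for a fixed sigma."""
    s = sum(math.log(1.0 + (x - mu) / sigma) for x in xs)
    return -len(xs) / s

def fit_reverse_gp(sizes, grid=400, span=5.0):
    """Grid search over sigma on the profile log-likelihood (illustrative)."""
    xs = sorted(sizes)
    mu, rest = xs[-1], xs[:-1]            # mu = max(x); n - 1 remaining cities
    lo = (mu - rest[0]) * (1.0 + 1e-6) + 1e-9   # keeps 1 + (x - mu)/sigma > 0
    hi = span * mu                        # hypothetical upper search bound
    candidates = [lo + (hi - lo) * k / grid for k in range(grid + 1)]
    candidates.append(mu)                 # restricted value; feasible for positive sizes
    best = None
    for sigma in candidates:
        beta = beta_given_sigma(sigma, mu, rest)
        ll = gp_loglik(beta, sigma, mu, rest)
        if best is None or ll > best[0]:
            best = (ll, beta, sigma)
    ll_u, beta_hat, sigma_hat = best
    alpha_hat = beta_given_sigma(mu, mu, rest)   # reverse Pareto MLE
    ll_r = gp_loglik(alpha_hat, mu, mu, rest)    # restricted: beta = alpha, sigma = mu
    lr = 2.0 * (ll_u - ll_r)
    p_value = math.exp(-lr / 2.0)         # chi-square(2) survival function
    return beta_hat, sigma_hat, lr, p_value

def predict_gp(beta, sigma, mu, n):
    """x_gp = sigma * F^(1/beta) + mu - sigma at assumed positions rank / n."""
    return [sigma * (r / n) ** (1.0 / beta) + mu - sigma for r in range(1, n + 1)]

# Synthetic reverse Pareto quantiles (alpha = 1.5, mu = 115) as example input
data = [115.0 * (k / 200) ** (1 / 1.5) for k in range(1, 201)]
beta_hat, sigma_hat, lr, p_value = fit_reverse_gp(data)
```

Because the restricted point (scale equal to the location parameter) is always among the candidates, the LR statistic from this sketch cannot be negative. A proper numerical optimizer would replace the grid in any serious application.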

We employ rank-size plots and the Kolmogorov-Smirnov (KS) test to analyze the goodness of fit for both the reverse Pareto and general Pareto distributions for U.S. small cities.
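As a rough illustration of the KS comparison, the sketch below computes the largest gap between the empirical CDF and a fitted CDF. The helper names are hypothetical. The 5% critical value uses the common large-sample approximation 1.36 / sqrt(n); this is an assumption, since the paper does not state how its critical values were obtained, although the approximation gives about 0.034 for n = 1613, close to the value reported for that sample size in Table 1.

```python
import math

def reverse_pareto_cdf(x, alpha, mu):
    """Reverse Pareto CDF, G(x) = (x / mu)^alpha, for 0 < x <= mu."""
    return (x / mu) ** alpha

def ks_statistic(sizes, cdf):
    """Largest vertical gap between the empirical CDF and a fitted CDF."""
    xs = sorted(sizes)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # compare the fitted CDF with the empirical step just after and just before x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_critical_5pct(n):
    """Assumed large-sample 5% critical value, 1.36 / sqrt(n)."""
    return 1.36 / math.sqrt(n)

# Hand-checkable example against a uniform CDF on (0, 1]
d_uniform = ks_statistic([0.25, 0.5, 0.75, 1.0], lambda x: x)
crit_1613 = ks_critical_5pct(1613)
```

A fit would be judged adequate in this sense when `ks_statistic(sample, lambda x: reverse_pareto_cdf(x, alpha_hat, mu))` does not exceed the critical value for the sample size.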

3 Luckstead and Devadoss [14] studied the growth process of Indian large cities and concluded that Gibrat's law holds for these cities.

4 See Devadoss and Luckstead [15] for the growth process of lower tail cities in the United States.

5 Urzua [5] and Luckstead and Devadoss [16] have applied general Pareto to upper tail city distributions.

Fig. 1. Histogram of log of city sizes. [Panels: Log of City Size, 2000; Log of City Size, 2010.]

Table 1

Lower tail estimation, 2000.

| Truncation point $\log(\mu)$ | Sample size | Reverse Pareto $\hat{\alpha}$ | Reverse Pareto KS | Reverse general Pareto $\hat{\beta}$ | Reverse general Pareto $\hat{\sigma}$ | Reverse general Pareto KS | 5% critical value for KS |
|---|---|---|---|---|---|---|---|
| log(85) = 4.44 | 1026 | 1.483 (0.047) | 0.028 | 1.451 (0.050) | 84.368 (0.387) | 0.032 | 0.043 |
| log(95) = 4.55 | 1229 | 1.493 (0.043) | 0.028 | 1.467 (0.047) | 94.405 (0.413) | 0.031 | 0.039 |
| log(105) = 4.65 | 1444 | 1.513 (0.040) | 0.029 | 1.492 (0.044) | 104.479 (0.465) | 0.031 | 0.036 |
| log(115) = 4.74 | 1613 | 1.486 (0.037) | 0.021 | 1.463 (0.040) | 114.388 (0.393) | 0.021 | 0.034 |
| log(125) = 4.83 | 1791 | 1.461 (0.035) | 0.031 | 1.438 (0.036) | 124.323 (0.339) | 0.029 | 0.032 |
| log(135) = 4.91 | 1980 | 1.444 (0.033) | 0.033 | 1.422 (0.034) | 134.286 (0.307) | 0.031 | 0.031 |
| log(145) = 4.98 | 2189 | 1.430 (0.031) | 0.029 | 1.409 (0.032) | 144.259 (0.283) | 0.026 | 0.029 |

Standard errors are in parentheses.

3. Empirical analysis and results

Population data for all U.S. cities for the census years 2000 and 2010 were collected from the U.S. Census Bureau [17]. For the analysis, we use the census definition of a city unit called "census designated places".6 We define the truncation point for lower tail cities to be approximately equal to the inflection point of the histogram of log city sizes, which is about log(115) = 4.74 (see Fig. 1). For this truncation point and for the year 2000 (2010), the sample size is 1613 (2502) with a mean city size of 68.83 (65.83) and a standard deviation of 29.91 (30.82). We also consider different truncation points, ranging from log(85) = 4.44 to log(145) = 4.98, to provide robust estimates of the parameters for various sample sizes.
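Selecting the lower tail sample for a given truncation point is straightforward; the sketch below also confirms that the reported truncation values are natural logarithms. The function name is hypothetical, and treating the truncation point as inclusive is an assumption, since the text does not say whether places exactly at the truncation size are kept.

```python
import math

def lower_tail(populations, truncation=115):
    """Keep places with positive population at or below the truncation size.

    Assumption: the truncation point is inclusive.
    """
    return sorted(p for p in populations if 0 < p <= truncation)

# Natural logs of three of the truncation sizes used in the paper
log_points = {cut: round(math.log(cut), 2) for cut in (85, 115, 145)}
# → {85: 4.44, 115: 4.74, 145: 4.98}

sample = lower_tail([10, 115, 116, 5000, 0])
# → [10, 115]
```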

Table 1 presents 2000 Census year estimates for reverse Pareto and reverse general Pareto along with their standard errors in parentheses and KS statistics with corresponding critical values at the 5% level. For the Census year 2000, the reverse Pareto estimates of $\alpha$ range from 1.430 to 1.513 for different sample sizes. These estimates are highly significant, as is evident from the small standard errors. The predicted city sizes based on the reverse Pareto distribution replicate the actual city sizes, as observed from the rank-size plot7 (Fig. 2, first panel) and from the KS statistics (Table 1), which are generally less than the critical values.

The estimates of the reverse general Pareto shape parameter $\beta$ range from 1.409 to 1.492, and the estimates of the scale parameter $\sigma$ range from 84.368 to 144.259 and are very close to $\mu$, as can be seen from Table 1. Both the $\beta$ and $\sigma$ estimates are highly significant based on the small standard errors. The reverse general Pareto also predicts the lower tail well because $x_{gp}$ fits the actual data closely, as evident from the rank-size plot (Fig. 2, first panel) and the KS statistics, which are less than or equal to the critical values for all truncation points. The reverse general Pareto does nest reverse Pareto, as $\hat{\beta}$ and $\hat{\sigma}$ are not statistically different from $\hat{\alpha}$ and $\mu$, respectively, as indicated by the low LR statistics and p-values ranging from 0.204 to 0.587 (last two columns of Table 2).
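The p-values quoted above follow from the LR statistics because a chi-square variable with 2 degrees of freedom has survival function exp(-LR/2). The short check below recomputes them from the LR values reported in Table 2; the variable and function names are hypothetical. The recomputed values agree with the reported ones to the third decimal for most rows, with a difference of about 0.003 for the log(115) row, which may reflect rounding of the published LR statistic.

```python
import math

# (truncation size, LR statistic, reported p-value) as listed in Table 2
table2 = [
    (85, 1.808, 0.405),
    (95, 1.526, 0.466),
    (105, 1.065, 0.587),
    (115, 1.672, 0.430),
    (125, 2.304, 0.316),
    (135, 2.775, 0.250),
    (145, 3.182, 0.204),
]

def chi2_sf_2df(lr):
    """Survival function of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-lr / 2.0)

recomputed = [(cut, round(chi2_sf_2df(lr), 3), reported)
              for cut, lr, reported in table2]
for row in recomputed:
    print(row)
```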

Table 3 presents estimated parameter values and KS statistics for the census year 2010. For this year, the sample size at each truncation point increases compared to 2000 due to a rise in the number of small cities, which could be attributed

6 Note that there are alternate ways to define cities, such as City Clustering Algorithm and Metropolitan Statistical Area, and studies [18-21] have shown that distribution estimations can be sensitive to the definition.

7 The rank-size plot given in Fig. 2 is for the truncation point log (115) = 4.74 corresponding to the inflection point. The rank-size plots for other sample sizes are very similar to this figure.

Fig. 2. Rank-size plots, $\mu = 115$. [Panels: Log of City Size, 2000; Log of City Size, 2010.]

Table 2

Likelihood ratio tests, 2000.

| Truncation point $\log(\mu)$ | Sample size | Nesting of reverse GP to reverse Pareto: LR | Nesting of reverse GP to reverse Pareto: p-val |
|---|---|---|---|
| log(85) = 4.44 | 1026 | 1.808 | 0.405 |
| log(95) = 4.55 | 1229 | 1.526 | 0.466 |
| log(105) = 4.65 | 1444 | 1.065 | 0.587 |
| log(115) = 4.74 | 1613 | 1.672 | 0.430 |
| log(125) = 4.83 | 1791 | 2.304 | 0.316 |
| log(135) = 4.91 | 1980 | 2.775 | 0.250 |
| log(145) = 4.98 | 2189 | 3.182 | 0.204 |

Table 3

Lower tail estimation, 2010.

| Truncation point $\log(\mu)$ | Sample size | Reverse Pareto $\hat{\alpha}$ | Reverse Pareto KS | Reverse general Pareto $\hat{\beta}$ | Reverse general Pareto $\hat{\sigma}$ | Reverse general Pareto KS | 5% critical value for KS |
|---|---|---|---|---|---|---|---|
| log(85) = 4.44 | 1689 | 1.362 (0.033) | 0.019 | 1.324 (0.035) | 84.268 (0.215) | 0.022 | 0.033 |
| log(95) = 4.55 | 1972 | 1.373 (0.031) | 0.018 | 1.340 (0.033) | 94.300 (0.232) | 0.019 | 0.031 |
| log(105) = 4.65 | 2218 | 1.346 (0.029) | 0.020 | 1.314 (0.030) | 104.248 (0.200) | 0.017 | 0.029 |
| log(115) = 4.74 | 2502 | 1.341 (0.027) | 0.018 | 1.312 (0.028) | 114.243 (0.196) | 0.017 | 0.027 |
| log(125) = 4.83 | 2739 | 1.318 (0.025) | 0.025 | 1.289 (0.026) | 124.204 (0.171) | 0.022 | 0.026 |
| log(135) = 4.91 | 3045 | 1.321 (0.024) | 0.021 | 1.295 (0.025) | 134.213 (0.176) | 0.017 | 0.025 |
| log(145) = 4.98 | 3298 | 1.307 (0.023) | 0.024 | 1.282 (0.023) | 144.192 (0.162) | 0.020 | 0.024 |

Standard errors are in parentheses.

to (a) migration to urban areas that might have pushed some cities to the lower tail and (b) more small cities springing up. The estimated shape parameters for reverse Pareto ($\hat{\alpha}$) and reverse general Pareto ($\hat{\beta}$) for 2010 (Table 3) are smaller than those for 2000. The estimated standard errors for $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\sigma}$ are very small, indicating these parameter estimates are highly significant. Both reverse Pareto and reverse general Pareto replicate the lower tail data well, as the KS statistics for both distributions are at or below the critical values. However, the KS statistics for reverse general Pareto are slightly lower than those of reverse Pareto (except for the truncation points $\mu = 85$ and 95) because the former better predicts the multiple observations for a given city size observed in the extreme end of the lower tail, as can be seen from the second panel in Fig. 2. Given the larger number of very small cities in 2010 and since reverse Pareto does not predict the extreme small cities as well as reverse general Pareto, the restriction of reverse general Pareto to reverse Pareto is rejected, as indicated by the relatively large LR statistics and the corresponding p-values, which are below 0.05 for truncation points of 105 and above and below 0.10 for 85 and 95 (see Table 4).

Since $\hat{\alpha}$ and $\hat{\beta}$ are greater than 1 for both census years, the city sizes decline at a slower rate than that implied by their rank, i.e., city sizes and ranks are not one-to-one proportional. Our results corroborate Reed [12], who found a strong reverse Pareto for four human settlements: California and West Virginia in the United States and Cantabria and Barcelona in Spain.
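The link between a shape parameter above 1 and the rank-size slope can be made explicit with a short derivation from the reverse Pareto CDF. It is a sketch under one assumption not stated in the paper: that the city with ascending rank $i$ out of $n$ sits at the plotting position $i/n$. The numerical values use the reverse Pareto estimates at the truncation point log(115) from Tables 1 and 3.

```latex
\begin{align*}
\frac{i}{n} &\approx G(x_i \mid \alpha, \mu) = \left(\frac{x_i}{\mu}\right)^{\alpha}
  && \text{(assumed plotting position for ascending rank } i\text{)} \\
\log x_i &\approx \log \mu + \frac{1}{\alpha}\left(\log i - \log n\right) \\
\frac{d \log x_i}{d \log i} &\approx \frac{1}{\alpha} < 1 \quad \text{when } \alpha > 1 \\
\hat{\alpha} = 1.486 \ (2000) &\;\Rightarrow\; 1/\hat{\alpha} \approx 0.673, \qquad
\hat{\alpha} = 1.341 \ (2010) \;\Rightarrow\; 1/\hat{\alpha} \approx 0.746
\end{align*}
```

Under this reading, a 1% change in rank is associated with a change in size of roughly 0.67% in 2000 and 0.75% in 2010, which is the sense in which sizes and ranks are not one-to-one proportional.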

Table 4

Likelihood ratio tests, 2010.

| Truncation point $\log(\mu)$ | Sample size | Nesting of reverse GP to reverse Pareto: LR | Nesting of reverse GP to reverse Pareto: p-val |
|---|---|---|---|
| log(85) = 4.44 | 1689 | 5.865 | 0.053 |
| log(95) = 4.55 | 1972 | 5.068 | 0.079 |
| log(105) = 4.65 | 2218 | 6.510 | 0.039 |
| log(115) = 4.74 | 2502 | 6.672 | 0.036 |
| log(125) = 4.83 | 2739 | 8.162 | 0.017 |
| log(135) = 4.91 | 3045 | 7.799 | 0.020 |
| log(145) = 4.98 | 3298 | 8.729 | 0.013 |

References

[1] P. Krugman, The Self-Organizing Economy, Blackwell Publishers, Cambridge, Massachusetts, 1996.

[2] X. Gabaix, Zipf's law for cities: An explanation, Quart. J. Econ. 114 (3) (1999) 739-767.

[3] Y. Ioannides, H.G. Overman, Zipf's law for cities: An empirical examination, Reg. Sci. Urban Econ. 33 (2) (2003) 127-137.

[4] K. Gangopadhyay, B. Basu, City size distributions for India and China, Physica A 388 (13) (2009) 2682-2688.

[5] C.M. Urzua, A simple and efficient test for Zipf's law, Econom. Lett. 66 (3) (2000) 257-260.

[6] J. Luckstead, S. Devadoss, Do the world's largest cities follow Zipf's and Gibrat's laws? Econom. Lett. 125 (2) (2014) 182-186.

[7] H. Rozenfeld, D. Rybski, X. Gabaix, H. Makse, The area and population of cities: New insights from a different perspective on cities, Amer. Econ. Rev. 101 (5) (2011) 2205-2225.

[8] M.H. Stanley, S.V. Buldyrev, S. Havlin, R.N. Mantegna, M.A. Salinger, H.E. Stanley, Zipf plots and the size distribution of firms, Econom. Lett. 49 (4) (1995) 453-457.

[9] J. Eeckhout, Gibrat's law for (All) cities, Amer. Econ. Rev. 94 (5) (2004) 1429-1451.

[10] M. Levy, Gibrat's law for (all) cities: Comment, Amer. Econ. Rev. 99 (4) (2009) 1672-1675.

[11] J. Eeckhout, Gibrat's law for (All) cities: Reply, Amer. Econ. Rev. 99 (4) (2009) 1676-1683.

[12] W.J. Reed, On the rank-size distribution for human settlements, J. Reg. Sci. 42 (1) (2002) 1-17.

[13] S. Devadoss, J. Luckstead, D. Danforth, S. Akhundjanov, The power law distribution for lower-tail cities in India, Physica A 442 (2016) 193-196.

[14] J. Luckstead, S. Devadoss, A nonparametric analysis of the growth process of Indian cities, Econom. Lett. 124 (3) (2014) 516-519.

[15] S. Devadoss, J. Luckstead, Size distribution of US lower-tail cities, Econom. Lett. 135 (1) (2015) 12-14.

[16] J. Luckstead, S. Devadoss, A comparison of city size distributions for China and India from 1950 to 2010, Econom. Lett. 124 (2) (2014) 290-295.

[17] US Census Bureau, Population estimates: Historical data, 2014. https://www.census.gov/popest/data/historical/index.html.

[18] H.A. Makse, J.S. Andrade, M. Batty, S. Havlin, H.E. Stanley, et al., Modeling urban growth patterns with correlated percolation, Phys. Rev. E 58 (6) (1998) 7054.

[19] H.D. Rozenfeld, D. Rybski, J.S. Andrade, M. Batty, H.E. Stanley, H.A. Makse, Laws of population growth, Proc. Natl. Acad. Sci. 105 (48) (2008) 18702-18707.

[20] E. Arcaute, E. Hatna, P. Ferguson, H. Youn, A. Johansson, M. Batty, Constructing cities, deconstructing scaling laws, J. R. Soc. Interface 12 (102) (2014).

[21] E.A. Oliveira, J.S. Andrade Jr., H.A. Makse, Large cities are less green, Sci. Rep. 4 (2014).