This paper first develops a structural micro-founded model of aggregate net migration flow using matching ideas to study how migrants choose between multiple locations using multiple criteria. Migration should reduce inequality in the criteria. Most migration models either do not handle multiple criteria and locations or lack micro foundation. The model predicts that migration flows will be out of all but the top two ranked regions. The empirical work, which uses 1990–1999 Guangdong annual data, confirms this proposition and finds a high degree of common marginal effects of the criteria among 18 locations but also finds increasing regional inequalities.

JEL

F22 J61 O15

1 Introduction

Economic models of migration can be put in a general framework in which each possible location is perceived by an individual to have costs and benefits (Krugman 1992). An individual migrates when the net gain from changing location exceeds the migration cost. Previous research has identified several key pull factors motivating migration: an expected income gain (Harris and Todaro 1970; Johnson 2003; Todaro 1969), improved employment opportunities (Fan 1996; Greenwood et al. 1986; Zhao 1997), benefits from the economic and cultural infrastructure and amenities of an alternative location (Chen et al. 2008; Docquier and Rapoport 2005), and marriage opportunities (Fan 2002; Fan and Huang 1998; Frutado and Theodoropoulos 2008; Seeborg et al. 2000). An individual may of course be motivated by more than one of these push-pull factors, and different possible locations may provide different mixes of these factors. The economic costs of migration include the immediate transport cost but also other fixed and variable costs in the destination location, such as education for children (Meyerhoefer and Chen 2011; Plantinga et al. 2012), housing market conditions (Vermeulen and Ommeren 2009), and regulatory and institutional barriers reflected in national immigration controls and in systems of internal administrative control on migration like the hukou in China (Chan and Zhang 1999; Renard et al. 2011; Whalley and Zhang 2007). Typically the costs will vary with the destination but (apart from the immediate transport cost) not with the original location. The combination of all these forces applied to each individual will determine who moves where. In a matching equilibrium, the population distribution between locations and the specific features of different locations are such that no individual has an incentive to move. The matching theory results of Ekeland et al. (2004) and Heckman et al. (2002) are particularly relevant to our purpose: there, heterogeneous individuals are sorted into different outcomes by a market system which allows for pricing by type of individual. Our approach applies some of the insights of this literature to the migration process, recognising that migration is essentially a disequilibrium event.

The aim of this paper is to develop a theoretical framework^{1} which allows for a menu of alternative migration pull factors in a geographical domain with multiple possible destinations. Each individual currently resides in a particular location and has to weigh up the alternative menu of factors available in each different location, together with the costs of moving to that location. The outcome is a migration decision for each individual and, aggregating these over individuals in each current location, we derive the net migration flows between locations. Apart from the theoretical insights gained into modelling multiple motives for migration and multiple locations, our second main aim is to use this framework to generate an empirically applicable migration equation system. We then apply this to econometric analysis of intra-province migration patterns in Guangdong, one of the provinces of China with the highest such migration rate.

Our main theoretical result is that the interplay of multiple motives and multiple possible destinations for migration will lead to agglomeration. Individuals will typically want to move to the location seen as best for its combination of advantages. However, individuals already living in such a favoured location, but whose individual circumstances have turned out poorly there, may wish to leave and move to the location offering the second best advantages. Our approach establishes these tendencies in quite a general framework. We add some assumptions to the framework which result in a net migration equation that can be empirically applied in a setting with multiple migration motives and multiple locations. We apply this to intra-province migration in Guangdong for the period 1990–1999 with 18 locations and 4 migration motives. We find that the approach represents the data quite closely and that there is a high degree of preference homogeneity between individuals in different locations. This implies that if moving costs are low, the equilibrium spatial population distribution should yield equality between locations in the factors which cause migration. However, this is not the case for all factors: regional inequality in some of them has been growing despite the high net migration flows. We conclude that spatial equilibrium in Guangdong has not yet been reached.

Section 2 briefly reviews existing knowledge of factors causing migration and empirical work on Guangdong. Section 3 develops the theoretical framework. Section 4 describes the data, outlines the econometric strategy and presents the empirical results. Section 5 contains a conclusion.

2 Literature review

Some of the comprehensive surveys of migration research recognize the multiple push-pull factors which drive it and that migrants face a choice between alternative destinations. Greenwood (1997) studies the determinants of migration in developed countries, including characteristics both of places and of individuals and households. Taylor and Martin (2001) study the complexity of migration determinants and its impact in rural economies. In the context of climate driven migration, Lilleor and Broeck (2011) also recognize the variety of migration causes.

Widely noted pull factors that we subsequently use are:

(i)

Expected Wage Income.

In the classic Harris-Todaro approach, migrants are motivated by the high wage and the high employment probabilities in the chosen destination. There are many subsequent applications of these two ideas. For example, Johnson (2003) finds that rural-urban migrants in China move with an urban-rural wage gap of 50–70%. For employment probability, Greenwood et al. (1986) find an elasticity of about 0.5 of migrants to job vacancies, and unemployment shocks often hit immigrant workers most heavily (Brucker and Jahn 2008).

(ii)

Expected Self-Employment Profits.

In European countries, 10–25% of migrants establish themselves as self-employed in the destination (OECD 2010), and in China the share rises to 40% or more (RUMICI 2007) or even 60% in Guangdong (Fan 1996, 1999, 2003). There is evidence that self-employed migrants are most influenced by the size of the market and population of the destination, since diverse social networks are important (Federician and Giannetti 2010; Kugler and Rapoport 2005). In the UK there is a high concentration of self-employed immigrants in London (Dustmann et al. 2007), and 47% of the national self-employed migrants live there (The Migration Observatory 2012).

(iii)

Location Infrastructure.

Available local public goods differ by location, e.g. transport and communications, public health or education services, and cultural aspects, and these affect the perceived quality of life in locations (Rappaport 2008). Synergies and externalities have a similar effect (Chen et al. 2008). Often these come just from the size of the population in the destination.

(iv)

Female Migration & Marriage Motives.

Globally around 50% of migrants are female (Piper 2005). Marriage is one important pull factor in the UK, accounting for 40% of migrant settlements during 2008–2010 (Charsley and Liversage 2011), and in Asia (Fan 2002; Fan and Huang 1998; Zhu 2002). Another is employment opportunities, which may be gender specific, e.g. in textile industries (Seeborg et al. 2000). Marriage can also interact with employment prospects (Frutado and Theodoropoulos 2008).

An important issue is identifying and modelling the costs associated with migration between particular locations. Transport costs depend on the physical distance between locations (Poncet 2006). Several NELM models (Docquier and Rapoport 2005; Mesnard 2000) use fixed costs of migration. Institutional and regulatory barriers such as visas or the hukou system in China (Chan and Zhang 1999) create costs. Whalley and Zhang (2007) conclude that inter-province wage differentials caused by the hukou system impose a significant welfare loss. In Guangdong (our focus for empirical work) the majority of migrants initially hold an agricultural hukou (Fan 1996, 1999, 2003; Zhu et al. 2009), which means permanent migration will require a change of hukou.

3 Theoretical background

3.1 General framework

There are n individuals h = 1, …, n who are each initially located in one of m different places i = 1, …, m. Each individual derives utility from K (continuously divisible) location specific factors^{2} f = (f_{1}, f_{2}, …, f_{K}). Individuals have utility u(f) and by definition utility is increasing in each factor. Within each location there is a multivariate distribution of the factors across the individuals who reside there, so for example individuals h and l who both reside in the same location may respectively enjoy values of the factors f = (f_{1h}, f_{2h}, …, f_{Kh}) and f = (f_{1l}, f_{2l}, …, f_{Kl}). This generates intra-location differences. In addition, locations differ in the distribution of factors, so there are also inter-location differences, stressing the role of individual heterogeneity in line with Borjas (1999) and Ekeland et al. (2004). For example, the mean or the variance of the factors realised may differ between locations (one location may be relatively rural with a low mean and variance of wages, while another has a more mixed industrial structure with both a higher mean and a higher variance in wages). In terms of matching theory there is bunching in locations, because two individuals who are both ex-ante best suited to a particular location experience different “luck” in accessing the attributes of the location: by chance one individual may get a better job offer than the other even though they are of identical productivity. But on average one location may attract mainly low skilled workers whilst another attracts high skilled workers. Thus within-location utility differences largely result from luck, but between-location differences result from more deterministic heterogeneity in location and individual characteristics.

In a matching model, Ekeland et al. (2004) have derived a closed form solution for the equilibrium matching allocation when individual utility is linear in unobserved deterministic heterogeneity but quadratic in the location factors. Suppose that average utility derived from factors z is quadratic in z, a^{′}z + (1/2) z^{′}Bz, and immigrants into the overall area have to choose their location. Individual deterministic heterogeneity, ε, causes differences in realised utility between individuals who choose the same location according to ε a^{′}z + (1/2) z^{′}Bz.^{3} So the N locations can be ranked by each individual, with the best location for ε being the one that attains max_{i ∈ N} ε a^{′}z^{i} + (1/2) z^{i′}Bz^{i}, where z^{i} is the vector of factors in location i.

Figure 1 (location rank) shows the utility available to different individuals from locating in each of four different locations i = 1, …, 4. Locations 3 and 4 are dominated and optimal for no individual. Any individual with ε > ε^{∗} has location 1 as their top ranked destination with second best location 2. Any individual with ε < ε^{∗} has top ranked location 2 with second best choice either location 3 or location 1. Hence the population is sorted into one group who will migrate into location 1 and another who will migrate into location 2. If that is the whole story then each individual will locate in his best destination.
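The ranking logic can be illustrated with a minimal sketch, assuming the linear-quadratic utility ε a′z + (1/2) z′Bz described above. All numerical values (a, B, the four factor vectors and the two types) are hypothetical and chosen only so that two locations are dominated; they are not estimates from the paper.

```python
from typing import List, Sequence


def quadratic_utility(eps: float, a: Sequence[float],
                      B: Sequence[Sequence[float]], z: Sequence[float]) -> float:
    """Utility eps * a'z + 0.5 * z'Bz of a type-eps individual facing factors z."""
    linear = sum(a_k * z_k for a_k, z_k in zip(a, z))
    quad = sum(z[r] * B[r][c] * z[c] for r in range(len(z)) for c in range(len(z)))
    return eps * linear + 0.5 * quad


def rank_locations(eps: float, a: Sequence[float], B: Sequence[Sequence[float]],
                   locations: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the locations ordered from best to worst for type eps."""
    utils = [quadratic_utility(eps, a, B, z) for z in locations]
    return sorted(range(len(locations)), key=lambda i: utils[i], reverse=True)


# Hypothetical illustration values: two factors, four locations (index 0..3).
a = [1.0, 0.0]
B = [[0.0, 0.0], [0.0, 1.0]]
locations = [[2.0, 1.0], [1.0, 2.0], [0.5, 1.5], [0.8, 1.0]]

high_type = rank_locations(2.0, a, B, locations)  # a type above the crossing point
low_type = rank_locations(1.0, a, B, locations)   # a type below the crossing point
```

With these assumed values the utility lines of the first two locations cross at ε = 1.5, so the two types disagree on the top location while the last two locations are never chosen by either type.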

We have to adapt this to analyse movement between locations, e.g. from one location to another within Guangdong. Each person within any given location compares their current circumstances with what they can expect to attain by a move to an alternative location, and this interaction determines who moves where. An individual h currently in i could move to location j. If he moves, he does not know exactly what factor combination he will get in j, because there is heterogeneity in individual experience within a location. He has to assess the range of payoffs he could get from the different possible combinations x of the factors (f_{1x}^{j}, f_{2x}^{j}, …, f_{Kx}^{j}) which might occur in j. We assume each individual assesses his gross benefit from moving to j as Eu(f|j) = Σ_{x} π^{j}(f_{1x}^{j}, f_{2x}^{j}, …, f_{Kx}^{j}) u(f_{1x}^{j}, f_{2x}^{j}, …, f_{Kx}^{j}). Here π^{j}(.) has the interpretation that, in j, h has the chance π^{j} of getting the factor combination (f_{1x}^{j}, f_{2x}^{j}, …, f_{Kx}^{j}). It is then natural for any individual to condition his probability distribution over factors in j on the mean levels of the factors there, μ_{1}^{j}, …, μ_{K}^{j}, which are observable to the individual, so that π^{j} = π(f_{1x}^{j}, f_{2x}^{j}, …, f_{Kx}^{j}, μ_{1}^{j}, …, μ_{K}^{j}).

Again locations are ranked by each individual, but now the best off individuals in any location may prefer to stay where they are rather than to move, whilst those with the lowest standard of living in a location are the most likely to move. The heterogeneity within a location may partly depend on deterministic individual characteristics but also in large measure on differences in the luck that individuals have experienced in the current location. We can model this by assuming that any individual who enters a new location has an equal chance of enjoying the mean standard of living there. They can also stay in their present location and enjoy their current standard. The first diagram of Figure 2 (location rank with and without heterogeneity) shows the utility distribution of different individuals in a given location as a function of their past experience (luck) and type on the horizontal axis, for example ranging from very unlucky to very lucky or from unskilled to skilled. The positively sloped lines show the utility currently enjoyed by different types in that location. The horizontals E_{1}, E_{2} show the mean utility level that any type in one location can expect from a move to the other location. Thus in location 1 all types currently below ε_{1} will move to location 2 but all others will remain in location 1. Similarly all types in location 2 with ε < ε_{2} will move to location 1 but all others will remain. In fact in our data observed migrants have very similar individual characteristics in terms of age, education and marital status (RUMICI 2007), so presumably the within-location heterogeneity is largely caused by past experience. Also, the observed city factors z in the data have strong dominance relations (the same city empirically tends to come top on each factor in our sample period). This gives us the second diagram of Figure 2, in which the utility distributions of different locations never intersect and the mean utility of each location is unambiguously ranked. In this case, below the best location, the worse off in each location all move into location 1, but the worst off in location 1 move to the second best location 2.

The implication is that with homogeneous expected utility (preferences u() and probabilities π_{x}^{j}) between individuals, all individuals in all locations will agree on j^{∗} as being the location offering the highest Eu(f|j). Any two individuals who presently have the same factors will agree on the particular location j^{∗} that offers the best standard of living Eu(f|j^{∗}), irrespective of where they are currently located (so long as they are not located in j^{∗}).

However, measured in commensurate utility terms, if the individual does move from i to j he also bears costs v_{ij}, so the net expected benefit he could secure from a move to j is NB_{jh|i} = Eu(f|j) − u(f_{1h}, f_{2h}, …, f_{Kh}) − v_{ij}.

The best alternative location for h is then NB_{j^{∗}h|i} = max_{j} {NB_{jh|i} : j = 1, …, m, j ≠ i}, and h will move to j^{∗} if NB_{j^{∗}h|i} > 0.

It is also natural to assume that the moving cost v_{ij} is additive in ij, that is v_{ij} = c_{j} + c_{i}. Costs based on distance between locations will satisfy this and so will any other fixed moving costs. There is little reason to suppose that leaving costs from exiting i depend on the destination, so v_{ij} − v_{ik} = v_{lj} − v_{lk}. With an appropriate choice of units, this implies that v_{ij} is additive. Indeed the exit costs from a location may be very close to zero. With additivity of costs, the best destination is j^{∗} = argmax_{j} {Σ_{x} π_{x}^{j} u(f_{1x}^{j}, f_{2x}^{j}, …, f_{Kx}^{j}) − c_{j}}. If v_{ij} is also small relative to the utility differential, then Eu(f|j^{∗}) − u(f_{1h}, f_{2h}, …, f_{Kh}) will determine the best destination for individual h.
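The individual decision rule can be sketched as follows, assuming the net benefit of a move is expected utility in the destination less current utility less the additive cost c_{j} + c_{i}. The function names, the location labels and all numbers are hypothetical and serve only to illustrate the rule.

```python
from typing import Dict, List, Optional, Tuple

Prospect = List[Tuple[float, float]]  # (probability pi_x, utility u_x) pairs


def expected_utility(prospect: Prospect) -> float:
    """Eu(f|j) as the probability-weighted sum of utilities."""
    return sum(p * u for p, u in prospect)


def best_move(current_utility: float, origin: str,
              prospects: Dict[str, Prospect],
              entry_cost: Dict[str, float],
              exit_cost: Dict[str, float]) -> Optional[str]:
    """Destination with the largest positive net benefit, or None if staying is best."""
    best_j, best_nb = None, 0.0
    for j, prospect in prospects.items():
        if j == origin:
            continue
        # Additive moving cost v_ij = c_j + c_i (assumed form from the text).
        cost = entry_cost[j] + exit_cost[origin]
        nb = expected_utility(prospect) - current_utility - cost
        if nb > best_nb:
            best_j, best_nb = j, nb
    return best_j


# Hypothetical numbers for illustration only.
prospects = {
    "A": [(0.5, 10.0), (0.5, 6.0)],  # expected utility 8
    "B": [(0.5, 6.0), (0.5, 4.0)],   # expected utility 5
    "C": [(1.0, 3.0)],               # expected utility 3
}
entry_cost = {"A": 1.0, "B": 0.5, "C": 0.25}
exit_cost = {"A": 0.0, "B": 0.0, "C": 0.0}

move_from_c = best_move(2.0, "C", prospects, entry_cost, exit_cost)
unlucky_in_a = best_move(4.0, "A", prospects, entry_cost, exit_cost)
lucky_in_a = best_move(9.0, "A", prospects, entry_cost, exit_cost)
```

In this illustration an individual outside the top location heads for it, an unlucky resident of the top location prefers the second best one, and a lucky resident stays put.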

If an individual is currently in j^{∗}, any uncertainty about the combination of factors he faces there has already been resolved. However such an individual can still think about moving elsewhere, and if he is currently low down in the distribution of attainable utilities in j^{∗}, then he may have the prospect of a higher expected utility by moving to an alternative location. This location j^{∗∗} will be the one which is judged second best by all individuals. That is, j^{∗∗} maximizes Σ_{x} π_{x}^{j} u(f_{1x}^{j}, f_{2x}^{j}, …, f_{Kx}^{j}) − c_{j} over all j other than j^{∗}.

How does this general framework lead to migration flows between locations? In any location i there is a distribution of the factors f defined by the cdf G_{i}(f) with associated density g_{i}(f). Define the lower sets of u(.) by L(u^{∗}) = {f | u(f) ≤ u^{∗}}. Then there is a corresponding distribution of utilities in the location of H_{i}(u^{∗}) = Pr(u(f) ≤ u^{∗}) = ∫_{f ∈ L(u^{∗})} g_{i}(f) df. All individuals in location i with Σ_{x} π_{x}(μ_{1}^{j}, …, μ_{K}^{j}) u_{x} − c_{j} > u(f_{1}^{i}, …, f_{K}^{i}) + c_{i} will desire to move. If the exit cost c_{i} is zero, a proportion H_{i}(Σ_{x} π_{x}(μ_{1}^{j}, …, μ_{K}^{j}) u_{x} − c_{j}) could increase their utility by a move to location j. With N_{i} individuals currently in location i, the number who could benefit from migrating from i to j will be N_{i} H_{i}(Σ_{x} π_{x}(μ_{1}^{j}, …, μ_{K}^{j}) u_{x} − c_{j}). However, since all individuals not presently in j^{∗} agree that j^{∗} offers the highest gain from all possible moves, the migration flow into locations other than j^{∗} will be zero from origins other than j^{∗}. So we derive aggregate immigration into j^{∗} from all other locations of I_{j^{∗}} = Σ_{i ≠ j^{∗}} N_{i} H_{i}(Σ_{x} π_{x}(μ_{1}^{j^{∗}}, …, μ_{K}^{j^{∗}}) u_{x} − c_{j^{∗}}) and zero immigration from these origins into any other location.

However some individuals at present in j^{∗} will have had unfortunate experiences there and will be low down in the utility distribution in j^{∗}. They could gain from a move to the second best location. The number of individuals in j^{∗} who see they can secure the highest improvement in their standard of living by a move to j^{∗∗} is N_{j^{∗}} H_{j^{∗}}(Σ_{x} π_{x}(μ_{1}^{j^{∗∗}}, …, μ_{K}^{j^{∗∗}}) u_{x} − c_{j^{∗∗}}).

Proposition 1. If individual preferences and chances of success in any destination location are common, and if the migration cost is additive with exit costs being very low relative to entry costs, migration flows will be out of all but two locations. The two locations with immigration will be ranked as the top two in terms of the overall standard of living.
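Proposition 1 can be illustrated numerically with a minimal sketch. It assumes, purely for illustration, that utilities within each location are uniformly distributed (so the cdf H_{i} is linear on its support), that every mover expects the mean utility of the destination, and that exit costs are zero. Populations, supports and entry costs are hypothetical.

```python
from typing import Dict, Tuple


def uniform_cdf(u: float, lo: float, hi: float) -> float:
    """Share of a uniform[lo, hi] utility distribution lying at or below u."""
    if u <= lo:
        return 0.0
    if u >= hi:
        return 1.0
    return (u - lo) / (hi - lo)


def immigration_flows(pop: Dict[str, float],
                      support: Dict[str, Tuple[float, float]],
                      entry_cost: Dict[str, float]) -> Dict[str, float]:
    """Immigration into each location when all individuals share the same ranking."""
    # Common valuation of a destination: mean utility there net of the entry cost.
    value = {j: (support[j][0] + support[j][1]) / 2.0 - entry_cost[j] for j in pop}
    ranked = sorted(value, key=lambda j: value[j], reverse=True)
    top, second = ranked[0], ranked[1]
    inflow = {j: 0.0 for j in pop}
    for i in pop:
        # Residents of the top location can only gain by moving to the second best.
        dest = second if i == top else top
        lo, hi = support[i]
        inflow[dest] += pop[i] * uniform_cdf(value[dest], lo, hi)
    return inflow


# Hypothetical populations, utility supports and entry costs.
pop = {"A": 100.0, "B": 80.0, "C": 60.0, "D": 40.0}
support = {"A": (2.0, 10.0), "B": (1.0, 7.0), "C": (1.0, 6.0), "D": (0.0, 2.0)}
entry_cost = {"A": 0.5, "B": 0.5, "C": 0.5, "D": 0.5}

inflow = immigration_flows(pop, support, entry_cost)
receiving = [j for j, n in inflow.items() if n > 0]
```

Under these assumptions only the two top ranked locations receive migrants, whatever the number of locations, which is the pattern the proposition describes.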

This is a general abstract formulation and specializations of it are of interest, either to derive clear cut theoretical properties of migration flows, like comparative statics with respect to characteristics of locations (including inequality in the distribution of factors within locations), or for empirical application. One natural specialization that we will use for empirical work is to assume that H_{i}() is linear, H_{i}(Σ_{x} π_{x}(μ_{1}^{j}, …, μ_{K}^{j}) u_{x} − c_{j}) = Σ_{k} (a_{k} + b_{k} μ_{k}^{j}) − c_{j}. This is obviously very convenient and can arise in several scenarios:

(i) Specialized preferences

Suppose that in any location the population partitions into groups who are each affected by a single different factor. For example, one group consists of employees who are motivated solely by labour earnings, and another of disabled individuals who are heavily dependent on local public goods like health care. Each individual within the group for which the kth factor is the sole determinant of utility judges alternative locations solely in terms of that factor. All individuals in this group in any location judge the particular location j^{∗}(k) = argmax_{j} Σ_{x} π_{x}(μ_{k}^{j}) u(f_{kx}) as the most desirable in which to live. There is a distribution of the kth factor within location i which generates a utility distribution for this group of H_{ik}(u(f_{k})). From each location i a proportion H_{ik}(Σ_{x} π_{x}(μ_{k}^{j}) u(f_{kx})) of the individuals in the k-group will want to move to j^{∗}(k).

Taking each group in turn, for each factor there may be a different top ranked location j^{∗}(k), and a particular location may be top ranked on more than one factor. The immigration into any location j will be the sum of those individuals in other locations i who judge j as the best destination on any one of the K factors. So we get immigration into j of Σ_{i ≠ j} Σ_{k} N_{ik} H_{ik}(Σ_{x} π_{x}(μ_{k}^{j}) u(f_{kx}) − c_{j}), where N_{ik} is the number of individuals living in i who are motivated only by the kth factor. If the distribution of the factor within groups is uniform, so that the cdf is linear, this reduces to Σ_{i ≠ j} Σ_{k} N_{ik}(a_{k} + b_{k} μ_{k}^{j}) − c_{j}. In addition there will be the second choice individuals who are presently in the top location for a particular factor that determines their standard of living, but whose present state on that factor is so unfavorable that they would be better off moving to the second best location on that factor.

(ii) Complementary preferences:

If u(f_{1h}, f_{2h}, …, f_{Kh}) = min(f_{1h}, f_{2h}, …, f_{Kh}) then effectively each individual is constrained by a single factor (ignoring ties) in their current position. The utility they anticipate from any alternative location is Σ_{x} π_{x}^{j} min(f_{1x}^{j}, f_{2x}^{j}, …, f_{Kx}^{j}). We can partition the support of the distribution of f into regions R_{k} in which the kth factor is critical (for simplicity ignoring ties, which have minimal probability), so that for any (f_{1x}^{j}, f_{2x}^{j}, …, f_{Kx}^{j}) ∈ R_{k}, f_{kx}^{j} = min(f_{1x}^{j}, f_{2x}^{j}, …, f_{Kx}^{j}). If π_{k}^{j} is the probability of the region of the support in which the kth factor is the constraining factor on utility, we can write Σ_{x} π_{x}^{j} min(f_{1x}^{j}, f_{2x}^{j}, …, f_{Kx}^{j}) = Σ_{κ} π_{κ}^{j} Σ_{k ∈ κ} π_{k|κ} f_{k}^{j}, where π_{k|κ} is the probability of particular values of the kth factor given that this factor is the minimal constraining one. For example, an individual is affected by all factors but needs both a high wage and good infrastructure in fixed ratios. If the ratio is not met then his standard of living is set by the lower of the two. Hence his expected utility is determined by the chance that each of the two factors is constraining and then, within the region where one factor is critical, by the expected value of the utility of that critical factor.

Under the assumption that H_{i}(Σ_{x} π_{x}(μ_{1}^{j}, …, μ_{K}^{j}) u_{x} − c_{j}) = Σ_{k} (a_{k} + b_{k} μ_{k}^{j}) − c_{j}, aggregate immigration into j^{∗} from all other locations is

However if j^{∗∗} is the second best location (j^{∗∗} = argmax_{j ≠ j^{∗}} (Σ_{x} π_{x}(μ_{1}^{j}, …, μ_{K}^{j}) u_{x} − c_{j})) then
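The two displayed expressions that these sentences introduce can be recovered by substituting the linear form of H_{i} into the flow expressions of the general framework. What follows is a reconstruction under that assumption rather than a verbatim restoration; the label E_{j^{∗} → j^{∗∗}} for the outflow from j^{∗} is introduced here for illustration only.

```latex
\begin{align*}
% Immigration into the top ranked location j^* from all other locations
I_{j^{*}} &= \sum_{i \neq j^{*}} N_{i}
  \left[ \sum_{k} \left( a_{k} + b_{k}\,\mu_{k}^{j^{*}} \right) - c_{j^{*}} \right] \\
% Flow out of j^* into the second best location j^{**}
E_{j^{*} \to j^{**}} &= N_{j^{*}}
  \left[ \sum_{k} \left( a_{k} + b_{k}\,\mu_{k}^{j^{**}} \right) - c_{j^{**}} \right]
\end{align*}
```

Both lines follow term by term from I_{j^{∗}} = Σ_{i ≠ j^{∗}} N_{i} H_{i}(·) and N_{j^{∗}} H_{j^{∗}}(·) once H_{i}(·) is replaced by Σ_{k} (a_{k} + b_{k} μ_{k}^{j}) − c_{j}.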

China’s rapid development has been largely regional, with construction of infrastructure and establishment and growth of an industrial base concentrated in particular areas. Resulting regional inequalities have stimulated migration, although the hukou system has acted as a migration barrier of variable force. Guangdong is a Chinese province close to Macao and Hong Kong, which has attracted government financial incentives for development and high FDI from Hong Kong. Growth has concentrated around the Pearl River, triggering high levels of intra-province migration (2.7 times higher than its already high inter-province immigration). We use 18 Guangdong city areas as our location units^{4} for the period 1990–1999. The cities are heterogeneous: Guangzhou (1), Shenzhen (2), Zhuhai (3), Dongguan (10), Zhongshan (11) and Foshan (13) are industrialised Pearl River cities. Dongguan (10), Zhongshan (11) and Foshan (13) are traditional industrial centres, and Shenzhen (2) and Foshan (13) are also Special Economic Zones (Nishitateno 1983) with government incentives. The cities further from the Pearl River nexus, Shaoguan (5), Heyuan (6), Meizhou (7), Huizhou (8) and Shanwei (9) in the north, and Zhanjiang (15), Maoming (16), Yangjiang (14) and Qingyuan (18) in the south, show lower net migration and some have net emigration. Finally there are cities (22) and (23) formed by merging the administrative units of (4), (19), (20) and (17), (21) (Figure 3, map of Guangdong). The merged city has very low net migration.

For each city and year the variables we measure are

(i)

permanent net migration generally involving a change in hukou (NM),

(ii)

the population (both genders and all hukou types) (P)

(iii)

urban employment (E) and urban wage (W_{u}) in the top three permanent sectors (state owned, urban collective owned and other units)

(iv)

rural income per capita (W_{r}) (defined as the ratio of gross agricultural output in rural primary industry to rural primary industry employment)

(v)

a city specific consumer price index Π used to derive the real urban and rural wages w_{u} = W_{u}/Π, w_{r} = W_{r}/Π

(vi)

capital stock K derived from an initial stock, foreign direct investment flows, and a city specific depreciation rate^{5}.
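The derived variables in this list can be sketched as follows. The real wage deflation follows the definition w = W/Π given above. The capital stock recursion K_{t} = (1 − δ)K_{t−1} + FDI_{t} is an assumed perpetual inventory form, since the exact construction is given only in the footnote; the function names and numbers are hypothetical.

```python
from typing import List


def real_wage(nominal_wage: float, price_index: float) -> float:
    """Deflate a nominal wage W by the city price index: w = W / price index."""
    return nominal_wage / price_index


def capital_stock_path(initial_stock: float, fdi_flows: List[float],
                       depreciation: float) -> List[float]:
    """Assumed perpetual inventory path K_t = (1 - delta) * K_{t-1} + FDI_t."""
    path = [initial_stock]
    for flow in fdi_flows:
        path.append((1.0 - depreciation) * path[-1] + flow)
    return path


# Hypothetical values for a single city.
w_u = real_wage(120.0, 1.5)
k_path = capital_stock_path(100.0, [10.0, 20.0], 0.5)
```

The depreciation rate is passed per city so that a city specific rate, as described in the text, can be supplied for each location.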

Table 1 (average value of key variables) describes the different city characteristics (the key variables are defined in Table 2, description of key variables): the net migration population ratio (NM/P), w_{u}, w_{r}, urban hukou holders as a % of the city population (Urbanhukou/P), the late marriage rate^{6}, capital stock per city inhabitant (K/P), the number of single females and the population (both in millions of people), and the city size in million square metres. The first three cities are the key Pearl delta cities and are the most urbanized, physically small and highly industrialized, with a high population density and the highest urban wage but quite high urban-rural real wage inequality. Together with Dongguan they share the highest capital/population ratio. Relative to population, the capital Guangzhou, Zhuhai and especially Shenzhen have a low prevalence of single females, and they also have a high late marriage rate, indicating both a better educated and slightly older population. At the other extreme, cities 14–16 have low degrees of urbanization and relatively low real urban wages and real city income, although the rural wage is not very low. The highest inward net migration is into the Pearl River cities 1–3 and Huizhou and Foshan. The city with the highest outward net migration is Shaoguan.

Table 3 (inequality between cities over time, coefficient of variation) shows the coefficient of variation across cities of each variable through time. It indicates growing inequality between the cities over time in both rural and urban wages and in capital stock per inhabitant. Interestingly, variation in the late marriage rate between cities is falling, but variations in the number of single females between cities and size differences between cities are roughly constant. Urbanization seems to be spreading slowly across cities, so that variations between cities are gently falling. Overall, the rapid development since 1995 has generally been accompanied by an increase in inequality between city areas.
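The inequality measure can be sketched as below: for each year, the coefficient of variation is the standard deviation of a variable across cities divided by its cross-city mean. Whether the population or the sample standard deviation was used is not stated, so the population version is an assumption here, and the panel values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def coefficient_of_variation(values: List[float]) -> float:
    """Population standard deviation divided by the mean (assumed definition)."""
    return pstdev(values) / mean(values)


def cv_by_year(panel: Dict[int, List[float]]) -> Dict[int, float]:
    """Cross-city coefficient of variation for each year of a variable."""
    return {year: coefficient_of_variation(vals) for year, vals in sorted(panel.items())}


# Hypothetical cross-city values of one variable in two years.
panel = {1990: [10.0, 10.0, 10.0], 1999: [5.0, 10.0, 15.0]}
cv = cv_by_year(panel)
```

A rising value over time, as in this illustration, is what the text reads as growing inequality between cities.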

Applying the Theory to Guangdong

Guided by earlier studies and the descriptive statistics above, we select four factors as motives for migration:

(a) to work as an employee in the three chief employing organizations^{7} in j, in which case the primary motivation is an expected real wage difference between cities.

(b) to be self-employed in j, where the difference in profit opportunities between i and j accounts for the move. The size of the market P is one proxy measure, capital intensity K/P is another. These variables should reflect the demand for services of the self-employed and their cost determining variables.

(c) to get married/join friends etc who are in j. One measure of the relative desirability of different cities in their marriage opportunities is given by the gender structure of the population of single individuals in different cities. We measure this by the number of single females.

(d) to leave a mainly agricultural city area to move to a more urbanized city area where infrastructure is better developed, measured by the % of the urban population holding an urban hukou.

We model the city specific migration costs by a mix of a distance measure (a city specific constant A_{j}) and the % of the urban population holding an urban hukou. The latter reflects the severity with which the hukou policy is applied in different areas: a high ratio indicates a greater barrier. Note that we are using the urban hukou ratio to reflect two different and opposite signed effects on net migration.

Our theory as exposited in (1) determines net migration flows from the wage, self-employment profit, marriage chance and urban infrastructure gaps between cities together with the migration costs

This has the interpretation of a gaps model (Zhu 2002) in a multi area and multivariate context^{8}. There are 6 coefficients to estimate for each city giving a total of 108 regression coefficients.

Having defined the rankings of cities^{9} such that the top ranked is the most desirable, and individuals move when possible to the top two ranking cities, the coefficients b_{k} should generally be positive. There are some possible ambiguities: differences in the number of single females may proxy the availability of jobs for female workers, especially in the textile industries (Fan 2003; Huang 2001), or may proxy marriage chances. Similarly, depending on whether capital and labour are substitutes or complements, capital stock can have an ambiguous effect on employment prospects via the demand for labour. Finally, population can have an ambiguous effect: it could reflect disadvantages due to congestion in an area or the level of demand for the services and output of the self-employed.

We add a disturbance ε_{it} which is assumed to have a zero mean at each i,t and initially, for given i, to be independent over time t with a constant covariance matrix across cities. We test the lack of autocorrelation of the residuals following estimation using Wooldridge’s (2002) panel serial correlation test.

Adding the disturbance and using more succinct notation, (2) becomes

Secondly we have to specify the covariance structure between cities. For each city the variance is constant over time since the disturbance is iid over time. But the variances could differ between cities, effectively giving the disturbances a panel structure. Similarly the shocks of any two cities may be positively correlated (for example through a common global shock to all cities) or negatively correlated (e.g. if there is some uncertainty over the best destination for a migrant who has decided to move within Guangdong). We use the Pesaran (2006) and Hoyos and Sarafidis (2006) test for cross section dependence. The test statistic has an approximate normal distribution which should be valid even in small samples.
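To make the cross section dependence check concrete, the sketch below computes a CD-type statistic from pairwise residual correlations, CD = sqrt(2T/(N(N−1))) Σ_{i<j} ρ_{ij}, and compares it with a standard normal. It assumes a balanced panel and is only an illustration: the function names and the residuals are hypothetical, and it is not the exact routine used for the results reported here.

```python
import math


def residual_correlation(e_i, e_j):
    """Pairwise correlation of two residual series (uncentred form; it equals
    the usual correlation when each city's residuals average to zero)."""
    num = sum(a * b for a, b in zip(e_i, e_j))
    den = math.sqrt(sum(a * a for a in e_i) * sum(b * b for b in e_j))
    if den == 0:
        raise ValueError("a residual series is identically zero")
    return num / den


def pesaran_cd(residuals):
    """residuals: list of N city series, each of length T (balanced panel).
    Returns the CD statistic and a two-sided p value from the normal."""
    n = len(residuals)
    t = len(residuals[0])
    if n < 2 or any(len(r) != t for r in residuals):
        raise ValueError("need at least two cities and a balanced panel")
    rho_sum = sum(
        residual_correlation(residuals[i], residuals[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    cd = math.sqrt(2.0 * t / (n * (n - 1))) * rho_sum
    p_value = math.erfc(abs(cd) / math.sqrt(2.0))
    return cd, p_value


# Hypothetical residuals for three cities over six years (illustrative only).
example = [
    [0.2, -0.1, 0.3, -0.4, 0.1, -0.1],
    [-0.3, 0.2, 0.1, 0.1, -0.2, 0.1],
    [0.1, 0.1, -0.2, 0.2, -0.1, -0.1],
]
cd_stat, cd_p = pesaran_cd(example)
print(round(cd_stat, 3), round(cd_p, 3))
```

A large absolute value of the statistic (small p value) would point to correlated shocks across cities, which is the case the diagonal covariance structure rules out.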

We estimate the parameters by GLS allowing for the variances of disturbances to differ by city. In order to check the robustness of GLS, we also estimate by weighted OLS and find equivalent results. We allow the constant terms (A_{i}) and all the slope coefficients B_{i}^{x} to vary by city through the use of dummy variables for each city. The most general model has 108 regression parameters^{10} and shows no evidence of serial correlation or cross section dependence, and also no evidence of panel effects or heteroscedasticity in the disturbances. So subsequently we use a diagonal covariance matrix of the form Eε_{it}ε_{js}=0 for all t,s and i≠j but Eε_{it}^{2}=σ_{i}^{2} for all t.
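The logic of GLS with city specific disturbance variances can be sketched in a deliberately simplified setting: one regressor, a common slope b and no constant term, so that y_{it} = b x_{it} + ε_{it} with Eε_{it}^{2}=σ_{i}^{2}. This is an assumption made for brevity; the model in the text has city constants and several gap variables. The function name and the data are hypothetical.

```python
def fgls_common_slope(data):
    """data maps city -> list of (x, y) pairs.
    Two-step feasible GLS for y = b*x + e with Var(e) = sigma_city^2."""
    # Step 1: pooled OLS through the origin.
    sxy = sum(x * y for obs in data.values() for x, y in obs)
    sxx = sum(x * x for obs in data.values() for x, _ in obs)
    b_ols = sxy / sxx

    # Step 2: estimate each city's disturbance variance from OLS residuals.
    sigma2 = {
        city: sum((y - b_ols * x) ** 2 for x, y in obs) / len(obs)
        for city, obs in data.items()
    }
    if any(v == 0 for v in sigma2.values()):
        raise ValueError("a city has zero residual variance")

    # Step 3: weighted least squares with weights 1 / sigma_city^2.
    num = sum(x * y / sigma2[c] for c, obs in data.items() for x, y in obs)
    den = sum(x * x / sigma2[c] for c, obs in data.items() for x, _ in obs)
    b_fgls = num / den
    se_fgls = (1.0 / den) ** 0.5
    return b_ols, b_fgls, se_fgls, sigma2


# Hypothetical data: one low-noise and one high-noise city, slope near 2.
panel = {
    "quiet": [(1.0, 2.1), (2.0, 3.9), (3.0, 6.1), (4.0, 7.9)],
    "noisy": [(1.0, 4.0), (2.0, 2.0), (3.0, 8.0), (4.0, 6.0)],
}
b_ols, b_fgls, se_fgls, sigma2 = fgls_common_slope(panel)
print(round(b_ols, 3), round(b_fgls, 3), round(se_fgls, 3))
```

The weighting gives less influence to observations from cities with noisier net migration, which is the sense in which the weighted OLS and GLS estimates in the text are comparable.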

We impose zero and equality restrictions on coefficients to reduce the system to 67 coefficients (Table 5 GLS_67 coefficients). The restrictions, applied sequentially, are accepted on a likelihood ratio test, and the restricted model still shows no autocorrelation or cross city correlation (Table 6 Diagnostic test statistics). Estimating the same model by weighted OLS (allowing for disturbance variances to vary by city) gives very similar coefficients and an R^{2} of 0.959, and easily passes a Ramsey RESET test. The weighted OLS residuals also show no sign of autocorrelation or cross section dependence. The evidence is that this base model with 67 coefficients is an adequate specification of the migration process. The accompanying plots show the relation between the actual and predicted net migration by city (Figure 4 GLS_67 coefficients Y vs predicted Y). Generally the model replicates the data as one would expect, smoothing some of the sharper fluctuations, especially in cities 1, 8 and 22.

However, many of the coefficients are very similar across cities, although there are some outlying gap-city combinations. Testing for equality of coefficients to group city and factor effects leads to a final model with just 27 coefficients (Table 7 GLS_27 coefficients) with a log likelihood of −29.60, and the model also passes all the diagnostic tests. Comparing the plots (Figure 4 GLS_67 coefficients Y vs predicted Y and Figure 5 GLS_27 coefficients Y vs predicted Y) of the actual and predicted values by city for the 27 and 67 coefficient models reveals that we lose relatively little in terms of goodness of fit from imposing these restrictions on the 67 coefficient base model. We still track the data quite well and pick up most turning points in the net migration data. The 27 coefficient model represents our final model. For the sake of robustness we also estimate the final model by weighted least squares with weights being the estimated standard deviations of residuals for each city (so there is cross section heteroscedasticity but no cross section dependence). We include a constant term to allow the conventional calculation of R^{2} (which is 0.841). The coefficient estimates and standard errors are very similar for weighted least squares and panel based GLS, and the weighted least squares results also satisfy diagnostic tests. With Anderson’s small sample log likelihood correction, the restrictions in the final model are also accepted against both the initial 108 and the base 67 coefficient model.
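The comparison of nested models by a likelihood ratio test can be sketched as follows. The statistic is twice the gap between the unrestricted and restricted log likelihoods and is referred to a chi-square distribution with degrees of freedom equal to the number of restrictions. In this sketch only the restricted log likelihood of −29.60 comes from the text; the unrestricted value is a hypothetical placeholder, no small sample correction is applied, and the tail probability uses a closed form that is exact only for an even number of restrictions.

```python
import math


def lr_statistic(loglik_unrestricted, loglik_restricted):
    """Likelihood ratio statistic for nested models."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, exact for even df:
    exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


loglik_27 = -29.60   # reported for the 27 coefficient model
loglik_67 = -12.00   # hypothetical placeholder, not a reported value
restrictions = 67 - 27

lr = lr_statistic(loglik_67, loglik_27)
p_value = chi2_sf_even_df(lr, restrictions)
print(round(lr, 2), restrictions, p_value > 0.05)
```

With these illustrative numbers the restrictions would not be rejected at the 5% level; the actual decision depends on the unrestricted log likelihood and on any small sample adjustment.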

There is an argument that there may be some endogeneity in the regressors especially in the wage gap variable. Shocks in net migration may feed back through the city labour market into shocks in the real wage. Thus the wage gap variables may be correlated with the net migration disturbances. We instrument the three wage gap variables by FDI and employment for the common group of cities and for cities 9 and 22, giving 6 instruments in all (the Sargan test for overidentification has a p value of 0.265) and perform a Hausman-Wu test of the difference between the IV and the OLS estimates. It is not significant (the p value is 0.42), and so we conclude that there are no significant feedback effects between net city migration and the city wage gap variable.

In this reduced model all cities except for 9 and 22 have a common positive effect of expected wage differences. The expected wage gap does affect net migration into cities 9 and 22 but to a smaller extent than in the other cities. Only a few cities (8, 9, 13, 22 and 23) show a responsiveness of net migration to the level of capital stock, and in these cities the reactions to capital stock are heterogeneous. The population gap is important in affecting net migration in all cities, but only the effects in cities 7 and 22 are positive and heterogeneous, whereas in the other 16 cities the response to the gap is negative although quantitatively small. The degree of urbanization affects net migration in all cities: the effect is equal in 12 cities but heterogeneous in 6 cities (2, 7, 13, 14, 16 and 22). The effect of the gaps is thus common for most cities and most gaps.

The city specific constant terms in Table 4 reflect a relatively constant stream of net migration which is due to unobservable or non-measured city amenities (Davies et al. 2001). These factors are not determined by the operation of the gaps. These effects are important in half of the cities and in the majority of these cities there is inward migration which is not related to the gaps that we have identified.

Generally the gaps work in a way that is consistent with the theory^{11}: the population gap is a broad exception but its role generally is dominated by the merged city 22 which is very much larger than the other cities in terms of population. There are some other specific exceptions like the negative impact of the urbanization gap on net migration into city 7, indicating that, for that city, the hukou migration cost element outweighs the benefits of urbanization. Most of the heterogeneous gap effects can be explained in terms of special city characteristics. City 2 has been one of the fastest growing cities in terms of capital stock and net migration. It is a high urban wage, densely populated and highly urbanized location. City 7 is a northern mountainous city with low urban wage and urbanization, high population but low population density and low capital stock. It shows high emigration. City 8 has a relatively high capital stock and low population, and its textile industry base does not yield very high expected urban income; nevertheless it attracts immigration. City 9 is a coastal city and is the main Guangdong seafood producer, with other industry concentrated on shipping construction. It is a low population and low population density city, with a low expected urban income and low capital stock, but despite this it has mean positive net migration. Foshan (13) is one of the industrial tigers with high expected urban income, capital stock and population and a relatively high degree of urbanization. It attracts positive net migration but is neither the leading nor second city in terms of the gap rankings. Cities 14 and 16 are low expected urban wage, relatively rural cities with low capital stock and average or low population density. Their mean net migration is close to zero. The merged city 22 stands out as having the greatest amount of specific heterogeneity in the migration response to gaps. As stated above it dominates the other cities in population size but is relatively non-urbanised although it has a high population density. It also has low capital stock and at best average expected urban income. Its mean net migration is close to zero. Finally city 23 is a similar administratively merged city, sharing many of the characteristics of city 22.

5 Conclusions

Individuals are in heterogeneous circumstances and any one individual is affected by many different utility relevant variables. If individuals can locate in different possible places which also have heterogeneous characteristics, then we would expect movement of individuals between locations. We develop such a multi-motive and multi-location theory to determine the aggregate net migration flows between locations. We add the assumption that all individuals have identical preferences defined over multiple location specific characteristics and, at a general level, derive the result that there will be a tendency to agglomeration. All individuals will agree on a ranking of locations from best to worst. Those individuals with bad experiences in their current location will gain the most by moving to the location which is universally judged the best. Individuals who have current utility above the average for their present location may prefer to remain in situ especially if the migration cost is substantial. So less attractive locations will have emigration especially of the lowest utility inhabitants while the best location will have inward movement. Individuals who start off in the best location, but whose individual experience in that location is much below the location average may find it advantageous to move into the second most attractive location. So on balance the second best location may have net immigration or net emigration. We would expect agglomeration into the top two cities to occur. This matches up with some of the settlement patterns predicted in economic geography type models. There are some theoretical innovations in migration modelling. Our approach allows for multivariate determinants and multi-location choices of net migration flows. People move to places where the chance of an improvement of their current circumstances in some dimension is highest. 
We confirm the basic Harris-Todaro insight that expected labour income differences are important but also confirm Krugman’s view that each location has a variety of push and pull factors determining migration.

The framework we use is static and one period. We abstract from temporary migration, planned reverse migration and commuting/guest working. We also work with an individual as the decision maker, which allows us to avoid specifying family decision processes and to derive aggregate net migration equations in a multi-motivation, multi-destination setting. A partial justification is that much aggregate migration data is at the household not individual level, and abstracting from intra-family decision rules yields empirically testable equations. However, a clear next step would be a multi-period and family based model.

We then use this framework to study the net migration flows between 18 different regions of the Guangdong province in China. Guangdong is particularly suitable for this purpose since it has experienced very rapid growth and industrialization in conjunction with high levels of immigration from the other Chinese provinces, and even higher levels of intra-province migration. We divide Guangdong into 18 city areas which have varying degrees of urbanization and use panel data on these 18 cities for 1990–1999 to econometrically investigate net migration flows between the cities allowing for cross section heteroscedasticity. We find that net migration into the majority of cities can be well explained by a common set of parameters. There is some limited heterogeneity between cities in how net migration responds to the differentials: out of a potential total of 90 city-differential heterogeneities we find that we need just 15 specific coefficients. The remaining heterogeneities are in the impacts of capital stock and the degree of urbanization. Nearly half of the cities share a common mean amount of net migration which is unrelated to the four differentials we identified. Neither cross section dependence nor serial correlation is detected in the final model. In terms of goodness of fit and tracking the data city by city, our model performs well and there is no evidence of model misspecification.

In a locational equilibrium, the net benefits of moving between cities should be equalized. In fact inequality between cities in some of the relevant factors has increased, not fallen, over our sample. The coefficients of variation of urban and rural income and of capital stock per capita suggest that inequalities between cities are increasing over time. Taken together, the rising inequality in some migration inducing factors may imply that a full locational equilibrium has not yet been achieved.

It is well known that Chinese labour migration is substantial and exhibits different types of flows. It is widely argued to be a very important component in rapid Chinese growth and development, thus its policy importance is clear. Although the data sources are much more abundant than 20 years ago, there is still a paucity of degrees of freedom and coverage of some of the relevant factors. This forces some imperfection in our modelling strategy, but, given this, the results here are robust to a range of specification tests.

Endnotes

^{1} To our knowledge, our work is the first to aggregate migration flows from individual decisions allowing for multiple pull factors and multiple locations. Bazzi (2012) develops a microfounded model of aggregate migration flows to study to what extent financial barriers limit international migration, but he does not consider multiple pull factors and multiple location choices.

^{2} The idea is that the factors measure variables like the wage, employment opportunities, infrastructure and local public goods, profit prospects for self-employed individuals, marriage prospects, etc.

^{3} In fact in this special case where the matching is one-sided, if the realised utility has the form εf(z)+g(z), where f(·)>0 the same argument follows.

^{4} These are an aggregation of the 21 administrative city areas to ensure unique boundaries over 1990–1999. Each city area has both an urban and rural part. The data is taken from the Guangdong Statistical Yearbook (1990–1999).

^{5} K_{t} = (1−δ)K_{t−1} + FDI_{t}, where δ is the depreciation rate. The base value of capital stock is given by the 1992 historic cost value of assets. The depreciation rate is computed as the % difference between the net value of fixed assets and the historic value of fixed assets in 1992; the mean of this is about 25%.
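The accumulation rule in this endnote can be illustrated with a short sketch. It rolls the capital stock forward from a base value using K_{t} = (1−δ)K_{t−1} + FDI_{t}. The function name, the base value and the FDI flows are hypothetical; only the recursion and the depreciation rate of about 25% come from the endnote, and the sketch does not cover how years before the base year are handled.

```python
def capital_stock_series(k_base, fdi_flows, delta):
    """Roll capital forward: K_t = (1 - delta) * K_{t-1} + FDI_t.
    Returns the list [K_base, K_1, K_2, ...]."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie between 0 and 1")
    stocks = [k_base]
    for fdi in fdi_flows:
        stocks.append((1.0 - delta) * stocks[-1] + fdi)
    return stocks


# Hypothetical base-year stock and three later years of FDI (illustrative only).
stocks = capital_stock_series(k_base=1000.0, fdi_flows=[300.0, 350.0, 400.0], delta=0.25)
print(stocks)
```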

^{6} Defined as the number of females who were at least 23 years old at marriage as a proportion of the total number of first marriages.

^{7} These are either state or urban collective units, or private sector units with joint ownership, shareholding or foreign ownership (ie excluding self-employment).

^{8} Instead of thinking of the distribution of the factor within a city, this can be interpreted as saying for example that net (and gross) migration from a city ranked three or lower into the top city is the total factor gain from the lower rank city achieving the mean factor of the top city. We can also identify the constant term with a combination of a constant net migration flow unrelated to the gaps (giving a positive A_{i}) and a moving cost effect which deters some of the net migration driven by the gaps (giving a negative A_{i}).

^{9} The ranking of cities on different factors may vary with time. Broadly the Pearl delta cities (1,2,3), Foshan and city 22 are generally ranked in the top two on most factors. The implication is that we should expect to see intra-Guangdong emigration from the remaining thirteen cities which are never ranked in the top two on any criterion for migration but inward immigration into the ranked cities.

^{10} In fact we drop one regressor to avoid multicollinearity, leaving 107 coefficients.

^{11}Since we have scaled the regressors to have zero mean and unit variance across the whole sample, the estimated coefficients are largely independent of the units in which we measure variables. So we do not compute distinct elasticities.
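The scaling described in this endnote can be sketched as follows: each regressor is centred and divided by its standard deviation over the pooled sample, so that changing the units of measurement leaves the scaled values unchanged. The function name and the wage figures are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale a pooled sample to zero mean and unit variance."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - m) / s for v in values]


# Hypothetical wage gaps measured in yuan and in thousands of yuan.
gap_yuan = [1200.0, 1500.0, 900.0, 2400.0]
gap_thousand = [v / 1000.0 for v in gap_yuan]

z_yuan = standardize(gap_yuan)
z_thousand = standardize(gap_thousand)
print([round(z, 4) for z in z_yuan])
```

Because the two scaled series coincide, a coefficient on the scaled regressor does not depend on whether the wage gap was recorded in yuan or in thousands of yuan.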

References

Bazzi S: Wealth heterogeneity, income shocks, and international migration: theory and evidence from Indonesia. 2012.

Charsley K, Liversage A: Transforming polygamy: migration, transnationalism and multiple marriages among Muslim minorities. In Global Networks. Edited by: Rogers A, Vertovec S, Cohen R. Blackwell Publishing Ltd & Global Networks Partnership, UK; 2011.

Chen Y, Rosenthal SS: Local amenities and life-cycle migration: Do people move for jobs or fun? J Urban Econ 2008, 64: 519–537. 10.1016/j.jue.2008.05.005

Docquier F, Rapoport H: The economics of migrants’ remittances. In Handbook of the economics of giving, altruism and reciprocity. Edited by: Kolm SC, Ythier JM. Elsevier, North-Holland; 2005:1135–1198. Ch. 17. [online] Available at: http://www.sciencedirect.com/science/article/pii/S1574071406020173

Dustmann C, Frattini T, Glitz A: The impact of migration: a review of the economic evidence, final report No. 102. UCL: Centre for Research and Analysis of Migration (CReAM). 2007.

Fan C: Migration in a socialist transitional economy: heterogeneity, socioeconomic and spatial characteristics of migrants in China and Guangdong Province. Int Migr Rev 1999, 33(4): 954–987. 10.2307/2547359

Furtado D, Theodoropoulos N: I’ll marry you if you get me a job: cross-nativity marriage and immigrant employment rates. CReAM Discussion Paper CDP No 01/08. 2008. [online] Available at: http://www.cream-migration.org/publ_uploads/CDP_01_08.pdf

Greenwood MJ, Hunt GL, McDowell JM: Migration and employment change: empirical evidence on the spatial and temporal dimensions of the linkage. J Reg Sci 1986, 26: 223–234.

Heckman JJ, Matzkin R, Nesheim L: Identification and estimation of hedonic models: the vector nonseparable case with missing attributes. Univ. Chicago, Dept, Econ, Chicago; 2002.

Pesaran H: Estimation and inference in large heterogeneous panels with multifactor error structure. Econometrica 2006, 74(4): 967–1012. 10.1111/j.1468-0262.2006.00692.x

Seeborg MC, Jin Z, Zhu Y: The new rural-urban labor mobility in China: causes and implications. J Socio-Econ 2000, 29: 39–56. 10.1016/S1053-5357(00)00052-4

Vermeulen W, Ommeren JV: Does land use planning shape regional economics? A simultaneous analysis of housing supply, internal migration and local employment growth in the Netherlands. J Housing Econ 2009, 18: 294–310. 10.1016/j.jhe.2009.09.002

The IZA Journal of Migration is committed to the IZA Guiding Principles of Research Integrity. The authors declare that they have observed these principles.

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Simmons, P., Xie, Y. Where is the grass greener? A micro-founded model of migration with application to Guangdong. IZA J Migration 2, 7 (2013). https://doi.org/10.1186/2193-9039-2-7