6.1 Empirical strategy
We estimate the impact of having a new migrant on household welfare in the following specification:
$$ {y}_{it}={\beta}_1{2015}_t+{\beta}_2\left({\mathrm{NewMig}}_i\times {2015}_t\right)+{\beta}_3{X}_{it}+{\beta}_4{\mathrm{LM}}_{ct}+{H}_i+{\varepsilon}_{it} $$
(1)
Our interest is in how household welfare changes when a household gains a new migrant. With two time periods, we regress the outcome variable yit for household i on the household’s treatment status, NewMigi, interacted with a dummy for the second survey year, 2015t. NewMigi is a dummy equal to 1 if the household has a new migrant and 0 otherwise. We control for the general change in welfare over time by including the dummy for the second survey year on its own. We also include household fixed effects, Hi, which absorb any unobservable household characteristics that do not vary between the survey waves. Because NewMigi itself does not vary over time, its main effect is absorbed by the fixed effects and does not enter Eq. 1 separately.
The parameter of interest is β2, the coefficient of the interaction. It measures the effect of having a new migrant between the two survey waves on the welfare of the origin household compared to those households that did not see another member migrate.
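To make the interpretation of β2 concrete, consider the special case of Eq. 1 without the controls Xit and LMct (a simplifying assumption made here for illustration only). With two survey waves, β2 then reduces to the familiar difference-in-differences of group means:

$$ {\beta}_2=\left(E\left[{y}_{i,2015}\mid {\mathrm{NewMig}}_i=1\right]-E\left[{y}_{i,2013}\mid {\mathrm{NewMig}}_i=1\right]\right)-\left(E\left[{y}_{i,2015}\mid {\mathrm{NewMig}}_i=0\right]-E\left[{y}_{i,2013}\mid {\mathrm{NewMig}}_i=0\right]\right) $$

The first bracket is the change in welfare among households with a new migrant and the second is the change among households without one, which in this special case equals β1.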
The time-varying household characteristics, Xit, are the dependency ratio, whether the household has a returned migrant and the employment status of the household head (unemployed/unpaid work, self-employed, employed or inactive). These can all affect household welfare, and they can change within the period under investigation. If a household has another child or if one of the older members becomes too old to work, then welfare might decline, as per capita income declines. Similarly, if a household head becomes unemployed, this affects household welfare negatively. Finally, a migrant who returns to the origin household can, on the one hand, bring home money and invest it in assets to increase welfare or, on the other hand, the returnee might have failed at destination and now presents an additional burden to the household.
The local labour market variable, LMct, is the employment rate in community c. It is measured as the share of individuals who work as wage employees relative to the local labour force. This is included because a household seeking to diversify its income sources will consider local opportunities, where household members could earn a wage (see footnote 5).
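The construction of the community employment rate can be sketched as follows. This is a minimal illustration, not the authors' code: the status labels, the function name and the labour-force definition (everyone except the inactive) are assumptions made for the example.

```python
from collections import defaultdict

def community_employment_rate(individuals):
    """Share of wage employees in the local labour force, per community.

    `individuals` is an iterable of (community_id, status) pairs. The status
    labels and the labour-force definition (all individuals except the
    inactive) are assumptions made for this illustration.
    """
    wage = defaultdict(int)
    labour_force = defaultdict(int)
    for community, status in individuals:
        if status == "inactive":
            continue  # inactive individuals are outside the labour force
        labour_force[community] += 1
        if status == "wage_employee":
            wage[community] += 1
    return {c: wage[c] / labour_force[c] for c in labour_force}

# Hypothetical individual-level records for two communities
sample = [
    ("A", "wage_employee"), ("A", "self_employed"), ("A", "unemployed"),
    ("A", "wage_employee"), ("A", "inactive"),
    ("B", "self_employed"), ("B", "wage_employee"), ("B", "inactive"),
    ("B", "unemployed"), ("B", "self_employed"),
]
rates = community_employment_rate(sample)
```

In this toy sample, community A has two wage employees among four labour-force members and community B has one among four.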
We estimate the fixed effects model in a weighted least squares regression applying entropy balancing weights. These weights are used to make the control group look comparable to the treated households in terms of household characteristics at baseline, in 2013. This reduces the selection bias that can challenge the analysis of migration impacts.
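The estimation step can be sketched in a few lines. With two waves and weights that are constant within a household, the weighted fixed effects estimator of Eq. 1 coincides with weighted least squares on first differences, which is what the sketch below uses. The data are simulated, the weights are placeholders rather than actual entropy balancing weights, a single covariate stands in for Xit and LMct, and all function and variable names are hypothetical.

```python
import numpy as np

def weighted_first_difference(y0, y1, new_mig, x0, x1, weights):
    """WLS on first differences: dy = b1 + b2 * NewMig + b3 * dx + error.

    Differencing removes the household fixed effect H_i; the constant
    picks up the coefficient on the 2015 dummy (b1).
    """
    dy = y1 - y0
    dx = x1 - x0
    design = np.column_stack([np.ones_like(dy), new_mig, dx])
    root_w = np.sqrt(weights)  # WLS = OLS on rows scaled by sqrt(weight)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], dy * root_w, rcond=None)
    return coef

# Simulated, noise-free panel so the coefficients are recovered exactly
rng = np.random.default_rng(0)
n = 200
household_effect = rng.normal(size=n)                 # H_i, unobserved
new_mig = (rng.random(n) < 0.3).astype(float)         # treatment dummy
x0, x1 = rng.normal(size=n), rng.normal(size=n)       # one time-varying control
weights = rng.uniform(0.5, 1.5, size=n)               # placeholder weights
y0 = household_effect + 0.5 * x0
y1 = household_effect + 0.2 - 0.3 * new_mig + 0.5 * x1
beta1, beta2, beta3 = weighted_first_difference(y0, y1, new_mig, x0, x1, weights)
```

Because the simulated outcome contains no noise, the sketch recovers the coefficients used to generate the data (0.2, -0.3 and 0.5) regardless of the weights; with real data the weights change the estimate by changing which control households count most.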
6.2 Dependent variable: housing quality index
The outcome variable is an index of housing quality constructed using multiple correspondence analysis (MCA) and includes the number of rooms, dwelling ownership, the presence of a bathroom and a toilet, main source of drinking water and the floor and wall material. Additional file 1: Table S1 provides a detailed overview of asset ownership in the sample.
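As an illustration of how such an index can be obtained, the sketch below runs an MCA as a correspondence analysis of the indicator (one-hot) matrix and takes the household scores on the first dimension. It is a toy example under stated assumptions: the three variables and their categories are invented, a continuous item such as the number of rooms would first have to be grouped into categories, and the sign of the axis is arbitrary, so it is oriented here so that households with a toilet score higher on average.

```python
import numpy as np

def one_hot(rows):
    """Indicator matrix with one column per (variable, category) pair."""
    columns = sorted({(j, v) for row in rows for j, v in enumerate(row)})
    Z = np.array([[1.0 if row[j] == v else 0.0 for j, v in columns] for row in rows])
    return Z, columns

def mca_first_dimension(Z):
    """Row principal coordinates on the first MCA axis."""
    P = Z / Z.sum()                                      # correspondence matrix
    r = P.sum(axis=1)                                    # row masses
    c = P.sum(axis=0)                                    # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, 0] * s[0] / np.sqrt(r)

# Hypothetical households: (toilet, floor material, drinking water source)
households = [
    ("toilet", "cement", "piped"),
    ("toilet", "cement", "well"),
    ("none", "earth", "well"),
    ("none", "earth", "well"),
    ("toilet", "cement", "piped"),
    ("none", "cement", "well"),
]
Z, columns = one_hot(households)
index = mca_first_dimension(Z)

# Orient the axis so that higher values mean better housing
has_toilet = Z[:, columns.index((0, "toilet"))] == 1
if index[has_toilet].mean() < index[~has_toilet].mean():
    index = -index
```

The resulting scores have mean zero, identical households receive identical scores, and the household with a toilet, cement floor and piped water ranks above the one with none of these.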
The empirical literature suggests that households with migrants often use remittances to improve their housing (Kagochi and Kiambigi 2012, Osili 2004, Durand et al. 1996). However, these studies do not differentiate between first and successive migrants. It is possible that households who already have migrant members have already used their remittances to improve their housing and consequently do not require more investments. This is unlikely to be the case in rural Ghana: households in our sample that had already received remittances from first migrants at baseline still have low levels of basic amenities, e.g. toilets or potable water. These levels are comparable to those seen among rural households in the lower consumption quintiles according to the 2013 Ghana Living Standards Survey (see Additional file 1: Table S1). Hence, the data suggest that the households in our sample could substantially improve their housing through investment financed by remittances.
Figure 2 presents the housing quality index in 2013 of households with a new migrant and of those without, and Fig. 3 depicts the same for 2015.
These figures illustrate that the distributions of the housing quality index overlap in 2013, but they shift apart in 2015. In 2015, the distribution for households without a new migrant appears to lie at higher values of the index.
6.3 Identification strategy
Several issues challenge the empirical identification of the impact of migration on households left behind.
Firstly, some factors may simultaneously affect both the migration decision and the outcome. For example, a household’s risk aversion might prevent it from engaging in migration or from adopting more profitable but riskier technologies on its farm or in its business. Such households would be less likely to have a new migrant and would remain at a lower welfare level. Omitted variables of this kind would bias the coefficient of interest. In the given example, the estimate would be biased upwards: whatever the sign of the true effect, we would overstate a positive effect or understate a negative one. By estimating a fixed effects model, we capture any unobservable time-invariant factors at the household level.
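The direction of the bias in this example can be made explicit with the textbook omitted-variable formula. As a simplified sketch, suppose welfare depends on treatment Di (having a new migrant) and on unobserved risk aversion Ri with coefficient δ, and that Ri is left out of a simple cross-sectional regression. The symbols Di, Ri and δ are introduced here for illustration only and are not part of Eq. 1:

$$ \mathrm{plim}\ {\hat{\beta}}_2={\beta}_2+\delta \frac{\mathrm{Cov}\left({D}_i,{R}_i\right)}{\mathrm{Var}\left({D}_i\right)} $$

Risk aversion lowers welfare (δ < 0) and lowers the probability of having a new migrant (Cov(Di, Ri) < 0), so the second term is positive and the estimate is biased upwards. If Ri is constant over time, it is absorbed by the household fixed effect and the bias term drops out.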
Secondly, the migration decision could be influenced by the outcome variable. This is especially a problem with cross-sectional data (Antman 2012). The change in asset ownership in the period preceding our baseline could affect the treatment status of households. We cannot exploit previous data to control for this, but by balancing households on baseline characteristics, we only compare those that look similar and thus capture any effect the prior welfare change had on households.
We apply a weighting method that makes the comparison group look like the treated group in terms of observable characteristics at baseline. This approach assumes selection on observables: conditional on observable characteristics, having a new migrant is as good as random (Wooldridge 2010). Balance is achieved for observable characteristics that are expected to influence the likelihood of being treated and the outcome variable (Imbens 2015). Once these observables are balanced, the selection bias is reduced (Heckman et al. 1998).
6.3.1 Entropy balancing weights
The weighting method applied is entropy balancing, developed by Hainmueller (2012). This approach defines weights for each observation that ensure a predefined balance of covariates. The balance can be defined in terms of the first, second and even higher order moments of observables. The main advantages of this method are that balance checks become redundant, the majority of observations are retained, the computation of the weights is fast and the method can be combined with many other matching and regression methods, similarly to inverse probability weighting methods and regression adjustment procedures (Imbens 2015).
Entropy weights, w, minimize the entropy distance metric, which is defined as:
$$ \underset{w_i}{\min }H(w)={\sum}_{i\mid D=0}{w}_i\log \left(\frac{w_i}{q_i}\right) $$
(3)
and which is subject to the balance constraints (Eq. 4), the normalization constraint (Eq. 5) and the non-negativity constraint (Eq. 6):
$$ {\sum}_{i\mid D=0}{w}_i{c}_{ri}\left({X}_i\right)={m}_r\kern1.5em \mathrm{with}\kern1.5em r\in 1,\dots, R\kern1.5em \mathrm{and} $$
(4)
$$ {\sum}_{i\mid D=0}{w}_i=1\kern5.75em \mathrm{and} $$
(5)
$$ {w}_i\ge 0\kern1.5em \mathrm{for}\ \mathrm{all}\kern1.25em i\kern3em \mathrm{such}\ \mathrm{that}\kern1.25em D=0 $$
(6)
qi is a base weight, defined as 1 over the number of control units, and mr is the target value of the rth moment, taken from the treated group. cri(Xi) are ‘a set of R balance constraints [that are] imposed on the covariate moments of the reweighted control group’ (Hainmueller and Xu 2013, p. 4). The procedure then computes the set of weights that minimizes Eq. (3) subject to the balance, normalization and non-negativity constraints. Once the weights have been computed, they are applied to estimate Eq. 1 with weighted least squares (WLS). This approach works like any regression adjustment method (Wooldridge 2010).
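The optimization in Eqs. 3 to 6 can be sketched numerically. The solution has the exponential form wi ∝ qi exp(λ′(ci − m)), so it suffices to find the multipliers λ for which the balance constraints hold; the sketch below does this with Newton steps on the convex dual. This is a minimal illustration on simulated data, not the implementation used in the paper or in any particular software package, and all names are hypothetical.

```python
import numpy as np

def entropy_balance(C, m, q=None, tol=1e-9, max_iter=200):
    """Weights minimizing sum(w * log(w / q)) s.t. C.T @ w = m, sum(w) = 1, w >= 0.

    C : (n_controls, R) array of moment functions c_r(X_i) for control units
    m : (R,) target moments, e.g. treated-group means
    q : (n_controls,) base weights; uniform 1/n if omitted
    """
    n, R = C.shape
    q = np.full(n, 1.0 / n) if q is None else np.asarray(q, dtype=float)
    centred = C - m  # balance holds when the weighted mean of this is zero

    def weights(lam):
        z = centred @ lam
        w = q * np.exp(z - z.max())  # subtract max for numerical stability
        return w / w.sum()           # normalization constraint (Eq. 5)

    def dual(lam):
        z = centred @ lam
        return z.max() + np.log(np.sum(q * np.exp(z - z.max())))

    lam = np.zeros(R)
    for _ in range(max_iter):
        w = weights(lam)
        grad = centred.T @ w  # remaining imbalance in each moment
        if np.max(np.abs(grad)) < tol:
            break
        hess = (centred * w[:, None]).T @ centred - np.outer(grad, grad)
        step = np.linalg.solve(hess, grad)
        t = 1.0
        # Halve the step until the dual objective no longer increases
        while dual(lam - t * step) > dual(lam) + 1e-12 and t > 1e-8:
            t *= 0.5
        lam = lam - t * step
    return weights(lam)

def moments(x):
    """Balance the means of both covariates and the second moment of the first."""
    return np.column_stack([x, x[:, 0] ** 2])

rng = np.random.default_rng(1)
x_control = rng.normal(0.0, 1.0, size=(300, 2))  # simulated control covariates
x_treated = rng.normal(0.3, 1.0, size=(80, 2))   # simulated treated covariates
target = moments(x_treated).mean(axis=0)
w = entropy_balance(moments(x_control), target)
```

After convergence, the reweighted control moments `moments(x_control).T @ w` match `target`, and the weights are non-negative and sum to one. The exponential form guarantees non-negativity automatically, which is why Eq. 6 needs no explicit handling in the sketch.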
6.3.2 Variables to balance
We include all variables that we consider substantive for having a new migrant or for the outcome. We also include squared terms of continuous variables (Imbens and Rubin 2015; Smith and Todd 2005). Region dummies capture factors related to migrant networks, regional development and economic opportunities. Most importantly, we control for household size and the dependency ratio of elderly and children to adult members to capture the household structure. These variables matter for the household’s migration decision as well as for its welfare. Another important characteristic is the main household income source, that is, whether the household earns its living from agriculture, wage employment, its own business, or public or private transfers. We also control for the employment status of the household head (employed, self-employed, unemployed or inactive) to capture economic activity. As a measure of human capital in the household, we include the highest level of education among adult members. Many studies show that education is an important predictor of household welfare. It is also related to migration decisions, as more highly educated people have higher expected incomes at home as well as at possible destinations (Sjaastad 1962). We include a dummy for female household heads, which has been shown to be a strong predictor of household welfare in the rural context and also reflects households’ options for migration decisions (Adams and Cuecuecha 2013). In addition, the age and marital status of the household head are added to control for the life cycle of a household (Lipton 1980). Ethnicity has been found to be an important factor in creating and maintaining migrant networks in Ghana (Awumbila et al. 2016). Such networks are important determinants of migration decisions because they reduce the risk and costs associated with migration (Carrington et al. 1996), which is why we include the ethnicity of the household head.
We also include our measure of the community employment rate. We choose this measure because a household that seeks to diversify its income sources will also consider other opportunities in the community where household members could earn a wage (Bazzi 2017).
In a credit-constrained context, only households at a certain level of wealth are able to afford migration (McKenzie and Rapoport 2007). Thus, only households with a similar level and distribution of welfare should be compared. While we do not have information on economic welfare pre-dating our baseline, as suggested by Smith and Todd (2005), we include a rich set of asset indicators and information on asset purchases in the computation of the balancing weights. These are the components used to construct the housing quality index and dummies equal to 1 if a household purchased a specific asset within the 5 years before the baseline survey and 0 otherwise (see footnote 6). In this way, we can capture a certain level of wealth and investment behaviour of the household that pre-dates the baseline.
In Additional file 1: Table S2, we show the mean and variance of the variables that were included in the construction of the entropy balancing weights, with the weights applied to the control group. Using the weights leads to identical means and variances of all variables in the treated group and the reweighted control group. The entropy balancing weights thus construct a comparable sample of households to reduce the selection bias.
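The entries of such a balance table are weighted means and variances. A minimal sketch of the calculation, using made-up numbers and the weighted population variance without a small-sample correction (an assumption, since the exact variance definition behind Table S2 is not stated here):

```python
def weighted_mean_var(values, weights):
    """Weighted mean and weighted population variance of one covariate."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var

# Hypothetical covariate values and balancing weights for four control households
mean, var = weighted_mean_var([1.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4])
```

For these illustrative inputs the weighted mean is 3.0 and the weighted variance is 1.0, up to floating-point rounding. In a balance table, these two numbers would be compared with the unweighted mean and variance of the same covariate among treated households.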