3.1 Diversity index
The cultural diversity literature uses the fractionalisation index in measuring the impact of diversity on economic outcomes. Ethno-linguistic fractionalisation (ELF) is defined as the “probability that two randomly chosen citizens in a country belong to a different ethnic group where(in) group belonging is attributed by language” (Neumann and Graeff 2013). Vigdor (2008) uses a slightly different approach, the probability that a randomly selected individual is an immigrant, to estimate an assimilation index in the USA. Others have used the country of birth data to measure cultural diversity (Alesina et al. 2013; Bellini et al. 2013; Damelang and Haas 2012; Longhi 2013; Ottaviano and Peri 2006). Given the availability of data in the Australian context, this study uses country of birth/nationality instead of ethnicity/language (see footnote 5). Specifically, the proportion of the nationals of each country of origin (birth) in each LGA in Australia is used to compute a fractionalisation index. This index (hereafter diversity index) has a similar theoretical interpretation to the Herfindahl Index which is widely used in marketing research to measure the market/monopoly power of firms located in specific areas (Gomez-Mejia and Palich 1997) and is given by
$$ D{I}_{rt}=1-{\displaystyle \sum_{i=1}^I}{C}_{irt}^2\kern3em \forall\ r=1,2,\dots, N,\ t=1,2,\dots, T $$
(1)
where \( {C}_{irt} \) represents the proportion of the nationals of country i in region (LGA) r in a given year t. The values fall in the range [0, 1] with “zero” indicating perfect homogeneity and “1” indicating perfect heterogeneity. We use the HILDA panel to construct the index of diversity. For the sake of visual comparison, we also estimated the indices for the 2001 and 2011 censuses (see Fig. 1). However, given the annual time series nature of our data, our main analysis is based on the indices constructed from HILDA.
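As an illustration of Eq. (1), the index for a single region and year can be computed from head counts by country of birth. The following is a minimal sketch; the function name `diversity_index` and the example counts are hypothetical and are not taken from HILDA or the census.

```python
def diversity_index(counts):
    """Fractionalisation index of Eq. (1): 1 minus the sum of squared shares.

    `counts` maps each country of birth to the number of residents of one
    region in one year (hypothetical input format).
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("region has no residents")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Made-up regions: one fully homogeneous, one with three groups.
homogeneous = diversity_index({"Australia": 1000})
mixed = diversity_index({"Australia": 500, "UK": 250, "China": 250})
print(homogeneous, mixed)
```

In the second region the shares are 0.5, 0.25 and 0.25, so the index is 1 - (0.25 + 0.0625 + 0.0625) = 0.625.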
These maps are constructed based on ABS census data. Although we had diversity data available for 676 LGAs, the 2011 Australian Standard Geographical Classification (ASGC, ABS 2011b) digital boundaries allow for only 560 LGAs upon which the maps reported in Fig. 1 are based. The first figure, Fig. 1a, is the distribution of the diversity index based on the 2001 census while Fig. 1b is based on the 2011 census. A comparison of the two figures indicates that cultural diversity increased in several regions over the decade. This is particularly visible in the metropolitan areas including Sydney and Melbourne (see Figs. 2 and 3, respectively). Both interregional mobility and international migration have contributed to this demographic change (Hugo and Harris 2011). Therefore, the analysis of diversity in this study accounts for both factors, by using a predicted instead of actual diversity index.
3.2 Share of migrants
The “share of migrants” is an alternative measure to assess whether the proportion of immigrants in a region per se has any effect on weekly wages. In addition, a diversity index is estimated for migrants excluding the Australian-born population. This is then included to see whether diversity among migrants (as opposed to diversity in general) contributes to labour market outcomes.
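The two alternative measures can be sketched in the same way. The code below is only an illustration under assumed inputs: the label `"Australia"` for the Australian-born group, the function names, and the counts are all hypothetical.

```python
def migrant_share(counts, native="Australia"):
    """Share of residents of a region who were born outside Australia."""
    total = sum(counts.values())
    return 1.0 - counts.get(native, 0) / total


def migrant_diversity(counts, native="Australia"):
    """Fractionalisation index computed over the foreign-born only."""
    migrants = {c: n for c, n in counts.items() if c != native}
    total = sum(migrants.values())
    if total == 0:
        return 0.0  # no migrants: treated here as no diversity among migrants
    return 1.0 - sum((n / total) ** 2 for n in migrants.values())


# Made-up region with 40 % foreign-born residents.
region = {"Australia": 600, "UK": 200, "China": 100, "India": 100}
share = migrant_share(region)
div_migrants = migrant_diversity(region)
print(share, div_migrants)
```

Excluding the Australian-born changes the denominator, so the index among migrants (0.625 in this example) can differ markedly from the general index for the same region.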
3.3 Weekly wages
The main dependent variable in this analysis is the log of weekly wages. Originally, HILDA respondents were asked a series of questions such as “For your [job/main job] what was the total gross amount of your most recent gross pay before tax or anything else was taken out?” Responses were recorded as “gross weekly wages and salaries” for the responding persons. For the complete panel, the mean weekly wage was $651.6 (SE = $254.1).
3.4 Other control variables
In addition to diversity (fractionalisation) and the share of migrants, standard demographic variables (age, age squared, gender, marital status) are included as control variables. English language fluency is also included, as the ability to speak English well is usually associated with labour market outcomes for migrants (Dustmann and Fabbri 2003). Foreign-born HILDA respondents were asked how well they spoke English with four response options ranging from “very well” to “not at all”. The third and fourth options (“not well” and “not at all”) are collapsed because a negligible share of respondents (0.02 %) chose the fourth option.
3.5 Analytic framework
In analysing the HILDA data, this study aims to test the hypothesis that cultural diversity can have a positive impact on labour market outcomes by boosting regional economic growth. The labour market channel involves a dynamic interaction between employment and wages. However, in this study, the main focus is the impact of diversity on wages, taking into consideration regional variations. The effect of diversity on wages can be estimated via panel data analysis that accounts for individual and regional effects over different time periods. The first model estimated is a simple OLS model of the log of weekly wages (\( \ln \left({w}_{irt}\right) \)) for each employed respondent aged 16–45 years.
$$ \ln \left({w}_{irt}\right)={\alpha}_{1i}+{\beta}_1{\mathrm{div}}_r+{\delta}_1{X}_{irt}+{\varepsilon}_{1irt} $$
(2)
where the main variable of interest is the diversity index \( {\mathrm{div}}_r \). As suggested in the literature, further explanatory variables (\( {X}_{irt} \)) are included, such as weekly number of hours worked and job tenure as well as time indicator variables. In addition, age and its square, dummies for female, marital status, and region as well as English language skill and education indicators are included where appropriate.
Apart from the observable characteristics, there can be individual heterogeneity that can affect the relationship between diversity and labour market outcomes. Longhi (2013) shows that the positive wage effects of diversity reported in cross-sectional studies (Nathan 2011) can be explained by individual differences. A fixed effect (FE) model is therefore estimated in this study capturing the unobserved individual characteristics among HILDA respondents. All the explanatory time-variant variables included in OLS are also included in the FE models.
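How the FE estimator removes unobserved individual heterogeneity can be illustrated with the within transformation for a single regressor. This is a minimal sketch on made-up numbers, not the estimation code used in this study; the function name `within_slope` and the toy data are assumptions, and the actual models include the full set of time-variant controls.

```python
from collections import defaultdict


def within_slope(ids, x, y):
    """Slope from regressing demeaned y on demeaned x (within estimator).

    Demeaning by individual wipes out any time-constant individual effect.
    """
    groups = defaultdict(list)
    for pid, xi, yi in zip(ids, x, y):
        groups[pid].append((xi, yi))
    sxy = sxx = 0.0
    for obs in groups.values():
        mean_x = sum(o[0] for o in obs) / len(obs)
        mean_y = sum(o[1] for o in obs) / len(obs)
        for xi, yi in obs:
            sxy += (xi - mean_x) * (yi - mean_y)
            sxx += (xi - mean_x) ** 2
    return sxy / sxx


# Two hypothetical respondents with different intercepts (1 and 5)
# but the same true slope of 2 on the diversity index.
ids = ["A", "A", "A", "B", "B", "B"]
div = [0.1, 0.2, 0.3, 0.5, 0.6, 0.7]
log_wage = [1.2, 1.4, 1.6, 6.0, 6.2, 6.4]
fe_slope = within_slope(ids, div, log_wage)
print(fe_slope)
```

In this toy example the respondent with the higher intercept also lives in the more diverse region, so a pooled regression would overstate the slope, whereas the within estimator recovers the value of 2.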
3.6 Endogeneity of cultural diversity
The impact of cultural diversity on an economy is confounded due to the possibility of reverse causality, whereby Eq. (2) results in a spurious correlation (Friedberg and Hunt 1995). Our purpose is to determine the effect of diversity on wages, but a two-way causality between diversity and wages is possible. While diversity can directly affect economic performance, it is also possible that people from diverse backgrounds can self-select to live in places with economic opportunities.
The impact of diversity on economic outcome can be positive or negative. On the positive side, diversity can augment economic performance as it can stimulate creativity and problem solving. Diversity can also boost economic growth by drawing labour from a pool of immigrants. On the negative side, it can deplete trust and social capital due to ethnic/racial fragmentation. This can in turn weaken economic performance. Whether the positive effects of diversity on economic performance outweigh the negative ones, at one level, is a simple empirical question. However, when economic outcomes directly or indirectly affect diversity rather than the reverse, there arises an econometric issue.
In this study, the issue of reverse causality arises when variations in regional weekly wages resulted in the concentration of people from diverse cultural backgrounds in specific regions. For example, in Australia, there is no restriction in the mobility of immigrants within the country, and potentially, immigrants can move to places with more perceived economic opportunities (see Hugo and Harris 2011). The HILDA data, for example, shows substantial internal migration across waves among the respondents. Therefore, reverse causality cannot be ruled out from a regression of economic performance on cultural diversity. Instead of diversity causing variation in regional labour market outcomes, the economic conditions such as prospects of employment may be driving the regional distribution of diversity. This poses an econometric issue, endogeneity, in estimating the causal effect of cultural diversity on employment outcomes. The effect of the explanatory variable, diversity, as measured by the share of foreign country citizens in a region (LGA) is confounded by the possibility of migrants’ concentration in response to economic incentives. Therefore, the coefficient of diversity cannot be consistently estimated due to correlation with the error term in the wage regression where the share of migrants is endogenous. This entails the violation of the Gauss-Markov (zero conditional mean) assumption in OLS (Wooldridge 2010).
A suitable procedure to correct the endogeneity problem is to apply instrumental variable (IV) estimation (see Baltagi 2008; Wooldridge 2010). The main challenge in applying this procedure is the identification of a valid instrument. If such an instrument can be found, the confounding, for example, between diversity and economic performance can be disentangled and causal relationship between these two variables established. In this study, the shift-share method is used to instrument for the index of cultural diversity and the share of migrants. Following Bellini et al. (2013), Ottaviano and Peri (2006), Longhi (2013), and, recently, Alesina et al. (2013), two-stage least squares (2SLS) estimation is applied to OLS and FE models.
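The logic of 2SLS can be illustrated for the simplest case of one endogenous regressor and one instrument. The sketch below uses made-up data constructed so that the instrument z is uncorrelated with the disturbance; the function names and numbers are hypothetical, and the standard errors that a full 2SLS routine would correct are omitted.

```python
def ols(x, y):
    """Intercept and slope of a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, x, y):
    """Stage 1: regress x on z. Stage 2: regress y on the fitted x."""
    a0, a1 = ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    return ols(x_hat, y)


# Toy data: u is a disturbance that enters both x and y (endogeneity),
# while z is uncorrelated with u by construction. The true slope is 2.
z = [0.0, 1.0, 2.0, 3.0]
u = [1.0, -1.0, -1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]

ols_slope = ols(x, y)[1]
iv_intercept, iv_slope = two_stage_least_squares(z, x, y)
print(ols_slope, iv_slope)
```

Here the OLS slope is biased upwards (10/3) because x is correlated with the disturbance, while the two-stage estimate equals the true value of 2.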
3.7 Identification strategy
For an instrumental variable estimation to be specified for Eq. (2), two assumptions should be satisfied. First, the instrument chosen should be correlated with cultural diversity, the key explanatory variable, and second, it should not be correlated with the error term in the wage equation, that is, it should affect economic performance only through diversity. In addition, a correctly specified model should not omit relevant variables. Several instruments have been developed in the literature to solve the endogeneity issue in relation to cultural diversity. Altonji and Card (1991) use the 1970 immigrant stock in the USA while Hunt (1992) uses regional temperature and French repatriates of 1962 in a French-Algerian migration study. Ottaviano and Peri (2006) use distance from gateway cities in the USA while Longhi (2013) uses “the proportion of minorities joining the ‘New Deal Program’” in the UK. As detailed in the introductory section, mixed (positive and negative) results were obtained by these studies regarding the impact of diversity on economic outcomes.
An instrument suitable for the data used in this paper is the shift-share variable first utilised by Card (2001) in assessing the local labour market impact of immigrant flows in the USA. This instrument was later used in modelling the causal effect of cultural diversity on wages and rental prices for US cities (Ottaviano and Peri 2006) and European regions (Bellini et al. 2013). The shift-share analysis assumes that the regional migrant distribution can be used to generate an exogenous variable using two-time-period data. For example, the 2001 Australian census datasets contain the regional distribution of Australian residents by country of birth, which along with ABS annual population estimates can be used to construct a measure of diversity (see footnote 6). The latter are nationally aggregated and provide annual estimates by country of birth for the period 1992–2014. In this study, we use the period 2001–2011. These datasets offer two variables that are relevant here. One is the population of each region by country of birth, and the other is the total population of each region. From these variables, it is possible to calculate the annual population growth in Australia by country of origin. Then these annualised estimates can be used along with the baseline (2001) regional population data to estimate the predicted population for each year up to 2011. Since these predicted figures are based on the historical (year 2001) regional distribution rather than the actual regional distribution, they are not confounded by population growth that could have resulted from economically driven mobility. Therefore, they are assumed to be exogenous to regional economic shocks.
The primary purpose is to estimate the predicted version of the share of migrants in Australia. First, the national growth rate of the Australian population born in each country i between time t (which is 2001) and time t + 1 is required. Formally, this rate \( {g}_i \) is given by
$$ {g}_i=\frac{\left({p}_i^{t+1}-{p}_i^t\right)}{p_i^t} $$
(3)
where \( {p}_i^{t+1} \) and \( {p}_i^t \) represent the total number of the Australian population born in country i in the years t + 1 and t, respectively. The next step is to generate the predicted number of Australian residents born in country i and residing in region r based on Eq. (3). This is given by the formula
$$ {p}_{ir}^{*t+1}={p}_{ir}^t\left(1+{g}_i\right) $$
(4)
where * indicates that the value is predicted for the year t + 1. Summing this value (\( {p}_{ir}^{*t+1} \)) across all countries of birth provides the predicted total population for each region (LGA) in the next year.
$$ {P}_r^{*t+1}={\displaystyle \sum_i}{p}_{ir}^{*t+1} $$
(5)
where \( {P}_r^{*t+1} \) indicates the predicted total number of all residents in each region in t + 1. This value, which differs from the actual population in that year, is used to calculate the predicted diversity index (\( D{I}_{rt} \) as in Eq. (1)). Furthermore, this analysis is repeated to estimate the predicted migrant share in each region. The value is then used to calculate the predicted diversity index among migrants. Finally, the two instruments, namely the predicted diversity index and predicted share of migrants, are merged into the individual-level HILDA data based on the postal area variable (see footnote 7).
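The construction in Eqs. (3) to (5) can be sketched for a single year step as follows. This is an illustration under assumed inputs rather than the code used to build the instruments: the function names, the region label `LGA_A`, and all counts are hypothetical.

```python
def growth_rates(national_t, national_t1):
    """Eq. (3): national growth rate g_i for each country of birth i."""
    return {c: (national_t1[c] - national_t[c]) / national_t[c]
            for c in national_t}


def predicted_regional_counts(regional_t, g):
    """Eq. (4): apply national growth rates to baseline regional counts."""
    return {r: {c: n * (1.0 + g[c]) for c, n in counts.items()}
            for r, counts in regional_t.items()}


def predicted_diversity(pred_counts):
    """Eqs. (5) and (1): predicted total, then the fractionalisation index."""
    total = sum(pred_counts.values())  # predicted regional population, Eq. (5)
    return 1.0 - sum((n / total) ** 2 for n in pred_counts.values())


# Made-up national totals in t and t + 1: only the China-born group grows.
national_2001 = {"Australia": 1000, "UK": 200, "China": 100}
national_2002 = {"Australia": 1000, "UK": 200, "China": 200}
# Made-up baseline distribution for one region.
regional_2001 = {"LGA_A": {"Australia": 60, "UK": 20, "China": 20}}

g = growth_rates(national_2001, national_2002)
pred = predicted_regional_counts(regional_2001, g)
pred_total = sum(pred["LGA_A"].values())
pred_di = predicted_diversity(pred["LGA_A"])
print(pred_total, pred_di)
```

Because only national growth rates and the baseline regional distribution enter the calculation, the predicted index does not respond to residents relocating between regions after the baseline year.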
Generally, the indices of diversity and the instruments generated using the 2001 census data and population estimates are correlated, satisfying the relevance criterion. However, the correlation coefficient is larger for the diversity measure based on annual population estimates (r = 0.40) than for the measure based on a 3-year lag (r = 0.30). The exogeneity criterion is also taken to be met, as the instruments are only weakly correlated with the error term: the correlation coefficients between the residual and the two instruments (predicted diversity index and predicted share of migrants) are both r = 0.06.