The primary concern of this paper is to model the determinants of a mismatch between actual education and the education formally required for the occupation (i.e., over-education and under-education) among immigrants in the Australian labour market. Because the mismatch is observed only for employed individuals, an exclusive focus on those immigrants who have an occupation may overlook the fact that they might constitute a non-randomly selected sub-sample (see, for instance, Dolton and Vignoles [2000]). Bauer ([2002]) and Cutillo and Di Pietro ([2006]) argue that heterogeneity of ability in the population could have a significant impact on labour market outcomes and, consequently, on the extent of over- and under-education in the employed sub-sample. Given Australia’s different visa regimes, which range from highly skilled immigrants to refugees and those who entered on a family visa, the immigrant sample is likely to be quite heterogeneous in ability and home country experiences.
Only about 68.6 percent of male immigrants in the potential labour force were employed five months after immigration, and 80.5 percent one year later. Since the two possible types of mismatch (i.e., over-education and under-education) are observed only if the individual is employed, we apply a binomial probit model with sample selection in order to correct for potential sample selection bias. This approach follows Green et al. ([2007]), who use the same database and identification variables.
The occurrence of the mismatch j, which stands for either over- or under-education, may be illustrated by the following two linear latent dependent variable equations:

$$y^{*}_{1ij} = x_i'\beta + u_i \qquad (1)$$

where $y_{1ij} = 1$ if $y^{*}_{1ij} > 0$ (the individual has attained the respective mismatch) and $y_{1ij} = 0$ if not, and

$$y^{*}_{2i} = z_i'\gamma + v_i \qquad (2)$$

where $y_{2i} = 1$ if $y^{*}_{2i} > 0$ (the individual is employed) and $y_{2i} = 0$ if not. Here $x_i$ and $z_i$ denote the covariate vectors of the outcome and the selection equation, respectively.

The dichotomous variable $y_{1ij}$ is only observed if $y_{2i} = 1$. The model was first presented by Van De Ven and Van Praag ([1981]) to examine deductibles in private health insurance in the Netherlands. Variants of the model have since been used, for example, by Boyes et al. ([1989]) to analyse default on loans while taking into account whether the loan application was accepted, and by Litchfield and Reilly ([2009]) to investigate whether an individual has attempted to migrate conditional on having considered migrating.
Equation (2) is fully observed and can be estimated separately. However, separate estimation of mismatch attainment (Eqn. 1) may be subject to selection bias given the potential for correlation between the two error terms $u_i$ and $v_i$. The model can be estimated stepwise (i.e., the inverse Mills ratio from the selection equation is introduced as a covariate in the outcome probit equation) or by maximum likelihood. Relative to the maximum likelihood approach, the two-step method is often perceived to give inconsistent results, in particular when there is strong multicollinearity between covariates in the outcome and the selection equations (e.g., when using a joint set of covariates; see Lahiri and Song [2000]).
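To make the stepwise variant concrete, the following minimal sketch computes the inverse Mills ratio that would enter the outcome equation as an additional covariate. The function names and the example index values are hypothetical and not taken from the paper; only the Python standard library is used.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(selection_index):
    """Inverse Mills ratio phi(s) / Phi(s) for a selected (employed) observation,
    where s is the fitted index of the selection probit."""
    return norm_pdf(selection_index) / norm_cdf(selection_index)

# Hypothetical fitted selection indices for three employed immigrants
# (illustrative numbers, not estimates from the paper).
selection_index = [-1.0, 0.0, 1.5]
imr = [inverse_mills(s) for s in selection_index]
```

The ratio falls as the fitted employment index rises, so the correction term is largest for individuals who were unlikely to be employed given their observed characteristics.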
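For each type of mismatch, the log-likelihood function to be evaluated is:

$$\ln L = \sum_{y_{2i}=1,\; y_{1ij}=1} \ln \Phi_2\left(x_i'\beta,\; z_i'\gamma;\; \rho\right) + \sum_{y_{2i}=1,\; y_{1ij}=0} \ln \Phi_2\left(-x_i'\beta,\; z_i'\gamma;\; -\rho\right) + \sum_{y_{2i}=0} \ln\left[1 - \Phi\left(z_i'\gamma\right)\right] \qquad (3)$$

where $x_i$ and $z_i$ are the covariate vectors of Eqns. (1) and (2), respectively; ρ denotes the correlation coefficient between the error terms $u_i$ and $v_i$; $\Phi_2$ denotes the bivariate standard normal cumulative distribution function; and $\Phi$ the univariate standard normal cumulative distribution function. The parameters of Eqns. (1) and (2) are estimated jointly by maximizing the log-likelihood function (3) with respect to the coefficient vectors β and γ and the correlation coefficient ρ. The estimate of ρ provides a test for selectivity bias. If ρ is significantly different from zero, the coefficients of Eqn. (1) would have been biased if estimated separately by binomial probit.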
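The log-likelihood of a probit model with sample selection can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: the bivariate normal CDF is approximated by Simpson's rule rather than a library routine, and the data, the coefficient values and all function names are hypothetical. An optimiser would maximise `log_likelihood` over the coefficient vectors and the correlation.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, steps=400):
    """P(U <= a, V <= b) for a standard bivariate normal with correlation rho
    (|rho| < 1), computed as the integral over u in [-8, a] of
    phi(u) * Phi((b - rho * u) / sqrt(1 - rho^2)) by Simpson's rule.
    `steps` must be even."""
    lower = -8.0  # mass below -8 is negligible
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(u):
        return norm_pdf(u) * norm_cdf((b - rho * u) / scale)

    h = (a - lower) / steps
    total = integrand(lower) + integrand(a)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(lower + k * h)
    return total * h / 3.0

def log_likelihood(beta, gamma, rho, data):
    """Log-likelihood of a probit with sample selection.
    data: iterable of (x, z, employed, mismatch); mismatch is ignored
    when employed == 0 because the outcome is then unobserved."""
    ll = 0.0
    for x, z, employed, mismatch in data:
        xb = sum(b * v for b, v in zip(beta, x))   # outcome index
        zg = sum(g * v for g, v in zip(gamma, z))  # selection index
        if employed == 0:
            ll += math.log(norm_cdf(-zg))             # not employed
        elif mismatch == 1:
            ll += math.log(bvn_cdf(xb, zg, rho))      # employed, mismatched
        else:
            ll += math.log(bvn_cdf(-xb, zg, -rho))    # employed, not mismatched
    return ll

# Hypothetical toy data: x = (constant, covariate), z = x plus one instrument.
data = [
    ((1.0, 0.5), (1.0, 0.5, 1.0), 1, 1),
    ((1.0, -0.2), (1.0, -0.2, 0.0), 1, 0),
    ((1.0, 1.0), (1.0, 1.0, 0.0), 0, None),
]
beta = (0.2, 0.5)
gamma = (0.1, 0.3, 0.8)
ll = log_likelihood(beta, gamma, 0.4, data)
```

When the correlation is set to zero the bivariate terms factorise into products of univariate probit probabilities, which is why a separate probit on the employed sub-sample is adequate only in that case.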
The identification of such selectivity models is of crucial importance. Identification is achieved by the inclusion of variables in Eqn. (2) that are excluded from Eqn. (1). Poor identification restrictions can lead to erroneous conclusions regarding the presence of selectivity effects. In the context of our application it would be of some interest to establish whether, having controlled for a set of observable characteristics, the employed respondents possessed unobservable characteristics (e.g., motivation, cognitive abilities, etc.) that were in some way different from those of the whole sample. A statistically significant ρ value may provide an insight into this particular issue. However, confidence in the reliability of such a result depends crucially on appropriate identification. There is a set of variables that appear in the covariate vector of the selection equation (Eqn. 2) but not in that of the outcome equation (Eqn. 1), as well as a set that is common to both vectors. In addition, there are variables that appear in the covariate vector of the outcome equation but not in that of the selection equation, though these are not crucial for identification.
Following the empirical study of Green et al. ([2007]), the covariates chosen to identify the model (i.e., variables that appear in the selection equation but not in the outcome equation) are: car ownership, the household structure, a control for whether the immigrant visited Australia prior to immigration, a variable indicating whether the immigrant had own funds at the time of arrival, and English proficiency. There are both theoretical and empirical reasons for these identifying restrictions. As shown in previous studies, owning a motor vehicle might increase the area in which the individual can take up a job and, thus, the employment opportunities (see Raphael and Stoll, [2000]; Green et al., [2007]). Theoretically, however, there is no relation between car ownership and labour market experience and/or abilities. Education-occupation mismatch should, therefore, not depend on car ownership.
The family structure may affect the probability of employment as well. For instance, the presence of other adults in the household might ease the pressure to take up employment. On the other hand, immigrant men with dependent children (i.e., at or below school age) might be under greater pressure to take up employment (see Lundberg and Rose, [2002]). Moreover, Green et al. ([2007]) argue that immigrants who have visited Australia prior to settlement are likely to have better knowledge of the Australian labour market or to have already established contacts with Australian employers. Hence, we control for the effects of the number of adults, the number of dependent children in the household, and knowledge of the Australian labour market through prior visits on the probability of being in employment. None of these variables should be correlated with labour market experience and/or abilities and, hence, with education-occupation mismatch.
Two further identifying instruments used by Green et al. ([2007]) are a control for having funds at the time of arrival and English language proficiency. They suggest that immigrants who face liquidity constraints might be under greater pressure to take up employment, and that proficiency in the host country language may have a positive effect on the probability of employment. Nevertheless, both savings and language proficiency could be correlated with innate abilities and, thus, with education-occupation mismatch. Therefore, we test the validity of these instruments as suggested by Murray ([2006]): the selection instruments are introduced as covariates in both the selection and the outcome equations (i.e., in both Eqn. 2 and Eqn. 1); a failure to reject the null hypothesis that these additional instruments all have zero coefficients in the outcome equation (Eqn. 1) would support their validity as instruments.
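The validity check described above can be written compactly. The notation below is introduced for illustration only: $w_i$ collects the instruments under test, $\delta$ is their coefficient vector in the augmented outcome equation, and $q$ is the number of instruments tested. The text does not state which test statistic is used; a likelihood-ratio form is shown here as one possibility, and a Wald test of the same restriction would be an alternative.

```latex
% Augmented outcome equation: instruments w_i added to Eqn. (1)
y^{*}_{1ij} = x_i'\beta + w_i'\delta + u_i

% Null hypothesis: the instruments have no direct effect on the mismatch
H_0 : \delta = 0

% One possible statistic (assumption): likelihood ratio between the
% augmented (unrestricted) and the original (restricted) selection model
LR = 2\left(\ln L_{U} - \ln L_{R}\right) \;\overset{a}{\sim}\; \chi^{2}_{q}
\quad \text{under } H_0
```

A failure to reject $H_0$ is consistent with the exclusion restriction, that is, with the instruments affecting mismatch only through employment.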
The test results show that in both cases the null hypothesis cannot be rejected and, therefore, the instruments can be regarded as valid. However, contrary to Green et al. ([2007]), we find that the dummy controlling for having funds at the time of immigration is not significant in the selection equation and is, hence, a weak instrument. For simplicity, and given that we have five valid instruments, we do not include the control for having funds at arrival in the estimation.
Our primary covariates of interest are a set of dummy variables that are included only in the outcome equation and that control for the type of mismatch between educational level and occupational attainment in the last job held in the former home country in the 12 months prior to immigration (i.e., over-educated, correct match, under-educated). Not having worked during the 12 months prior to immigration is the reference group for the dummy set. Moreover, immigrants enter Australia with formal experience gained in a large variety of labour markets. In order to capture differences in the “quality” of previous labour market experience, we also include controls for the former home country in a second specification of our empirical model.
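The dummy set for the pre-migration match can be illustrated with a short sketch. The category labels and the function name are hypothetical; only the coding scheme (three dummies, with not having worked in the last 12 months as the omitted reference group) follows the description above.

```python
# Hypothetical labels for the pre-migration education-occupation match.
CATEGORIES = ("over_educated", "correct_match", "under_educated")
REFERENCE = "not_worked"  # omitted reference group

def prior_mismatch_dummies(status):
    """Return the three dummy variables for one immigrant.
    All dummies are zero for the reference group."""
    if status != REFERENCE and status not in CATEGORIES:
        raise ValueError(f"unknown pre-migration status: {status!r}")
    return {f"prior_{c}": int(status == c) for c in CATEGORIES}

# Illustrative records, not data from the paper.
rows = [prior_mismatch_dummies(s)
        for s in ("over_educated", "not_worked", "correct_match")]
```

With this coding, each estimated coefficient is read relative to immigrants who did not work in the 12 months before arrival.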