### Measurement of ethnic network

To explore the effects of ethnic networks, previous empirical studies have adopted either ethnic concentration (e.g. Aguilera [2009]; Damm [2009]; Edin et al. [2003]; Toussaint-Comeau [2008]; Borjas [1995]; Andersson and Hammarstedt [2011]) or linguistic concentration (e.g. Bertrand et al. [2000]) as the proxy for an ethnic network. Unlike these studies, we adopt a "spatial approach" that accounts for both ethnic networks and ethnic concentration, in order to capture the effects of social and resource networks for immigrant groups. We add a weighted ethnic spatial lag variable and compare the resulting model with the conventional model.

#### Weighted ethnic spatial lag

We construct an ethnic spatial network variable, the "weighted ethnic spatial lag", as the proxy for immigrants' ethnic network; it represents the individual's network of economic resources, in addition to ethnic concentration. By doing so, we are able to separate the network-specific resource effect from the more general ethnic concentration effect. We hypothesize that both ethnic networks and ethnic concentration influence immigrants' self-employment decisions.

*W* is an *n* × *n* ethnic spatial weighted matrix, which shows the first-order ethnic and geographical (ethnic-spatial) relationship among individuals. Before discussing *W*, we introduce the first-order ethnic spatial neighbourhood matrix *E* by means of an example. Suppose individuals P1, P2, P4 and P6 are all from Asia; P1 and P4 are both located in region A, while P2 and P6 are located in region B. P3, P5 and P7 are from Europe, and all of them are located in region B. Thus, the 7 × 7 first-order ethnic-spatial neighbourhood matrix *E* is:

E=\left(\begin{array}{cccccccc} & P1 & P2 & P3 & P4 & P5 & P6 & P7 \\ P1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ P2 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ P3 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\ P4 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ P5 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ P6 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ P7 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \end{array}\right)

(1)

A zero element of matrix *E* indicates that the two individuals concerned are not deemed to be first-order ethnic-spatial neighbours. In addition, the diagonal elements of the above matrix are zeros, which means that individuals are not considered neighbours of themselves.
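
As an illustration, the construction of *E* for the seven-person example can be sketched in Python. The record layout and the function name `neighbourhood_matrix` are assumptions made for this sketch only.

```python
# Hypothetical records for the seven individuals in the example:
# (label, ethnic group, region of residence).
people = [
    ("P1", "Asia", "A"),
    ("P2", "Asia", "B"),
    ("P3", "Europe", "B"),
    ("P4", "Asia", "A"),
    ("P5", "Europe", "B"),
    ("P6", "Asia", "B"),
    ("P7", "Europe", "B"),
]


def neighbourhood_matrix(records):
    """Return E, where E[i][j] = 1 if i and j share ethnic group and region."""
    n = len(records)
    E = [[0] * n for _ in range(n)]
    for i, (_, group_i, region_i) in enumerate(records):
        for j, (_, group_j, region_j) in enumerate(records):
            # The i != j check keeps the diagonal at zero.
            if i != j and group_i == group_j and region_i == region_j:
                E[i][j] = 1
    return E


E = neighbourhood_matrix(people)
for (label, _, _), row in zip(people, E):
    print(label, row)
```

The double loop is quadratic in the number of individuals, which is acceptable for a small example; with survey-sized micro data one would more plausibly group records by key first.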

To define the "weighted ethnic spatial lag", the ethnic spatial matrix *E* is row-normalised so that each row sums to one, which gives the ethnic spatial weighted matrix *W*:

W=\left(\begin{array}{cccccccc} & P1 & P2 & P3 & P4 & P5 & P6 & P7 \\ P1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ P2 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ P3 & 0 & 0 & 0 & 0 & 1/2 & 0 & 1/2 \\ P4 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ P5 & 0 & 0 & 1/2 & 0 & 0 & 0 & 1/2 \\ P6 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ P7 & 0 & 0 & 1/2 & 0 & 1/2 & 0 & 0 \end{array}\right)

(2)
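
The row normalisation that takes *E* to *W* can be sketched as follows. The function name `row_normalise` is hypothetical, and leaving the row of an individual with no ethnic neighbours at zero is an assumption of the sketch, since that case does not arise in the example.

```python
def row_normalise(E):
    """Divide each row of E by its row sum so that non-empty rows sum to one."""
    W = []
    for row in E:
        total = sum(row)
        # Assumption: an individual with no ethnic neighbours keeps a zero row.
        W.append([value / total if total else 0.0 for value in row])
    return W


# E from the seven-person example, rows and columns ordered P1..P7.
E = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],
]

W = row_normalise(E)
for row in W:
    print(row)
```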

In our study, the "ethnic-spatial weighted matrix" is considered to be dynamic. Our matrix is constructed with micro data, based on three conditions^{1}: 1) ethnic group, 2) region of residence, and 3) year of survey. Therefore, if immigrants were to shift location, the demographic composition of regions would alter (for example, the number of Chinese immigrants in a specific region would change). Our ethnic spatially weighted matrix, *W*, captures this dynamic aspect. In addition, since we derive *W* by row-normalising the ethnic-spatial neighbourhood matrix *E*, the elements of *W* always lie between 0 and 1. The row sum of *E* gives, for a typical immigrant, the number of other immigrants from the same ethnic group who live in the same place, and each non-zero element in the corresponding row of *W* is the reciprocal of that number. Therefore, when the number of immigrants from a given ethnic group in a specific region changes, the corresponding elements of *W* change as well.

We are interested in better understanding migrants' network effects through the data. The spatial model provides a new and more relevant theoretical and empirical framework to investigate the effect of ethnic capital.

#### Ethnic spatial model

The data generating process for the situation when the value of one observation *i* depends on the value of its neighbour *j*'s observation (e.g. LeSage and Pace [2009]) is as below:

{y}_{i}={\alpha}_{i}{y}_{j}+\mathit{\beta}{X}_{i}+{\epsilon}_{i}

(3)

{y}_{j}={\alpha}_{j}{y}_{i}+\mathit{\beta}{X}_{j}+{\epsilon}_{j}

(4)

{\epsilon}_{i}\sim N\left(0,{\sigma}^{2}\right),\quad {\epsilon}_{j}\sim N\left(0,{\sigma}^{2}\right)

Thus, equations (3) and (4) imply a "simultaneous data generating process" in which *y*_{i} depends on *y*_{j} and vice versa. This analytical feature leads us to a data generating process which is an "Ethnic Spatial Autoregressive Process", with the following expression:

{y}_{i}=\rho {\displaystyle \sum _{j=1}^{n}}{W}_{\mathit{ij}}{y}_{j}+\beta {X}_{i}+{\epsilon}_{i}

(5)

{\epsilon}_{i}\sim N\left(0,{\sigma}^{2}\right),\quad i=1,\dots ,n

where *X*_{i} is a vector of socio-economic variables for individual *i*. In our analysis, *y*_{i} and *y*_{j} represent the self-employment choices of individuals *i* and *j*. "Ethnic neighbours" are defined as individuals who are from the same ethnic group and in the same location. Thus, {\displaystyle \sum _{j=1}^{n}}{W}_{\mathit{ij}}{y}_{j} is the "weighted ethnic spatial lag" in this context, and it represents a linear combination of the self-employment choices of individual *i*'s ethnic neighbours.
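
With a row-normalised *W*, the weighted ethnic spatial lag of individual *i* is simply the average self-employment choice of *i*'s ethnic neighbours. The sketch below computes it directly from grouping keys for the seven-person example; the outcome vector `y` and the function name `spatial_lag` are hypothetical and chosen only for illustration.

```python
def spatial_lag(keys, y):
    """Return Wy: for each i, the mean of y over i's ethnic neighbours."""
    lag = []
    for i, key_i in enumerate(keys):
        neighbours = [
            y[j] for j, key_j in enumerate(keys) if j != i and key_j == key_i
        ]
        # Assumption: the lag is zero when an individual has no neighbours.
        lag.append(sum(neighbours) / len(neighbours) if neighbours else 0.0)
    return lag


# (ethnic group, region) for P1..P7, as in the example.
keys = [
    ("Asia", "A"), ("Asia", "B"), ("Europe", "B"), ("Asia", "A"),
    ("Europe", "B"), ("Asia", "B"), ("Europe", "B"),
]
# Hypothetical self-employment indicators (1 = self-employed).
y = [1, 0, 1, 0, 0, 1, 1]

Wy = spatial_lag(keys, y)
print(Wy)
```

For P3, whose ethnic neighbours are P5 and P7, the lag is the average of their two choices, here 0.5.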

As a result, the matrix version of equation (5) is:

y=\mathit{\rho Wy}+\mathit{\beta X}+\epsilon

(6)

\epsilon \sim N\left(0,{\sigma}^{2}{I}_{n}\right)

where *N*(0, *σ*^{2}*I*_{n}) represents a zero-mean disturbance process with constant variance *σ*^{2}, and *I*_{n} is the *n*-dimensional identity matrix.

Under the ethnic capital hypothesis, individuals' incomes depend on ethnic capital and other socio-economic variables. In this setting, one can define individuals who are from the same ethnic group and location as first-order "ethnic neighbours". Thus, "weighted ethnic spatial lag" represents the case where an individual's labour market performance is influenced by its ethnic neighbours' labour market performance and other ethnic-capital factors in that location. Therefore, the matrix version of our model is:

y=\alpha {l}_{n}+\mathit{\rho Wy}+\mathit{\tau EC}+\mathit{\beta X}+\epsilon

(7)

where *y* is immigrants' economic performance (e.g. self-employment outcome); *X* is a vector of socio-economic variables with coefficient *β*; *Wy* is the weighted ethnic spatial lag vector, which reflects the first-order ethnic-spatial relationship among individuals; *EC* is ethnic concentration with coefficient *τ*; and *l*_{n} is a vector of ones associated with the intercept *α*. Thus, the coefficient *ρ* indicates the size of the effect of the network in a specific region. Furthermore, *W* also allows us to construct a full social networking variable for each individual, which captures all the information of a network.

Ethnic concentration (EC) has been defined in various forms across studies. For example, Borjas ([1986]) argued that a Hispanic enclave definitely helped Hispanic immigrant entrepreneurs in the United States, owing to the cultural and language similarities among three Hispanic groups (Mexicans, Cubans, and other Hispanics). He defined the ethnic concentration variable as the proportion of the MSA's^{2} population in the United States that is Hispanic.

In this paper, we adopt an approach similar to that of Borjas ([1986]), with

E{C}_{kl}=\frac{{\text{Population}}_{kl}}{{\text{Population}}_{l}}

(8)

where "*k*" denotes ethnic group, and "*l*" represents a specific state or region^{3}.
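
Equation (8) amounts to a simple ratio of counts. A minimal sketch, using made-up population figures and placeholder group and region names (none of which come from our data), is:

```python
def ethnic_concentration(population_kl, population_l):
    """Return EC_kl = Population_kl / Population_l for every (k, l) pair."""
    return {
        (k, l): count / population_l[l]
        for (k, l), count in population_kl.items()
    }


# Hypothetical counts: members of ethnic group k living in region l.
population_kl = {
    ("group_1", "region_A"): 50,
    ("group_2", "region_A"): 30,
    ("group_1", "region_B"): 20,
}
# Hypothetical total population of each region l.
population_l = {"region_A": 200, "region_B": 400}

EC = ethnic_concentration(population_kl, population_l)
for key, share in sorted(EC.items()):
    print(key, share)
```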

Under the ethnic enclave hypothesis, we would expect the coefficient of the immigrant population size in a specific region to be positive in most cases^{4}.

Rearranging equation (7), we derive:

\begin{array}{l}\left({I}_{n}-\mathit{\rho W}\right)y=\alpha {l}_{n}+\mathit{\tau EC}+\mathit{\beta X}+\epsilon \\ y={\left({I}_{n}-\mathit{\rho W}\right)}^{-1}\alpha {l}_{n}+{\left({I}_{n}-\mathit{\rho W}\right)}^{-1}\mathit{\tau EC}+{\left({I}_{n}-\mathit{\rho W}\right)}^{-1}\mathit{\beta X}+{\left({I}_{n}-\mathit{\rho W}\right)}^{-1}\epsilon \\ \epsilon \sim N\left(0,{\sigma}^{2}{I}_{n}\right)\end{array}

(9)
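
The reduced form in equation (9) can be checked numerically. For a row-normalised *W* and |*ρ*| < 1, the inverse (*I*_{n} − *ρW*)^{−1} equals the series *I*_{n} + *ρW* + *ρ*^{2}*W*^{2} + …, so the sketch below sums that series instead of inverting a matrix. All parameter values, the regressor `x`, the `EC` values and the function names are hypothetical, and the disturbance is set to zero to keep the example deterministic.

```python
def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def reduced_form(W, b, rho, terms=200):
    """Approximate (I - rho W)^{-1} b by the series b + rho W b + rho^2 W^2 b + ..."""
    y = list(b)
    term = list(b)
    for _ in range(terms):
        term = [rho * t for t in matvec(W, term)]
        y = [yi + ti for yi, ti in zip(y, term)]
    return y


h = 0.5
# W from the seven-person example, rows and columns ordered P1..P7.
W = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, h, 0, h],
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, h, 0, 0, 0, h],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, h, 0, h, 0, 0],
]

# Hypothetical parameter values and data (not estimates from the paper).
alpha, rho, tau, beta = 0.5, 0.4, 2.0, 1.5
EC = [0.10, 0.20, 0.30, 0.10, 0.30, 0.20, 0.30]
x = [1.0, 4.0, 2.0, 3.0, 7.0, 0.5, 5.0]
eps = [0.0] * 7

# Right-hand side alpha*l_n + tau*EC + beta*X + eps, element by element.
b = [alpha + tau * ec + beta * xi + e for ec, xi, e in zip(EC, x, eps)]
y = reduced_form(W, b, rho)

# y should satisfy the structural form y = rho*W*y + b.
residual = max(abs(yi - (rho * lag + bi)) for yi, lag, bi in zip(y, matvec(W, y), b))
print(residual)
```

The series form also makes the interpretation of the reduced form visible: each individual's outcome reflects not only his or her own characteristics but also those of neighbours, neighbours of neighbours, and so on, with geometrically declining weights.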

In comparison, the previous basic econometric model for immigrants' labour market performance in matrix version is:

y=\alpha {l}_{n}+\mathit{\gamma X}+\epsilon

(10)

It is noteworthy that the coefficient *β* for the explanatory variables in equations (7) and (9), where we have included the "weighted ethnic spatial lag" in our model, is different from the coefficient *γ* in the conventional model (equation 10). Therefore, when we take the network effect into account, all estimated coefficients must be adjusted for the spatial dependence. As a result, the spatial model provides a better estimation of the effects of immigrants' personal characteristics and other socio-economic factors when the network effect is present, compared to the conventional model.

### Spatially autoregressive discrete choice model

Following from our hypotheses on ethnic capital, we investigate how the ethnic network influences immigrants' self-employment decisions. The logit model is widely employed in testing such discrete choices, as it applies the random utility assumption to the self-employment choice (to be self-employed or not). In this study, we adopt settings for a binary weighted spatial lag model similar to those of Adjemian et al. ([2010]).

Immigrant *i* chooses a form of employment (either to be self-employed (*S.E.*) or employed in the wage/salary (*W.S.*) sector) which will maximise his/her utility. For self-employment choice (*S.E.*) the utility for a recent male immigrant is given by:

{U}_{i}^{S.E.}={V}_{i}^{S.E.}+{\epsilon}_{i}^{S.E\text{.}}

(11)

where {V}_{i}^{S.E.} is the deterministic portion of utility and {\epsilon}_{i}^{S.E.} is a random component. The deterministic utility is composed of a set of explanatory variables and a weighted ethnic spatial lag (which represents the social network effect):

{V}_{i}^{S.E.}={\beta}^{\prime}{x}_{i}+{\rho}^{\prime}\mathit{Wf}\left({V}_{i}^{S.E\text{.}}\right)

(12)

where *x*_{i} is a set of socio-economic variables of immigrant *i*, such as educational attainment, years since migration, age, and other demographic variables and local characteristics; and *W* is the spatial weight matrix, which indicates the first-order ethnic neighbourhood for every immigrant. As a result, the coefficient on the *W* term, *ρ*′, indicates the correlation of utility from choosing self-employment among all immigrants who are members of a particular ethnic network.

The immigrant makes a decision regarding which sector to be employed in: the self-employment (*S.E.*) sector or wage/salary (*W.S.*) sector. As a result, the decision rule for immigrant *i* is expressed as:

\begin{array}{l}Pr\left[y=S.E.\right]=Pr\left[{U}_{i}^{S.E.}>{U}_{i}^{W.S.}\right]\\ =Pr\left[{U}_{i}^{W.S.}-{U}_{i}^{S.E.}<0\right]\\ =Pr\left[{V}_{i}^{W.S.}+{\epsilon}_{i}^{W.S.}-{V}_{i}^{S.E.}-{\epsilon}_{i}^{S.E.}<0\right]\\ ={\displaystyle \int }I\left({\epsilon}_{i}^{W.S.}-{\epsilon}_{i}^{S.E.}<{V}_{i}^{S.E.}-{V}_{i}^{W.S.}\right)f\left({\epsilon}_{i}^{W.S.}\right)d{\epsilon}_{i}^{W.S.}\end{array}

(13)

where the indicator function *I* takes the value of one if the expression in parentheses is true and zero otherwise. In addition, the random errors are assumed to be independent, and *ε* is identically Bernoulli distributed for all immigrants (see Adjemian et al. [2010]). Therefore, the probability that immigrant *i* chooses self-employment (*S.E.*) is given by the logistic probability:

{P}_{i}^{S.E.}=\frac{1}{1+\exp \left(-{V}_{i}^{S.E.}\right)}

(14)
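
A minimal numerical sketch of the logistic choice probability follows. It assumes that the error difference in equation (13) follows a standard logistic distribution and that the deterministic wage/salary utility is normalised to zero by default; under that sign convention the probability of self-employment rises with the deterministic utility of self-employment. The function name and the normalisation are assumptions of the sketch.

```python
import math


def prob_self_employed(v_se, v_ws=0.0):
    """Logistic probability of choosing self-employment.

    Assumes the error difference is standard logistic, so the probability
    depends only on the utility difference v_se - v_ws.
    """
    return 1.0 / (1.0 + math.exp(-(v_se - v_ws)))


# Hypothetical deterministic utilities of self-employment.
for v in (-2.0, 0.0, 2.0):
    print(v, prob_self_employed(v))
```

When the two deterministic utilities are equal, the immigrant is indifferent and the probability is one half.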

In previous studies of spatial discrete choice models, the network effect is treated as a signal or a kind of knowledge (see Goetzke [2008]; Adjemian et al. [2010]), which means that the spatial spill-over can be unidirectional but not multidirectional (e.g. Adjemian et al. [2010]). Following Anselin ([2002]), Goetzke ([2008]) made the following assumption in modelling transport mode choice: "The model is conditional upon the observed neighbouring mode choices, which means that the spatial spill-over process is not modelled as an endogenous process. The advantage is that the estimation of this model type is straightforward to estimate". For example, in this setting, once individual *a* has made a choice, individual *b* learns this information and uses it as one of the factors in making his/her own decision. However, individual *b*'s decision cannot feed back to influence individual *a*'s decision in the same round. Adjemian et al. ([2010]) also treated the weighted spatial lag variable as exogenous, owing to the nature of car purchases and to transaction costs, which constrain a household's car purchases to be fixed in the short term. They note that, "As a result, spatial spill-overs in auto choices are necessarily unidirectional". One could apply a similar logic to immigrant self-employment, since moving across localities and engaging in self-employment take time and involve significant transaction costs.

However, the potential endogeneity of the weighted ethnic spatial lag variable cannot be ruled out, either in general (e.g. Goetzke and Andrade [2009]) or for immigrant self-employment in particular, where it may arise from multilateral (as opposed to unilateral) network effects or from unobservable variables that correlate with the spatial lag variable. Therefore, in this paper, we relax this assumption, such that the spatial spill-over may be multidirectional rather than unidirectional^{5}. We adopt a modelling approach that controls for the potential endogeneity of the weighted ethnic spatial lag term (Anselin [1990]; Kelejian and Prucha [1998]; Kelejian and Robinson [1993]), based on Spatial Two-Stage Least Squares. Hence, the weighted ethnic spatial lag term (the ethnic network in this study) is treated as an endogenous variable^{6}.

Anselin ([1990]), Kelejian and Prucha ([1998]), and Kelejian and Robinson ([1993]) proposed a Spatial Two-Stage Least Squares (2SLS) approach to control for the endogeneity of the spatial lag. As Kelejian and Robinson ([1993]) have illustrated, all exogenous variables *X*, their spatial lag *WX* and higher spatial lags (e.g. *W*^{2}*X*, *W*^{3}*X*, …, *W*^{n}*X*) work jointly as a set of instruments for the endogenous spatial lag *Wy*. In this study, as noted by Anselin ([1999]), owing to the computational complexity of using the full set of instruments for the endogenous spatial lag, we have selected *X* together with its first-order spatial lag *WX* as the set of instruments. We find that this specification results in significantly better performance than the model that assumes the exogeneity of the weighted ethnic spatial lag variable, *Wy*.
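
The instrument construction can be illustrated with a small linear sketch. It treats the outcome as continuous, uses a single exogenous regressor, and instruments *Wy* with a constant, *x* and *Wx*; it is not the binary-outcome specification estimated in this paper. The data, the parameter values and all function names are hypothetical, and a numerical library would normally replace the hand-written solver.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(A, y):
    """Least-squares coefficients from the normal equations (A'A) b = A'y."""
    At = transpose(A)
    AtA = [matvec(At, row) for row in At]  # symmetric, so row order is immaterial
    return solve(AtA, matvec(At, y))


def spatial_2sls(W, x, y):
    """Return [alpha, rho, beta] for y = alpha + rho*Wy + beta*x + error."""
    Wy, Wx = matvec(W, y), matvec(W, x)
    Z = [[1.0, xi, wxi] for xi, wxi in zip(x, Wx)]       # instruments
    gamma = ols(Z, Wy)                                   # first stage
    Wy_hat = [sum(g * z for g, z in zip(gamma, row)) for row in Z]
    R = [[1.0, wyh, xi] for wyh, xi in zip(Wy_hat, x)]   # second-stage regressors
    return ols(R, y)


h = 0.5
W = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, h, 0, h],
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, h, 0, 0, 0, h],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, h, 0, h, 0, 0],
]
x = [1.0, 4.0, 2.0, 3.0, 7.0, 0.5, 5.0]

# Noise-free toy data generated from hypothetical "true" parameters.
alpha, rho, beta = 0.5, 0.4, 1.5
A = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(7)] for i in range(7)]
y = solve(A, [alpha + beta * xi for xi in x])

alpha_hat, rho_hat, beta_hat = spatial_2sls(W, x, y)
print(alpha_hat, rho_hat, beta_hat)
```

Because the toy data contain no disturbance, the estimator should return the generating parameters up to rounding error; with real data the instruments matter precisely because *Wy* is correlated with the error term.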

#### Quality and strength of ethnic network

The effect of the quality (strength) of networks on immigrants' self-employment decisions has been less examined.

Earlier advances in conventional models such as Bertrand et al. ([2000]), and Edin et al. ([2003]) measured ethnic capital as the interaction between ethnic group, neighbourhood and year. Among the studies on self-employment of immigrants, Sousa ([2013]) measured the quality of the local community based on human capital; and Toussaint-Comeau ([2008]) measured the quality of the network by "the average relative self-employment rate of the group in the U.S.".

As noted earlier, in this paper we propose a spatial approach that captures the quality and strength of ethnic group self-employment choices. The "weighted ethnic spatial lag" (ethnic network variable) does not simply reflect the interaction of ethnic group, neighbourhood and year, or average resources. Instead, in this approach the model includes the influence of the self-employment decisions typically made by other members of the ethnic network in a locality and during a specific year. We believe that this approach more closely reflects the correlation of individuals' decisions and also its effect on the decisions of the other group members. In addition, in the approach adopted we do not view the data as *i.i.d*. This is fundamentally different from usual linear models. As we show in our empirical results (Section 5), spatial models are preferred statistically in every case.

### Review of the literature (other variables of interest)

Economists and sociologists have identified six key determinants of immigrant entrepreneurship (refer to Le [2000]; Evans [1989]; Kidd [1993]): educational attainment, labour market experience, economic requirements, marital status, industry and occupation factors, and the host country's language and ethnicity factors.

In the literature on self-employment, educational attainment is noted to have a significant influence resulting from two opposing forces (e.g. Le [2000]). On the one hand, educational attainment reflects the ability of the individual, in particular, his or her managerial ability, to operate a business. On the other hand, individuals with higher educational attainment are less likely to be self-employed since education enhances the propensity for a person to find employment in the waged sector. Therefore, the dominant impact is an empirical question.

Experience is argued to be either a "stock" (Evans [1989]) or a "flow" (Kidd [1993]). Its effect works in two directions. First, labour market experience can be viewed as the accumulation of skills and market information: with greater experience, an individual will be more confident about operating a business. Second, age increases together with labour market experience. As age increases, personal learning capacity and the present value of future returns diminish, so increasing age also decreases the propensity for self-employment.

Previous studies (e.g. Bernhardt [1994]; Kidd [1993]) have paid attention to the importance of economic requirements for the entrepreneurship decision. For example, Kidd ([1993]) used age as a proxy for financial capital and adopted a binary variable, "rent", to study immigrants' propensity for self-employment. Kidd concluded that those who own their residence are more likely to select self-employment than those who rent a house.

Marital status is an indicator of stability, and it therefore has implications for the choice of a risky self-employed status. Borjas ([1986]) noted that married individuals are more likely to choose self-employment because married couples may wish to "put up" or pool financial resources to run their business. In addition, family support reduces the unwillingness to take risks that an individual might otherwise face. As a result, marriage makes self-employment more feasible for an individual.

Since the first wave of the survey was conducted six months after new immigrants settled in New Zealand, variables such as proficiency in English, children, marriage, skill level, overseas self-employment experience and own dwelling are treated as exogenous in our model, as by design we incorporate only the first wave's data for those variables.

It is also hypothesized that self-employment is partially affected by occupational status. According to the Middleman Minority Theory, the employment status of an individual is decided by the work undertaken (Bonacich and Modell [1980]). Current employment provides work experience and training for potential entrepreneurs before they set up their own business. This is also a complementary explanation for the impact of experience, as more information about the market, business networks and business skills will be acquired during that period. Evans ([1989]) observed that individuals with a high occupational status are more likely to choose self-employment. More specifically, Le ([2000]) claimed that trade, sales, and managerial occupations require more relevant knowledge, and they also make self-employment more feasible.

In general, the effect of skill in the host country's language is significant and unambiguous: proficiency in the host country's language (e.g. English) reflects the immigrant's integration into the general community. However, the effect of English-language skills in relation to self-employment is ambiguous, and it may vary by country, data, and cohort. On the one hand, a lack of skill in the host country's language will hinder business communication with the native mainstream economy (e.g. Le [2000]). On the other hand, a lack of English proficiency can increase the propensity for self-employment by satisfying the demand from other immigrants from the same ethnic group (e.g. Evans [1989]). In addition, a third point of view is based on Disadvantage Theory (Light [1979]): communication disadvantages make it difficult for immigrants to be employed in the wage sector; however, the same disadvantages encourage them to be self-employed.

Previous New Zealand studies (e.g. Poot [1998]; Maré and Stillman [2009]) have analysed the effect of human capital and personal characteristics on immigrants' labour market performance. However, the effect of ethnic capital (e.g. ethnic network and ethnic concentration) on immigrants' economic performance (especially self-employment outcome) in this context remains unknown.

In this paper we account for these factors in addition to our new network variable of interest.