The data analyzed come from the SIP-ENTRY database. SIP-ENTRY contains longitudinal data on the entire Swedish population born between 1973 and 1995, as well as their parents and siblings born outside the main sampling window, followed until 2012. Information on attained primary school grades and test scores is available from 1989 to 2011, which therefore constitutes the key period of interest. SIP-ENTRY also has information on demographic characteristics, including date of migration, which is needed to accurately define an individual’s number of years since migration. Through the addition of the multigenerational register, family IDs have been created by linking parents to their biological children. This represents a particular strength of this study as a means of isolating the effects of incorporation and dealing with unobserved individual heterogeneity (Lawlor et al. 2009). Our study sample consists of second-generation immigrants for whom the birthplace and date of immigration of both biological parents are known. Furthermore, the individual must have a reported final course grade in Swedish and math, as well as results on the national tests for the same subjects. Lastly, this information must also be available for at least one other sibling, in order for family fixed-effects models to be estimated, as explained in more detail in Section 3.2.
3.1 Measures
3.1.1 Education performance
The educational performance considered in this paper includes the individual’s teacher-assigned course grades as well as scores on national standardized tests in Swedish and math during the final year of primary school, at age 16. These subjects are important because students must earn a passing grade in them in order to pass primary school. Course performance also forms the basis for whether the student is admitted into college preparatory academic programs or vocational programs at the high school level. Data for teacher-assigned grades cover the entire time frame (1989–2011), but changes in grading between the 1997 and 1998 graduating cohorts (Wikström 2005) lead to different grade distributions. As a result, we include a dummy variable for pre- and post-1998 years in the analysis of teacher-assigned grades. We also consider standardized test performance, covering the period between 2005 and 2011. Arguably, the tests represent a purer indicator of the student’s knowledge, as it is well known that course grades may be affected by various unobserved factors, including classroom behavior and teacher differences in grading routines. The standardized tests, by contrast, are taken by all ninth-grade students at the same time; they are scored by the teacher, but according to common and strict guidelines. The standardized tests have thus been developed to be comparable at the national level, while also providing teachers with a benchmarking tool to help them assess their students’ mastery. The grade categories for both instruments include “Fail,” “Passed,” “Passed with distinction,” and “Passed with highest distinction.” In the analyses, these remain categorical variables.
3.1.2 Parents’ country of origin
In defining the study population, we use information on the individual’s and their parents’ country of birth. Using this information, we are able to identify individuals belonging to the second generation, who are the focus of the empirical analysis. The sample is limited to individuals who were born in Sweden and for whom both parents can be identified as born abroad, which is how we define the second generation. Consequently, we exclude individuals for whom we only have information on one of the parents.
Following the grouping of countries of origin in SIP-ENTRY, the analysis examines individuals by the following country of origin groups: Africa, East Africa (Ethiopia, Somalia, and Eritrea), Iraq, Iran, Lebanon, Turkey, Thailand, Vietnam, Asia/Oceania, Chile, South America, non-EU-27 and Czech/Slovakia, former Yugoslavia, former Soviet Union and Poland, North America and EU-27, and Nordic (Finland, Denmark, and Norway) countries. Albeit somewhat subjective, the list orders the country groups by their respective degree of socio-cultural similarity to Sweden, in ascending order. Following Jonsson and Rudolphi (2011), individuals whose parents come from two different countries are assigned the “closest” of the two regions of origin.
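The assignment rule for children of parents from two different regions can be illustrated with a minimal sketch. The group labels are copied from the list above, but how that list splits into groups, the name `ORIGIN_GROUPS`, and the function `assign_origin_group` are our own illustrative assumptions, not the coding used in SIP-ENTRY.

```python
# Country-of-origin groups in ascending order of assumed socio-cultural
# similarity, as listed in the text. The exact split of the list into
# groups is an assumption made for illustration.
ORIGIN_GROUPS = [
    "Africa", "East Africa", "Iraq", "Iran", "Lebanon", "Turkey",
    "Thailand", "Vietnam", "Asia/Oceania", "Chile", "South America",
    "non-EU-27 and Czech/Slovakia", "former Yugoslavia",
    "former Soviet Union and Poland", "North America and EU-27", "Nordic",
]

# Higher rank means "closer" in the ordering above.
RANK = {group: position for position, group in enumerate(ORIGIN_GROUPS)}


def assign_origin_group(mother_group, father_group):
    """Return the 'closest' of the two parental origin groups (hypothetical helper)."""
    return max(mother_group, father_group, key=RANK.__getitem__)


# Example: one parent from Iraq, one from a Nordic country.
print(assign_origin_group("Iraq", "Nordic"))
```

When both parents belong to the same group, the function simply returns that group.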
3.1.3 Years since migration
Information on migration contains the exact date of immigration to Sweden starting in 1973. For those who arrived before 1973, we therefore rely on the 1960, 1965, and 1970 censuses to assess whether they were present in Sweden before 1960, between 1960 and 1965, or between 1965 and 1970. When constructing age at migration, these individuals are ascribed the arrival years 1959, 1962, and 1967, respectively. The PYSM is hence derived as the number of years the parent had spent in Sweden at the year of the examined individual’s birth. We use the value of the parent from the culturally closest country with the greatest YSM, as this parent represents the greatest potential accumulation of incorporation experiences.
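The derivation of the PYSM described above can be sketched as follows. This is only an illustration under stated assumptions: the function and variable names are hypothetical, the integer "closeness rank" stands in for the ordering of origin groups, and the selection rule reflects one reading of the text (restrict to the parent or parents from the closest group, then take the greatest YSM).

```python
# Arrival years ascribed to parents who arrived before exact dates are
# recorded (values taken from the text; the dictionary keys are hypothetical).
CENSUS_IMPUTED_YEAR = {
    "before_1960": 1959,
    "1960_1965": 1962,
    "1965_1970": 1967,
}


def arrival_year(exact_year=None, census_window=None):
    """Use the registered arrival year if available, else the census-based year."""
    if exact_year is not None:
        return exact_year
    return CENSUS_IMPUTED_YEAR[census_window]


def parental_ysm(child_birth_year, parents):
    """Years since migration of the selected parent at the child's birth.

    parents: list of (closeness_rank, arrival_year) tuples, where a higher
    rank means a culturally "closer" origin group (an assumed encoding).
    """
    closest_rank = max(rank for rank, _ in parents)
    return max(
        child_birth_year - arrived
        for rank, arrived in parents
        if rank == closest_rank
    )


# Example: both parents from the same group; one arrived in 1975, the other
# was first observed in the 1965 census (ascribed arrival year 1962).
parents = [(3, arrival_year(exact_year=1975)),
           (3, arrival_year(census_window="1960_1965"))]
print(parental_ysm(1980, parents))
```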
3.2 Methods
The analyses focus on estimating the impact of parents’ years spent in Sweden on their children’s educational performance in two core subjects in Swedish primary school: Swedish and math. To this end, we begin by descriptively comparing the sample characteristics in terms of class grades, test scores, and PYSM. These are displayed by parents’ country of origin and, for PYSM, include the within-family variation. Next, we estimate a series of multinomial logistic regression models on grade and test performance with family fixed-effects (Pforr 2014).
The family fixed-effects approach is well suited to avoid some of the sources of bias which may arise from the PYSM being correlated with the error term. More specifically, it is difficult to assess the causal effect of parents’ time spent in the country of destination using cross-sectional (as well as longitudinal) data, since those who have a longer duration of stay may be fundamentally different from those who have stayed for a shorter period of time (Chiswick and Miller 1995). Those who wait longer to have a child after migration might do so because of their preferences for their children’s education, so that the timing of childbearing is not necessarily independent of their incorporation experiences. Furthermore, educational outcomes are partly determined by ability, representing another major potential source of omitted variable bias. The family fixed-effects approach provides a means to control for the influence of all time-invariant characteristics, such as genetic traits (on average 50 % shared between siblings) and parents’ preferences regarding their children’s education, thus removing important potential sources of bias. The drawback of this approach is that it requires more than one child to be born in Sweden, which raises the question of external validity vis-à-vis families with only a single child. If a younger sibling has an older sibling who has gone through school, there are others in the household who are fluent in Swedish and familiar with the topics that will be covered in these classes, something a single child does not have access to. Despite this drawback, the proportion of immigrant families in our sample with two or more children born in Sweden is higher than 80 %, so this study captures the lived experience of the overwhelming majority of second-generation immigrants.
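The sibling requirement can be made concrete with a short sketch. It groups children by family, keeps families with at least two children, and additionally flags the families whose siblings differ in outcome, since fixed-effects logit models draw on within-family variation only. The record layout (`family_id`, `grade`) and the function name are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def fixed_effects_sample(records):
    """Split records into sibling families and informative sibling families.

    records: iterable of dicts with the (hypothetical) keys 'family_id'
    and 'grade'. Returns two dicts keyed by family ID:
      - families with at least two children, and
      - the subset of those whose siblings do not all share the same grade.
    """
    by_family = defaultdict(list)
    for record in records:
        by_family[record["family_id"]].append(record)

    multi_child = {
        fam: sibs for fam, sibs in by_family.items() if len(sibs) >= 2
    }
    informative = {
        fam: sibs
        for fam, sibs in multi_child.items()
        if len({sib["grade"] for sib in sibs}) > 1
    }
    return multi_child, informative


# Toy data: family A varies in grade, family B does not, family C has one child.
records = [
    {"family_id": "A", "grade": "Passed"},
    {"family_id": "A", "grade": "Passed with distinction"},
    {"family_id": "B", "grade": "Passed"},
    {"family_id": "B", "grade": "Passed"},
    {"family_id": "C", "grade": "Fail"},
]
multi_child, informative = fixed_effects_sample(records)
print(sorted(multi_child), sorted(informative))
```

In this toy example, family C is dropped for lack of a sibling, and family B is retained in the sibling sample but contributes no within-family variation in the outcome.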
The empirical models are estimated by means of multinomial logistic regression with family fixed-effects. The specification follows equation (1), below, where $P(y_{if} = k)$ describes the probability to be in state $k$ out of the possible states $0, \ldots, m$ for individual $i$ in family $f$. Here, these states range from “Fail” to “Pass with very special distinction,” translating to $m = 4$. The propensity function is modeled as a function of a vector of control variables, $X_{if}$, including whether the individual is the oldest sibling and their sex. The key parameter is represented by $\theta_k$, estimated based on the PYSM ($Z$) when individual $i$, in family $f$, was born, estimated separately for each of the $k$ outcomes. The identification of all parameters relies on within-family variation in both independent and dependent variables, while sibling combinations characterized by the same grade/test score will cancel out. Those individuals who are excluded due to the lack of within-family variation are compared to those who are included in Additional file 1: Appendixes C and D. This approach provides a way to control for not only observed characteristics but also unobserved characteristics, in which everything shared between siblings that could otherwise bias the estimates will be neutralized. More specifically, this is accomplished through the parameter $\mu_{kf}$, representing the family fixed-effects, capturing time invariant characteristics common to all siblings within a given family. Lastly, $\varepsilon_{kif}$ is an individual-specific error term.
$$ P\left({y}_{if}=k\right)={\beta}_k{X}_{kif}+{\theta}_k{Z}_{kif}+{\mu}_{kf}+{\varepsilon}_{kif} $$
(1)
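Equation (1) is written as a linear index for outcome $k$. As a hedged illustration rather than a restatement of the estimated model, a multinomial logit conventionally maps such indices to probabilities through the logistic (softmax) transformation. The shorthand $\eta_{kif}$ and the choice of state 0 as the base category with its index normalized to zero are our assumptions for this sketch:

$$ P\left({y}_{if}=k\right)=\frac{\exp \left({\eta}_{kif}\right)}{\sum_{j=0}^m\exp \left({\eta}_{jif}\right)},\kern1em {\eta}_{kif}={\beta}_k{X}_{kif}+{\theta}_k{Z}_{kif}+{\mu}_{kf},\kern1em {\eta}_{0if}=0 $$

Under this form, a positive $\theta_k$ means that additional PYSM raises the odds of outcome $k$ relative to the base category, holding the family-specific term $\mu_{kf}$ fixed.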
The analysis is performed first on the aggregated non-Western and Western parents’ country of origin groups to provide a general sense of the relationship between PYSM and children’s academic performance. The same analyses are then conducted separately for each parental country of origin group. Separate models are needed because country of origin does not vary within families and therefore cannot be entered as a covariate in the fixed-effects models; they also allow us to estimate the effect of PYSM separately by parents’ country of origin. The main disadvantage is that comparing coefficients between the groups becomes problematic. Therefore, we will primarily discuss differences between countries of origin in terms of the direction of the effect of parents’ time in Sweden and only to a lesser extent compare the size of coefficients across models.