Frequently asked questions

Analysing HILDA Survey data

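  • How do I navigate all of the documentation provided with HILDA?

    The HILDA Survey User Manual is the best place to start, as it answers many of the questions that arise when using the data.

    The manual covers a range of topics, including:

    • missing data conventions
    • derived variables
    • matching wave data files to create longitudinal files
    • income and expenditure imputation
    • industry and occupation variables
    • Australian and derived international coding schemes
    • using weight variables, and
    • data quality issues.

    The User Manual also provides an overview of the data documentation and a summary of the design and data collection procedures used in the HILDA Survey.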

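  • What is in the documentation zip file?

    The documentation zip file contains:

    • coding frameworks for all variables (PDF files)
    • coding frameworks for the longitudinal weights and master files
    • the cross-wave index
    • marked-up questionnaires, and
    • showcards (PDF files) showing the associated variable names, excluding derived and history variables.

    The documentation zip file also contains frequencies for each wave. String variables (IDs and timestamps) are usually excluded from these frequencies.

    To locate variable names quickly, the HILDA Survey User Manual should be used in conjunction with the cross-wave index, which is searchable by question number, keyword or variable name (excluding the first character wave identifier).

    The cross-wave index shows the availability of specific variables in each wave, as well as the source questionnaire (or the history, for derived variables).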

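  • What is the difference between “employee of own business” and “employer/self-employed”?

    The HILDA Survey generally adopts the Australian Bureau of Statistics (ABS) definitions of standard labour market variables.

    However, we are not comfortable with the ABS definition of self-employed.

    The ABS defines an employee as:

    a person who works for a public or private employer and receives remuneration in wages, salary, a retainer fee from their employer while working on a commission basis, tips, piece-rates, or payment in kind; or a person who operates his or her own incorporated enterprise with or without hiring employees.

    In other words, their definition of employee includes owner managers who operate their own incorporated businesses (i.e., they are treated as “employees of their own business”).

    In contrast, people who operate their own unincorporated businesses are treated as “own account workers” (i.e., they are self-employed).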

    We believe this distinction is misleading for many research purposes, so in our data releases we provide all of the necessary information for researchers to construct their own definition of employees and self-employed.

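    If you wish to adopt the ABS definition of “employee”, you should take the variable _esempst and combine the two groups “employee” (1) and “employee of own business” (2).

    Alternatively, you can use the variable _es, which is a derived variable that reproduces the ABS definition of employment status.

    Whether you combine “employee of own business” and “employer/self-employed” into one group depends on your research question.

    If you want to be consistent with the ABS definition, you should combine “employee” and “employee of own business”.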

    In Mark Wooden’s own research on labour market behaviour, for example, he almost always discards the ABS definition and combines “employee of own business” with the “employer/self-employed” group.
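
    The two groupings described above can be sketched in plain Python. This is only an illustration: codes 1 and 2 for _esempst are the ones given in this answer, while code 3 for “employer/self-employed” and the function names are assumptions that should be checked against the coding framework.

```python
# Codes 1 and 2 come from the answer above. Code 3 is an ASSUMPTION made
# for illustration only; confirm it in the coding framework before use.
EMPLOYEE = 1
EMPLOYEE_OWN_BUSINESS = 2
EMPLOYER_SELF_EMPLOYED = 3  # assumed code

def abs_style_status(esempst):
    """ABS-style grouping: owner managers of incorporated businesses count as employees."""
    if esempst in (EMPLOYEE, EMPLOYEE_OWN_BUSINESS):
        return "employee"
    if esempst == EMPLOYER_SELF_EMPLOYED:
        return "self-employed"
    return None  # any other code is left unclassified in this sketch

def alternative_status(esempst):
    """Alternative grouping: 'employee of own business' is treated as self-employed."""
    if esempst == EMPLOYEE:
        return "employee"
    if esempst in (EMPLOYEE_OWN_BUSINESS, EMPLOYER_SELF_EMPLOYED):
        return "self-employed"
    return None

# The only code the two groupings treat differently is 2.
example_codes = [1, 2, 3]
abs_groups = [abs_style_status(c) for c in example_codes]
alt_groups = [alternative_status(c) for c in example_codes]
```

    The same recode carries over directly to whichever statistical package you use to read the data.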

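  • Which weight should I use?

    Weights are used to make inferences about the population from the sample.

    The weight you use depends on the question you are answering. The HILDA Survey User Manual provides some guidance on which weight to use in which situations.

  • Should I weight an unbalanced panel?

    Perhaps. When you construct an unbalanced panel of respondents, you take all of the respondents from each wave and stack them into a long file with one record per person per wave.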

    The weight that could be used to weight this sample is the cross-sectional responding person weight from each wave. That is, in their Wave 1 observation, the person would be weighted by their Wave 1 cross-sectional responding person weight, their Wave 2 observation would be weighted by their Wave 2 cross-sectional responding person weight, and so on.

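    Similarly, if you are constructing an unbalanced panel of enumerated persons, then you can use the cross-sectional enumerated person weight.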

    If you pool, say, five waves of data together, the sum of the weights will be around 100 million (that is, five times the average population size between 2001 and 2005). Therefore, you may wish to rescale the weights by dividing the total by the number of waves you have included in the unbalanced panel.
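
    The stacking and rescaling described above can be sketched as follows. The record layout, the key name "weight" and the function name are hypothetical; in practice each record would carry the cross-sectional weight of the wave it comes from.

```python
def stack_waves(waves, weight_key="weight"):
    """Stack waves into one long list with one record per person per wave.

    waves maps a wave number to a list of person records (dicts). Each
    record holds that wave's cross-sectional weight under weight_key,
    which is a hypothetical name used only in this sketch.
    """
    n_waves = len(waves)
    stacked = []
    for wave, records in sorted(waves.items()):
        for rec in records:
            row = dict(rec)
            row["wave"] = wave
            # Dividing by the number of pooled waves makes the weights sum
            # to roughly one population rather than n_waves populations.
            row["pooled_weight"] = rec[weight_key] / n_waves
            stacked.append(row)
    return stacked

# Made-up weights, purely for illustration.
waves = {
    1: [{"xwaveid": "0100001", "weight": 1000.0},
        {"xwaveid": "0100002", "weight": 3000.0}],
    2: [{"xwaveid": "0100001", "weight": 2000.0},
        {"xwaveid": "0100003", "weight": 4000.0}],
}
panel = stack_waves(waves)
total_pooled = sum(row["pooled_weight"] for row in panel)
```

    In this toy example the wave totals are 4,000 and 6,000, and the rescaled weights sum to their average.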

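    Whether to weight the sample in this way depends on the type of analysis you are undertaking on the unbalanced panel. For example:

    • If your analysis is of a rare event and you are effectively taking a pooled sample, then the weighting strategy suggested above should be fine.
    • Alternatively, if your analysis requires at least two observations on the same individual, then you will be dropping those people who are only interviewed once. The cross-sectional weights, therefore, will not be appropriate (nor will the longitudinal weights).

  • What weight should I use if I pool the sample across waves?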

    When you are analysing an uncommon event (for example, divorce), you can pool the sample across waves. This sample, however, is subject to attrition that is not random, so it needs to be weighted.

    If you have pooled respondents across waves, you should use the cross-sectional responding person weight for the wave from which the case has been contributed.

  • How do I match people across waves?

    Use the cross-wave identifier, xwaveid, to match people across waves.
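
    As a minimal sketch of this match, assuming each wave has been read into a list of dicts that all carry xwaveid (the record layout and function name are illustrative, not part of the HILDA release):

```python
def match_across_waves(wave_a, wave_b):
    """Keep people present in both waves, joining their records on xwaveid."""
    by_id = {rec["xwaveid"]: rec for rec in wave_b}
    matched = []
    for rec in wave_a:
        other = by_id.get(rec["xwaveid"])
        if other is not None:
            # Wave-specific variables start with different wave letters,
            # so the two records do not overwrite each other.
            matched.append({**rec, **other})
    return matched

# Made-up records: ages in Wave 1 (prefix "a") and Wave 2 (prefix "b").
wave1 = [{"xwaveid": "0100001", "ahgage": 34},
         {"xwaveid": "0100002", "ahgage": 36}]
wave2 = [{"xwaveid": "0100001", "bhgage": 35},
         {"xwaveid": "0100003", "bhgage": 20}]
both = match_across_waves(wave1, wave2)
```

    Only the person interviewed in both waves survives this inner join; an unbalanced panel would keep the unmatched records as well.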

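  • How do I match people within households?

    People within the same household have the same household identifier, _hhrhid*.

    The household identifier changes from wave to wave. Use a person's cross-wave identifier, xwaveid, to match them over time.

    *Replace the underscore with the letter corresponding to the wave, where “a” corresponds to Wave 1, “b” corresponds to Wave 2, and so on.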
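
    The underscore convention and the household grouping can be sketched like this. The helper names are hypothetical, and the records are assumed to be dicts keyed by wave-specific variable names.

```python
import string
from collections import defaultdict

def wave_var(generic_name, wave):
    """Replace the leading underscore with the wave letter: 1 -> 'a', 2 -> 'b', ..."""
    if not generic_name.startswith("_"):
        raise ValueError("expected a generic name starting with '_'")
    if not 1 <= wave <= 26:
        raise ValueError("wave must be between 1 and 26")
    return string.ascii_lowercase[wave - 1] + generic_name[1:]

def group_by_household(records, wave):
    """Map each household identifier in the given wave to its members' xwaveid values."""
    hh_var = wave_var("_hhrhid", wave)
    households = defaultdict(list)
    for rec in records:
        households[rec[hh_var]].append(rec["xwaveid"])
    return dict(households)

# Made-up Wave 2 records.
wave2_people = [
    {"xwaveid": "0100001", "bhhrhid": "200001"},
    {"xwaveid": "0100002", "bhhrhid": "200001"},
    {"xwaveid": "0100003", "bhhrhid": "200002"},
]
wave2_households = group_by_household(wave2_people, 2)
```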

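  • How do I match couples together?

    People who are married or in a de facto relationship can be matched via their partner's:

    • _hhpxid, the partner's cross-wave identifier, or
    • _hhprtrid, the partner's two-digit person number, which can be appended to the end of the household identifier _hhrhid to create the partner's identifier for that wave.

    Partner identifiers are only available for partners living in the same household. Same-sex couples have a partner identifier.

    Note: Replace the underscore with the letter corresponding to the wave, where “a” corresponds to Wave 1, “b” corresponds to Wave 2, and so on.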
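
    A minimal sketch of pairing partners through _hhpxid follows. It assumes person records are dicts and that a missing partner is stored as None or an empty string; the way missing values are actually coded should be checked in the User Manual.

```python
def pair_couples(records, wave_letter):
    """Return each couple once, as a sorted pair of xwaveid values."""
    partner_var = wave_letter + "hhpxid"
    by_id = {rec["xwaveid"]: rec for rec in records}
    couples = []
    seen = set()
    for rec in records:
        partner_id = rec.get(partner_var)
        # ASSUMPTION: no partner is represented by None or "" in this sketch.
        if not partner_id or partner_id not in by_id:
            continue
        pair = tuple(sorted((rec["xwaveid"], partner_id)))
        if pair not in seen:
            seen.add(pair)
            couples.append(pair)
    return couples

# Made-up Wave 1 records: two partners and one person without a partner.
wave1_people = [
    {"xwaveid": "0100001", "ahhpxid": "0100002"},
    {"xwaveid": "0100002", "ahhpxid": "0100001"},
    {"xwaveid": "0100003", "ahhpxid": None},
]
wave1_couples = pair_couples(wave1_people, "a")
```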

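  • How do I match children to parents?

    A child can be matched via their mother's or father's:

    • _hhmxid or _hhfxid, the cross-wave identifiers for the mother and father; or
    • _hhmid or _hhfid, the two-digit person numbers for the mother and father, which can be appended to the end of the household identifier _hhrhid to create the mother's and father's wave-specific identifiers.

    Mother and father identifiers are only available for people who live in the same household as their parent or parents.

    Note: Replace the underscore with the letter corresponding to the wave, where “a” corresponds to Wave 1, “b” corresponds to Wave 2, and so on.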
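
    The second route, appending a two-digit person number to the household identifier, might look like the sketch below. It assumes the household identifier is held as a string, that each record also carries the person number _hhpno, and that a non-positive _hhmid means no mother in the household; all three are assumptions to verify against the data.

```python
def wave_person_id(household_id, person_number):
    """Append a zero-padded two-digit person number to the household identifier."""
    return f"{household_id}{int(person_number):02d}"

def link_mothers(records, wave_letter):
    """Map each child's xwaveid to the xwaveid of their co-resident mother."""
    hh, pno, mid = (wave_letter + s for s in ("hhrhid", "hhpno", "hhmid"))
    by_wave_id = {wave_person_id(r[hh], r[pno]): r for r in records}
    links = {}
    for r in records:
        mother_no = r.get(mid)
        # ASSUMPTION: None or a non-positive value means "no mother in household".
        if mother_no is None or int(mother_no) <= 0:
            continue
        mother = by_wave_id.get(wave_person_id(r[hh], mother_no))
        if mother is not None:
            links[r["xwaveid"]] = mother["xwaveid"]
    return links

# Made-up Wave 1 household: person 2 is the mother of person 3.
household = [
    {"xwaveid": "0100001", "ahhrhid": "100001", "ahhpno": 1, "ahhmid": -1},
    {"xwaveid": "0100002", "ahhrhid": "100001", "ahhpno": 2, "ahhmid": -1},
    {"xwaveid": "0100003", "ahhrhid": "100001", "ahhpno": 3, "ahhmid": 2},
]
mother_links = link_mothers(household, "a")
```

    The same pattern applies to fathers by swapping _hhmid for _hhfid.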

  • Why do some respondents have zero weights?

    Respondents can have zero weights for one of two reasons.

    First, the HILDA sample in Wave 1 excluded people living in institutions (such as hospitals and other healthcare institutions, military and police installations, correctional and penal institutions, convents and monasteries) and other non-private dwellings (such as hotels and motels).

    As a result, the HILDA sample is not representative of people living in non-private dwellings.

    People who move into these dwellings after Wave 1 are given zero cross-sectional weights and zero longitudinal weights for the balanced panel starting from the wave in which they began living in a non-private dwelling.

    Second, the HILDA sample also excluded people living in remote and sparsely populated areas. Some of these areas are excluded from the Australian Bureau of Statistics' population benchmarks, which are used in the weighting process.

    For Releases 1 to 4, the benchmarks only excluded remote and sparsely populated areas in the Northern Territory. Following Release 4, however, the ABS revised the areas considered remote and sparsely populated to include very remote parts of New South Wales, Queensland, South Australia, Western Australia and the Northern Territory.

    These areas are identified by the Remoteness Area classification and have a value greater than 10.53 on the Accessibility/Remoteness Index of Australia. As a result, from Release 5, the small number of sample members living in these areas are given zero cross-sectional weights and zero longitudinal weights.

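  • How do I cite HILDA?

    The following paragraph must appear in any research that uses HILDA Survey data:

    This paper uses unit record data from the Household, Income and Labour Dynamics in Australia (HILDA) Survey. The HILDA Project was initiated and is funded by the Australian Government Department of Social Services (DSS) and is managed by the Melbourne Institute of Applied Economic and Social Research (Melbourne Institute). The findings and views reported in this paper, however, are those of the author and should not be attributed to either DSS or the Melbourne Institute.

    Including the above statement is a requirement of the Deed of Licence or Deed of Confidentiality that you signed to obtain the HILDA Survey data.

    If you would like to reference the design of HILDA, the following reference is also suggested:

    Watson, N., and Wooden, M. (2012), “The HILDA Survey: A Case Study in the Design and Development of a Successful Household Panel Study”, Longitudinal and Life Course Studies, vol. 3, no. 3, pp. 369-381.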

  • How do I work out whether someone is retired in Wave 4?

    Retirement status in Wave 4 is problematic. There was an oversight during preparation for Wave 4 that resulted in questions on retirement status contained in the Wave 2 Continuing Person Questionnaire not being reinstated.

    These questions were removed in Wave 3 because a more comprehensive set of retirement-related questions was included as part of the retirement module.

    Removal of this retirement module for Wave 4 should have been accompanied by the reinstatement of the original retirement questions, but this was overlooked and not rectified until Wave 5.

    You can define retirement status based solely on age and labour force status, but to be consistent across waves you would need to apply the same criteria across all waves.

    An alternative is to exclude Wave 4 altogether.

  • How do I find the household reference person?

    A household reference person is not provided in the HILDA datasets. Researchers will have different definitions they may wish to apply to define a household reference person. It may depend on their particular research topic or on how they want this definition to apply over time as circumstances within the household change (e.g., if relative income levels differ over time, if relationships change over time, or when someone moves out or in). Some variables that you might find useful in defining a household reference person are relationship in household (_hhrih), income (_tifefp and _tifefn), owner (_hsoid1 to _hsoid18, but these are only available in some years) and age (_hgage).

    Please note that the person numbers (_hhpno) indicate the row of the Household Form on which that person is listed. The order in the first wave is simply the order in which the respondent mentions the people in the household to the interviewer. In later waves, joiners are added, leavers are removed, and people are shuffled up for the next wave.
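
    As one hypothetical definition among many, the sketch below picks the household member with the highest income and breaks ties by age. It assumes that _tifefp holds the positive part and _tifefn the negative part of income, so that net income is their difference; that reading, the record layout and the function name are assumptions to check against the documentation.

```python
def reference_person(household_members, wave_letter):
    """Pick a reference person: highest net income, then oldest (illustrative rule only)."""
    inc_p, inc_n, age = (wave_letter + s for s in ("tifefp", "tifefn", "hgage"))

    def rank(rec):
        # ASSUMPTION: net income = positive component minus negative component.
        return (rec[inc_p] - rec[inc_n], rec[age])

    return max(household_members, key=rank)["xwaveid"]

# Made-up Wave 1 household.
members = [
    {"xwaveid": "0100001", "ahgage": 45, "atifefp": 50000, "atifefn": 0},
    {"xwaveid": "0100002", "ahgage": 47, "atifefp": 62000, "atifefn": 2000},
    {"xwaveid": "0100003", "ahgage": 17, "atifefp": 0, "atifefn": 0},
]
ref = reference_person(members, "a")
```

    Whatever rule you choose, applying it identically in every wave keeps the definition consistent as household circumstances change.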

  • How do I link households over time?

    We don’t provide a longitudinal household id as different users will have different definitions of what it means to be part of a longitudinal household. Does a birth or death change the household? What if someone moves in or out? Does it matter who they are or how they are related to the ‘core’ people in the household? If a couple divorces, who does the household belong to after the divorce? Or what happens if an adult son moves back into the family home – is it the same household or a different one? Does it matter if the adult son is 25 or 60? You would need to link households over time via the people living within them.

    The best file to use to do this is the master file as it contains summary information on all people who were ever part of an enumerated household. This includes the xwaveid and, for each wave, the household id and outcome status. You would need to make some decisions about what constitutes a continuing household or a new household for your purposes.

    You might also like to consider whether you actually need to think about your research question in terms of the longitudinal household concept. It may be possible to redefine it in terms of what happens to people who live in certain types of households over time. Households are not a well-defined concept over time (as researchers would have different definitions depending on their particular research question) whereas individuals are.

  • Should I use the imputed values in the HILDA Survey data?

    Some variables in the HILDA Survey data are imputed when complete responses are not available from respondents. For example, many income variables contain imputed values. The HILDA Survey team provides users with information about which values are imputed. For example, for “household financial year disposable total income” (_hifditp/_hifditn) there is an imputation flag, _hifditf. Across the first 16 waves of the HILDA Survey, about 25% of the values for this variable at the household level are imputed. This variable is the sum of many income components and it only takes one missing value at the lower level for this overall total to be missing.

    A user might be tempted to throw these observations away as they do not contain actual responses from participants. Users might be worried that their analysis is affected by using these imputed values, which are the product of a model used by the HILDA Survey team.

    First, users should know that most imputation is relatively innocuous. Often, respondents will have left one item blank in one year and it is pretty easy to work out a good guess for this item from other years of the same respondents and from other respondents.

    Most importantly, however, throwing away imputed observations will create large amounts of bias in estimates relative to including the imputed values. While there may be some errors introduced by the imputation procedure, the errors introduced by excluding observations with imputed values will be much larger. This is due to “selection bias.” Observations for which values have been imputed are systematically different from those for which imputation has not been done. By excluding those observations, users risk introducing large amounts of selection bias into their estimates.

    There is widespread agreement in the empirical social science literature and in the statistics literature that it is far superior to include the observations with imputed values. It is also recommended to include the imputation indicator (dummy) variable as an explanatory variable in your regression. For example, if you are estimating a model with “household financial year disposable total income”, you should include the _hifditf indicator as an additional explanatory variable. This will help to “soak up” any errors that may have been introduced by the imputation process. (See Frick and Grabka (2007), “Item non-response and Imputation of Annual Labor Income in Panel Surveys from a Cross-National Perspective”. DIW Discussion Paper 736.)

    This answer was provided by Professor Robert Breunig, Australian National University.
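
    The recommendation above to keep imputed observations and add the imputation flag as a regressor can be sketched with a tiny least-squares example. Everything here is illustrative: the data are made up, the outcome is hypothetical, the flag is assumed to have been recoded to a 0/1 dummy, and the small solver stands in for whatever regression routine you normally use.

```python
from fractions import Fraction

def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y, solved exactly."""
    k = len(rows[0])
    X = [[Fraction(v) for v in row] for row in rows]
    Y = [Fraction(v) for v in y]
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, Y))] for i in range(k)]
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(k):
        pivot = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Made-up data: income in $000 and a 0/1 dummy derived from the imputation
# flag (ASSUMPTION: the raw flag has already been recoded to 0/1).
income = [40, 60, 80, 50, 70]
imputed = [0, 0, 0, 1, 1]
outcome = [81, 121, 161, 104, 144]  # hypothetical dependent variable

# All observations are kept; the flag enters as an extra regressor.
design = [[1, inc, flag] for inc, flag in zip(income, imputed)]
intercept, b_income, b_imputed = ols(design, outcome)
```

    The toy outcome was constructed as 1 + 2 × income + 3 × flag, so the fitted coefficients recover those values exactly; with real data the flag coefficient absorbs any systematic shift associated with imputed records.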