What is Reliability?

Split-half method

This is known as researcher or observer error. Since the judgement of researchers is not perfect, we cannot assume that different researchers will record a measurement of something in the same way.

Assessing Reliability
Reliability in research

Some variables are more stable (constant) than others; that is, some change significantly, whilst others are reasonably constant.

Therefore, the score measured consists of two parts: the true score and the error. The true score is the actual score that would reliably reflect the measurement. The error reflects conditions that cause the score we are measuring to deviate from the true score, giving a variation on the actual score. This error component within a measurement procedure will vary from one measurement to the next, increasing and decreasing the score for the variable.

It is assumed that this happens randomly, with the error averaging zero over time; that is, the increases and decreases in error over a number of measurements even themselves out so that we end up with the true score. Provided that the error component within a measurement procedure is relatively small, the scores that are attained over a number of measurements will be relatively consistent; that is, there will be small differences in the scores between measurements.
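The idea that each measured score is a true score plus random error can be sketched with a small simulation. This is only an illustration: the true score of 100, the two error sizes, the normally distributed error, and the function names are all assumptions made for the example, not values from the text.

```python
import random


def simulate_measurements(true_score, error_sd, n, seed=0):
    """Return n observed scores: the true score plus zero-mean random error."""
    rng = random.Random(seed)
    return [true_score + rng.gauss(0, error_sd) for _ in range(n)]


def mean(values):
    return sum(values) / len(values)


def score_range(values):
    """Difference between the highest and lowest observed score."""
    return max(values) - min(values)


# Hypothetical values: a true score of 100 measured 1000 times.
small_error = simulate_measurements(true_score=100, error_sd=2, n=1000)
large_error = simulate_measurements(true_score=100, error_sd=20, n=1000)

# In both cases the error averages out, so the mean lands near the true score,
# but the individual scores are far more consistent when the error is small.
print("small error: mean", round(mean(small_error), 1),
      "range", round(score_range(small_error), 1))
print("large error: mean", round(mean(large_error), 1),
      "range", round(score_range(large_error), 1))
```

Both sets of scores average out close to the true score, but only the small-error procedure gives scores that stay close together from one measurement to the next.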

As such, we can say that the measurement procedure is reliable. Take the following example:

Intelligence using IQ
True score: Actual level of intelligence
Error: Caused by factors including current mood, level of fatigue, general health, and luck in guessing answers to questions you don't know
Impact of error on scores: Measurements of IQ would be expected to fall a few points above or below your actual IQ, not a long way from it.

By comparison, where the error component within a measurement procedure is relatively large, the scores that are obtained over a number of measurements will be relatively inconsistent; that is, there will be large differences in the scores between measurements. As such, we can say that the measurement procedure is not reliable. Take the following example:

Reaction time, measured by the speed of pressing a button when a light bulb goes on
True score: Actual reaction speed of the person
Error: Potential for the time to be significantly different from one measurement to the next

To reduce the impact of this error, take multiple measurements rather than a single measurement, and then average the scores.
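The benefit of averaging several measurements can be sketched as follows. The 250 ms true reaction time, the 40 ms error spread, the 10 trials per average, and the normally distributed error are hypothetical choices for illustration only.

```python
import random
import statistics


def reaction_time(rng, true_ms=250.0, error_sd=40.0):
    """One simulated reaction time: hypothetical true speed plus random error."""
    return true_ms + rng.gauss(0, error_sd)


def averaged_reaction_time(rng, trials):
    """Average of several simulated reaction times for the same person."""
    return statistics.fmean(reaction_time(rng) for _ in range(trials))


rng = random.Random(1)

# 500 scores based on a single trial each, and 500 scores that each average 10 trials.
single = [reaction_time(rng) for _ in range(500)]
averaged = [averaged_reaction_time(rng, trials=10) for _ in range(500)]

spread_single = statistics.stdev(single)
spread_averaged = statistics.stdev(averaged)

print("spread of single measurements:", round(spread_single, 1), "ms")
print("spread of 10-trial averages:  ", round(spread_averaged, 1), "ms")
```

The averaged scores vary much less from one score to the next than the single measurements do, which is why averaging makes the measurement more consistent.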

You can learn more about reliability, error and reaction times by reading Yellott, Ratcliff, and Salthouse and Hedden. All measurement procedures involve error.

The reliability of a test could be improved through using the split-half method. For example, any items on separate halves of a test which have a low correlation could be removed or rewritten. The split-half method is a quick and easy way to establish reliability. However, it can only be effective with large questionnaires in which all questions measure the same construct.

This means it would not be appropriate for tests which measure different constructs. For example, the Minnesota Multiphasic Personality Inventory has subscales measuring different behaviors such as depression, schizophrenia and social introversion. Therefore the split-half method would not be an appropriate method to assess reliability for this personality test.
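A split-half check can be sketched in a few lines. The text does not say how the test should be split or how the result should be adjusted, so two assumptions are made here: the items are split into odd and even positions, and the Spearman-Brown formula (a standard correction that is not part of the text above) is used to estimate the reliability of the full-length test. The response data are invented.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def split_half_reliability(responses):
    """Correlate odd-item and even-item totals across participants.

    responses: one list of item scores per participant.
    Returns (half-test correlation, Spearman-Brown estimate for the full test).
    """
    odd_totals = [sum(items[0::2]) for items in responses]
    even_totals = [sum(items[1::2]) for items in responses]
    r_half = pearson(odd_totals, even_totals)
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown correction (assumed)
    return r_half, r_full


# Invented data: six participants answering a six-item questionnaire (1 to 5 scale).
responses = [
    [4, 5, 4, 5, 4, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 5, 5, 4, 5, 5],
    [1, 2, 1, 1, 2, 1],
    [3, 4, 3, 3, 4, 4],
]

r_half, r_full = split_half_reliability(responses)
print("half-test correlation:", round(r_half, 3))
print("Spearman-Brown estimate:", round(r_full, 3))
```

A high correlation between the two halves suggests the items are measuring the same construct consistently. The same calculation would be misleading for a test whose halves measure different constructs.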

The test-retest method assesses the external consistency of a test. Examples of appropriate tests include questionnaires and psychometric tests. It measures the stability of a test over time. A typical assessment would involve giving participants the same test on two separate occasions.

If the same or similar results are obtained then external reliability is established. A disadvantage of the test-retest method is that it takes a long time for results to be obtained. The timing of the retest is also important; if the interval between the two tests is too brief, participants may recall information from the first test, which could bias the results. Alternatively, if the interval is too long, it is feasible that the participants could have changed in some important way, which could also bias the results.
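One common way to judge whether two test occasions gave "the same or similar results" is to correlate the two sets of scores. The sketch below assumes a Pearson correlation is the chosen statistic. The scores are invented, and the text gives no cut-off for an acceptable correlation, so none is applied here.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Invented questionnaire scores for the same six participants on two occasions.
first_occasion = [12, 18, 25, 9, 30, 21]
second_occasion = [14, 17, 27, 10, 28, 22]

test_retest_r = pearson(first_occasion, second_occasion)
print("test-retest correlation:", round(test_retest_r, 2))
```

A correlation close to 1 means participants kept roughly the same position relative to one another across the two occasions, which is what stability over time looks like in the data.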

Inter-rater reliability refers to the degree to which different raters give consistent estimates of the same behavior. Inter-rater reliability can be used for interviews.

Note that it can also be called inter-observer reliability when referring to observational research. Here researchers observe the same behavior independently (to avoid bias) and compare their data. If the data are similar then they are reliable.
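Comparing two observers' records can be sketched as below. The simplest measure is the proportion of observations on which they agree. Cohen's kappa, which adjusts that proportion for agreement expected by chance, is included as a common refinement that the text itself does not mention. The "push" and "none" codes and the ten observation intervals are invented.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of observations that both raters coded identically."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented codes from two observers watching the same ten intervals independently.
rater_a = ["push", "none", "none", "push", "none",
           "push", "none", "none", "push", "none"]
rater_b = ["push", "none", "none", "none", "none",
           "push", "none", "none", "push", "none"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print("percent agreement:", agreement)
print("Cohen's kappa:", round(kappa, 3))
```

Here the observers disagree on one interval out of ten. Kappa comes out lower than the raw agreement because some of that agreement would be expected by chance alone.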

In this scenario it is unlikely that they would record aggressive behavior in the same way, and the data would be unreliable. However, if they were to operationalize the behavior category of aggression, this would be more objective and make it easier to identify when a specific behavior occurs.

Thus researchers could simply count how many times children push each other over a certain duration of time.

Manual for the Beck Depression Inventory. The Psychological Corporation, San Antonio, TX.
Manual for the Minnesota Multiphasic Personality Inventory.

Reliability is a necessary ingredient for determining the overall validity of a scientific experiment and enhancing the strength of the results. Debate between social and pure scientists, concerning reliability, is robust and ongoing.

The term reliability in psychological research refers to the consistency of a research study or measuring test. For example, if a person weighs themselves several times during the course of a day, they would expect to see a similar reading each time.

Internal consistency reliability is a measure of reliability used to evaluate the degree to which different test items that probe the same construct produce similar results. Average inter-item correlation is a subtype of internal consistency reliability.
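Average inter-item correlation can be sketched as follows: correlate every pair of items that probe the same construct, then take the mean of those correlations. The use of Pearson correlation and the response data are assumptions for the example.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_inter_item_correlation(responses):
    """Mean correlation over all pairs of items.

    responses: one list of item scores per participant.
    """
    items = list(zip(*responses))  # regroup scores so each tuple is one item
    pairs = list(combinations(items, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Invented data: five participants answering four items on the same construct.
responses = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 1],
]

avg_r = average_inter_item_correlation(responses)
print("average inter-item correlation:", round(avg_r, 3))
```

Note that the calculation breaks down if every participant gives the same answer to an item, because that item then has no variance to correlate.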

Reliability, like validity, is a way of assessing the quality of the measurement procedure used to collect data in a dissertation. In order for the results from a study to be considered valid, the measurement procedure must first be reliable. Research reliability is the degree to which a research method produces stable and consistent results. A specific measure is considered to be reliable if.