
## Deposition & Cross-Examination Questions on Tests & Psychometrics


This chapter was written to provide a guide for attorneys and to help expert witnesses prepare for depositions and cross-examination. Practicing answers to the types of questions in this chapter can help experts identify and strengthen weak areas in their expertise. Practice can also enable experts to become more comfortable and articulate in responding to a carefully prepared, vigorous, informed cross-examination.

The chapter from which this section is excerpted organizes over 100 questions into 14 basic categories, moving from information about initial contacts, financial factors, and the expert's background to the details of the expert's professional opinions. The following section presents and discusses questions focusing on tests per se and psychometrics.

### What is a psychological test?

It is surprising that those who administer, score, and interpret standardized psychological tests may never have thought carefully about what a test is. Initial inquiry at this fundamental level may help an attorney to begin assessing the degree to which an individual has genuine expertise and understands the nature of testing as opposed to following a "cookbook'' method of test use or "improvising'' opinions. The individual's response may also help the attorney to assess the degree to which the individual can communicate clearly and concretely to a judge or jury. Some individuals may be quite knowledgeable in the area of psychometrics and inferences from test data but may be incapable of putting their knowledge into words that can be understood by those without special training (see chapter 4).

One possible answer to this initial question was proposed by Cronbach in the original edition of his classic text (1960; see also Cronbach, 1990): "A test is a systematic procedure for comparing the behavior of two or more persons'' (p. 21). Note that the "behavior'' may be oral (e.g., an individual telling what he or she sees when looking at a Rorschach card) or written (e.g., marking down "true'' or "false'' responses on the MMPI).

One of the most important aspects of the definition suggested by Cronbach
is that the procedure for comparing behavior is *systematic.* For
many tests, the system used to measure and compare behavior is *standardized.* The
MMPI, like the Rorschach, the WAIS-III, and the Halstead-Reitan Neuropsychological
Test Battery, is a standardized test. A standardized test presents a
standardized set of questions or other stimuli (such as inkblots) under
generally standardized conditions; responses from the individual are
collected in a standardized format and scored and interpreted according
to certain standardized norms or criteria. The basic idea is that all
individuals who take the test take the same test under the same conditions.
Obviously, not all aspects are exactly equivalent. One individual may
take the test during the day in a large room; another may take the test
at night in a small room. The assumption is, however, that in all essential
respects (i.e., those that might significantly affect test performance),
all individuals are taking the "same'' test.

Because characteristics of the individual taking the test or the testing circumstances may significantly influence test results and interpretations, experts must be aware of the research literature that addresses these factors. For some tests, it may tend to make a difference whether the examiner and the examinee are similar or different in terms of gender, race, or age. For most popular tests, systematic investigations have indicated which factors need to be taken into account in the scoring or interpretation of the tests so that extraneous or confounding factors do not distort the results.

Later sections of this chapter focus in more detail on such aspects as administration (e.g., whether the expert followed the standard procedures for administering the test or whether special individual characteristics or testing circumstances were adequately taken into account and discussed in the forensic report); the basic issue in this section is assessing the deponent's understanding and ability to communicate the fundamental nature of a standardized test.

### Are you able to distinguish retrospective accuracy from predictive accuracy?

**[Alternative or follow-up questions could involve distinguishing
sensitivity from specificity, or Type I error from Type II error,
as noted in the Glossary of this book.]**

This is a simple yes-or-no question. If the expert indicates understanding of these concepts, the attorney may want to ask a few follow-up questions to ensure that the answer is accurate.

If the expert replies "no," then the attorney may consider a subsequent question such as, "So would it be fair to say that you did not take these two concepts into account in your assessment?'' If the witness has indicated inability to distinguish between the two concepts, he or she is in a particularly poor position to assert subsequently that the concepts were taken into account in the assessment.

If the witness does indicate that although he or she is unable to distinguish the two forms of accuracy, he or she nevertheless took them into account in the assessment, the attorney may ask the witness to explain the meaning of these two seemingly contradictory statements and how the two forms of accuracy were taken into account in the assessment.

On the other hand, if the witness testifies that it would be a fair statement that retrospective and predictive accuracy were not taken into account in the assessment, then the attorney may ask additional questions to clarify that the witness has no information to provide regarding the two forms of accuracy, cannot discuss any of his or her professional opinions in terms of these forms of accuracy, did not weigh (when selecting the test or tests to be administered) the types of available tests or evaluate the test results in light of these forms of accuracy, and so on.

The two concepts are simple but are crucial to understanding testing
that is based on standardized instruments such as the MMPI-2 or MMPI-A.
Assume that a hypothetical industrial firm announces that they have developed
a way to use the MMPI-2 to identify employees who have shoplifted. According
to their claims (which one should greet with skepticism), the MMPI-2,
as they score and interpret it, is now a test of shoplifting. *Predictive
accuracy* begins with the test score. This hypothetical new MMPI-2
score (or profile) will be either positive (suggesting that the employee
who took the test is a shoplifter) or negative (suggesting that the individual
is *not* a shoplifter). The predictive accuracy of this new test is
the probability, given a positive score, that the employee actually is
a shoplifter, and the probability, if the employee has a negative score,
that the individual is *not* a shoplifter. Thus, the predictive
accuracy, as the name implies, refers to the degree (expressed as a probability)
that a test is accurate in classifying individuals or in predicting whether
or not they have a specific condition, characteristic, and so on.

*Retrospective accuracy,* on the other hand, begins not with the
test but with the specific condition or characteristic that the test
is purported to measure. In the example above, the retrospective accuracy
of this hypothetical MMPI-2 shoplifting test denotes the degree (expressed
as a probability) that an employee who is a shoplifter will be correctly
identified (i.e., caught) by the test.

Confusing the "directionality'' of the inference (e.g., the likelihood that those who score positive on a hypothetical predictor variable will fall into a specific group versus the likelihood that those in a specific group will score positive on the predictor variable) is, in a more general sense, a cause of numerous errors in assessment and in testimony on assessment, assessment instruments, and assessment techniques. Cross-examination must carefully explore the degree to which testimony may be based on such misunderstandings.

Psychologist Robyn Dawes (1988a) provided a vivid example. Assume that
the predictor is cigarette smoking (i.e., whether an individual smokes
cigarettes) and that what is predicted is the development of lung cancer.
Dawes observes that there is around a 99% chance (according to the actuarial
tables) that an individual who has lung cancer is a chronic smoker. This
impressive statistic *seems* to indicate or imply that whether
one is a chronic smoker might be an extremely effective predictor of
whether he or she will develop lung cancer. But the chances that a chronic
smoker will develop lung cancer are (again, according to the actuarial
tables) only around 10%.
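The arithmetic behind Dawes's point can be made concrete with a small hypothetical population whose counts are invented here only to match his approximate figures; the actual actuarial tables differ.

```python
# Hypothetical counts, chosen only to reproduce Dawes's approximate figures.
cancer_smokers = 990     # people with lung cancer who are chronic smokers
cancer_nonsmokers = 10   # people with lung cancer who are not smokers
smokers_total = 9_900    # all chronic smokers in this hypothetical population

# Retrospective accuracy: P(smoker | lung cancer)
retrospective = cancer_smokers / (cancer_smokers + cancer_nonsmokers)

# Predictive accuracy: P(lung cancer | smoker)
predictive = cancer_smokers / smokers_total

print(f"P(smoker | cancer) = {retrospective:.2f}")  # 0.99
print(f"P(cancer | smoker) = {predictive:.2f}")     # 0.10
```

The same 990 smokers with cancer produce both probabilities; only the denominator changes, which is exactly the "directionality" of the inference discussed above.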

Using these same statistics in another context, an expert witness might
indicate reasonable certainty that, on the basis of a defendant's showing
a particular MMPI profile, the defendant is a rapist. The witness's foundation
for such an assertion might be that a research study of 100 rapists indicated
that virtually all of them showed that particular MMPI profile (similar
to the statistics indicating that virtually all people with lung cancer
have been smokers). The problem is in trying to make the prediction in
the other direction: What percentage of all individuals (i.e., a comprehensive
and representative sample that includes a full spectrum of nonrapists
as well as rapists) showing that particular MMPI profile are *not* rapists?
Without this latter information (based on solid, independently conducted
research published in peer-reviewed scientific or professional journals),
there is no way to determine whether the particular MMPI profile is effective
in identifying rapists. To borrow again from the statistics on lung cancer,
it may indeed be true that 99 or 100% of a sample of rapists showed
the particular profile, but it may also be true that only about 10% of
the individuals who show that profile are rapists. Thus, the evidence
that the witness is presenting would actually suggest that there is a
90% chance that the defendant was *not* a rapist.

The confusion of predictive and retrospective accuracy may be related
to the logical fallacy known as *affirming
the consequent.* In
this fallacy, the fact that x implies y is erroneously
used as a basis for inferring that y implies x. Logically,
the fact that all versions of the MMPI are standardized psychological
tests does *not* imply that all standardized psychological tests
are versions of the MMPI.

### When selecting a standardized psychological assessment instrument, what aspects of validity do you consider?

Expertise in MMPI-2 or MMPI-A administration, scoring, and interpretation
requires at least a basic knowledge of validity issues (see, e.g., *Standards
for Educational and Psychological Testing*). Although follow-up questions—keyed
to the content and detail of the initial response—are necessary, beginning
inquiry in the area of validity by asking an open-ended question during
the deposition can enable an attorney to obtain a general idea of how
knowledgeable the deponent is in this area.

The attorney can assess the degree to which the deponent's initial response addresses the various kinds of validity. Although there are a variety of ways in which validity can be viewed and assessed, Cronbach (1960) set forth four basic types.

*Predictive validity* indicates the degree to which test results
are accurate in forecasting some future outcome. For example, the MMPI-2
may be administered to all individuals who seek services from a community
mental health center. The results may be used to predict which patients
will be able to participate in and benefit from group therapy. Research
to validate the MMPI-2's predictive validity for this purpose would explore
possible systematic relationships between MMPI-2 profiles and patient
responses to group therapy. The responses to group therapy might be measured
in any number of ways, including the group therapist's evaluation of
the patient's participation and progress, the patient's own self-report
or self-assessment, and careful evaluation by independent clinicians.
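As a minimal sketch of how such a predictive validity study might quantify the relationship, one could compute a Pearson correlation between scale scores at intake and a later criterion rating. The scores and ratings below are entirely invented, and plain Pearson correlation stands in for any actual MMPI-2 validation procedure.

```python
from statistics import mean

def validity_coefficient(test_scores, outcomes):
    """Pearson correlation between test scores and a later criterion;
    the size of r is one common index of predictive validity."""
    mx, my = mean(test_scores), mean(outcomes)
    cov = sum((x - mx) * (y - my) for x, y in zip(test_scores, outcomes))
    var_x = sum((x - mx) ** 2 for x in test_scores)
    var_y = sum((y - my) ** 2 for y in outcomes)
    return cov / (var_x * var_y) ** 0.5

# Invented data: a scale score at intake and a therapist's later
# rating of progress in group therapy (higher rating = more progress).
scale_scores = [45, 52, 58, 60, 67, 71, 75, 80]
progress_ratings = [7, 6, 6, 5, 4, 4, 3, 2]

r = validity_coefficient(scale_scores, progress_ratings)
print(f"predictive validity coefficient r = {r:.2f}")
```

In this invented data set, higher scale scores go with poorer later progress, so the coefficient is strongly negative; the sign and size of r, replicated across samples, are what a validation study would examine.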

*Concurrent validity* indicates the degree to which test results
provide a basis for accurately assessing some other current performance
or condition. For example, a clinician or researcher might develop the
hypothesis that certain MMPI-2 profiles are pathognomonic signs of certain
clinical diagnostic groups. (A pathognomonic sign is one whose presence
always and invariably indicates the presence of a clinical diagnosis.)
To validate (or invalidate) this hypothesis, MMPI-2 profiles might be
compared with the diagnoses as currently determined in a clinic by more
detailed, comprehensive, and time-consuming methods of assessment (e.g.,
extended clinical interviews conducted by independent clinicians in conjunction
with a history of the individuals and a comprehensive battery of other
psychological and neuropsychological tests). If the MMPI-2 demonstrates
adequate concurrent validity in terms of this hypothesis, the MMPI-2
could be substituted—at least in certain situations—for the more elaborate
and time-consuming methods of assessing diagnosis.

*Content validity* indicates the degree to which a test, as a
subset of a wider category of performance, adequately or accurately represents
the wider category of which it is a subset. For example, the bar examination
and the psychology licensing examination supposedly measure some of the
basic knowledge, skills, or abilities necessary to practice as an attorney
or a psychologist. The degree to which such examinations accurately reflect
or represent this larger domain is the content validity.

*Construct validity* indicates the degree to which a test accurately
indicates the presence of a presumed characteristic that is described
(or hypothesized) by some theoretical or conceptual framework. For example,
a researcher might develop a theory that there are four basic interpersonal
styles that attorneys use in developing rapport with juries. According
to this theory, each attorney uses the one basic style that is most consistent
with his or her core personality. The researcher then hypothesizes that
these styles can be identified according to an attorney's MMPI-2 profile
(i.e., the researcher theorizes that one set of MMPI-2 profiles indicates
a Type One core personality and a Type One interpersonal style for developing
rapport with a jury, another set of MMPI-2 profiles indicates a Type
Two core personality and a Type Two interpersonal style, etc.). Assessing
the validity of such possible indicants of a theoretical construct is
a complex task that involves attention to other external sources of information
thought to be relevant to the construct, intercorrelations of test items,
and examination of individual differences in responding to the test items
(see *Standards for Educational and Psychological Testing*).

Conceptualizations about test validity continue to emerge and constructs continue to evolve. Those interested in reviewing the evolving understanding of test validity are encouraged to read Geisinger's (1992) fascinating account.

### When selecting psychological assessment instruments, what aspects of reliability do you consider?

A basic knowledge of reliability issues is also—as with validity issues—fundamental
to expertise in MMPI-2 or MMPI-A administration, scoring, and interpretation
(see, e.g., *Standards for Educational and Psychological Testing*).
Again, an open-ended question may be the best approach to this area of
inquiry during the deposition.

Reliability refers to the degree to which a test produces results that are free of measurement error. If there were no measurement error at all, then it is reasonable to assume that test results would be consistent.

Reliability is another way of describing how consistent the results of a test are. Consider the following hypothetical situation. For the purposes of the example, assume that there are two completely identical people. If they are completely identical and if a test (such as the MMPI-2 or MMPI-A) were completely reliable (i.e., free from any measuring errors), then both people should produce the same responses to the test. However, now assume that one of these two identical people takes the test at nine a.m. when she is rested and alert. The other person takes the same test at two a.m. when she has just been awakened from a sound sleep and is tired and groggy. Differences in test results between these two otherwise identical people might be due purely to the times at which the test was administered. If the test were supposedly a measure of personality (such as the MMPI-2) and if the personalities of these hypothetically identical people were the same, then different test results do not actually represent a difference in personality but rather a difference or error in measurement (i.e., the time or conditions under which the test was administered).

Statistical techniques have been developed that indicate the degree to
which a test is reliable. Such statistical analyses are often reported
in the form of *reliability coefficients.* The coefficient will
be a number that falls in the range of zero (for no reliability) to one
(indicating perfect reliability).

The coefficient may indicate the reliability between subsequent administrations
of the same test (e.g., administering the MMPI-2 to a group of individuals
and then administering the same MMPI-2 to the same group 1 week or 1
month later). Reports of this type of reliability will often refer to
the *test-retest* reliability (or the coefficient of stability).
They may indicate, using a coefficient of equivalence, the reliability
between different forms of the same test. For example, a large group
of individuals might be randomly divided in half. One half would be given
the original MMPI, and the other half would be given the MMPI-2; 1 week
later, the half that took the original MMPI would take the MMPI-2 and
vice versa. Reliability between subsequent administrations (perhaps under
different conditions) of the same test is often termed *stability;* reliability
between different forms of the same test is often termed *equivalence* (Cronbach,
1960).

In some instances, test items will be divided independently into two
halves as a way to estimate the reliability of the test. This method
estimates the *split-half* reliability. The resulting coefficient,
often adjusted upward to full-test length using a statistical method
known as the Spearman-Brown formula, is often termed the coefficient
of *internal consistency.*
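A brief sketch of the split-half procedure, using invented half-test totals for eight hypothetical examinees; the Spearman-Brown step-up shown, 2r/(1 + r), is the standard correction from a half-test correlation to an estimate of full-test reliability.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two sets of paired scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sum((x - mx) ** 2 for x in xs) *
                  sum((y - my) ** 2 for y in ys)) ** 0.5

def spearman_brown(half_r):
    """Step a half-test correlation up to estimated full-test reliability."""
    return 2 * half_r / (1 + half_r)

# Invented totals on the odd-numbered and even-numbered items,
# one pair per examinee.
odd_half  = [12, 15, 9, 20, 14, 18, 11, 16]
even_half = [11, 14, 10, 19, 15, 17, 12, 15]

r_half = pearson_r(odd_half, even_half)
print(f"split-half r = {r_half:.2f}, "
      f"Spearman-Brown estimate = {spearman_brown(r_half):.2f}")
```

Because a half-length test is less reliable than the full test, the stepped-up coefficient is always somewhat higher than the raw half-test correlation.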

### What types of scales were involved in the various tests and methods of assessment that you considered in selecting the instruments and diagnostic frameworks that you used in the case at hand?

Different forms of measurement use different scales. The scales can refer to scores on a test or to the categories into which test responses fall. There are four basic types of scales.

The first type of scale is termed *nominal.* As the Latin root
(nomen, meaning "name'') from which we derive a number of similar
English words (e.g., nominate, denomination, and nomenclature) implies,
nominal scales simply provide names to two or more different categories,
types, kinds, or classes. A two-category nominal scale might be invented
to describe the various individuals in a courtroom: participants and
observers. The same population might be described using a more detailed
nominal scale with categories such as jurors, prosecution team, defense
team, and so on. Note that the categories are listed in no particular
order. Assigning an individual to a particular group on a nominal scale
indicates only that the individual is in a group that is different from
the others.

If placement of an individual (or object, verbal response, etc.) into
a particular group indicates that an individual (or object, etc.) is
in a different group from all others, then there can be no overlap among
groups. That is to say, the groups must be *mutually exclusive:* Placement
in one group indicates that the person, object, response, and so on,
does not belong in any of the other groups. Thus, the four categories
of mammals, living things, humans, and whales do *not* constitute
a nominal scale in the sense used here because the categories are not
mutually exclusive (i.e., a particular individual may be placed in more
than one of the categories). Individuals who take the MMPI are asked
to use a nominal scale in responding to each of the items; the scale
has two values: "true'' and "false.''

The second type of scale does place its categories in a particular order
and is termed an *ordinal* scale. For example, an attorney might
evaluate all the cases he or she has ever tried and sort them into three
categories: "easy,'' "moderate,'' and "difficult.'' The
scale indicates that cases in the middle group were harder for the attorney
to try than cases in the easy group, but there is no information about
how much harder. The scale only places the items (in this instance, legal
cases) in three ordered categories, each category having more (or less)
of a particular attribute (such as difficulty) than the others.

The third type of scale is a particular kind of ordinal scale in which
the interval between each group is the same. An example of an *interval* scale
is any listing of the days of the week: Wednesday, Thursday, Friday,
Saturday, Sunday, and so on. When events are classified according to
this interval scale, it is clear that the temporal distance between Wednesday
and Thursday is the same as that between Saturday and Sunday or any other
two consecutive days. An important characteristic of an interval scale
(that sets it apart from the fourth type of scale described below) is
that there is no absolute or meaningful zero point. Some people may begin
their week on Mondays, others on Saturdays, still others on Sundays;
from time to time a "3-day weekend'' leads into a week that "begins''
on Tuesday. The Fahrenheit scale for measuring temperature is an example
of an interval scale: the zero on the scale is arbitrary.

The fourth type of scale is a scale of equal intervals in which the zero
point is absolute, and it is termed a *ratio* scale. An example
of a ratio scale is one's weight. The zero point is not arbitrary. As
the name of the scale implies, the ratios may be meaningfully compared.
For example, a person who weighs 100 pounds is twice as heavy as a person
who weighs 50 pounds, and a person who weighs 200 pounds is twice as
heavy as a person who weighs 100 pounds. Such ratios do *not* hold
for the other three types of scales. For example, because the zero point
on the Fahrenheit scale is arbitrary, one cannot accurately state that
40 degrees is twice as hot as 20 degrees or that 200 degrees is twice
as hot as 100 degrees.
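The point about Fahrenheit ratios can be checked by converting to the Kelvin scale, which does have an absolute zero and therefore supports meaningful ratios.

```python
def fahrenheit_to_kelvin(f):
    # Kelvin has an absolute zero, so ratios of Kelvin values are meaningful.
    return (f - 32) * 5 / 9 + 273.15

# 40 degrees F "looks" twice as hot as 20 degrees F,
# but on an absolute (ratio) scale it is not:
ratio = fahrenheit_to_kelvin(40) / fahrenheit_to_kelvin(20)
print(f"40 F vs 20 F on the Kelvin scale: ratio = {ratio:.3f}")  # ~1.042, not 2
```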

The deponent's explanation of these different types of scales and their meaning for different assessment instruments that were considered (e.g., the MMPI-2 or MMPI-A, the WAIS-III, the Rorschach, a sentence-completion test) will indicate the degree to which he or she understands this aspect of psychological assessment and can communicate it effectively to a judge or jury.

### What is an arithmetic mean? What is a median? What is a mode?

These concepts are central to understanding psychological assessment in general and the MMPI-2 or MMPI-A in particular. Without an understanding of these concepts, there can be no understanding of the *T* scores on which the MMPI-2 and MMPI-A are based.

The mean is one of three major ways of describing the *central tendency* of
a distribution of scores (i.e., the "center'' around which all the
other scores seem to cluster). The arithmetic *mean* can be defined
statistically as the sum of all scores divided by the number of scores.
In other words, the mean is the arithmetic average of the scores. The *median,* which
is the second measure of central tendency, is that number that is in
the "middle'' of the distribution: half of the scores fall below
the median, and the other half of the scores fall above the median. The
third measure of central tendency is the *mode,* which indicates
the score that appears most often. If there were seven IQ scores—98,
100, 102, 102, 103, 103, and 103—then the mode would be 103 because it
appears most often (i.e., three times out of seven).
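Using the seven hypothetical IQ scores above, the three measures of central tendency can be computed directly with Python's standard library:

```python
from statistics import mean, median, mode

# The seven hypothetical IQ scores from the text
scores = [98, 100, 102, 102, 103, 103, 103]

print(mean(scores))    # 101.57... (the sum, 711, divided by 7)
print(median(scores))  # 102 (the middle score: three fall below, three above)
print(mode(scores))    # 103 (appears most often: three times)
```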

These concepts are easily misunderstood. For example, an otherwise knowledgeable psychiatrist, Karl Menninger (1945), took other people to task for their statistical ignorance when he wrote:

> Fortunately for the world, and thanks to the statisticians (for this, of course, is a mathematically inevitable conclusion), there are as many people whose intelligence is above the average as there are persons whose intelligence is below the average. Calamity howlers, ignorant of arithmetic, are heard from time to time proclaiming that two-thirds of the people have less than average intelligence, failing to recognize the self-contradiction of the statement. (p. 199)

While it is *possible* that the number of people whose intelligence
is above average is exactly the same as the number of people whose intelligence
is below average, there is no necessary self-contradiction in the statement
that he criticizes. On common IQ tests, the average IQ
is 100. But this number, which is a mean, does *not* necessarily
represent the median. Consider a population of 3 people: the first
has an IQ of 90, the second has an IQ of 90, and the third has an
IQ of 120. The average IQ for this population is 100 (i.e., 90
+ 90 + 120 = 300, and 300 divided by 3 = 100), but two-thirds of the
population have less than average intelligence.
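The three-person counterexample can be verified in a few lines:

```python
from statistics import mean, median

iqs = [90, 90, 120]
m = mean(iqs)                               # 100
below = sum(1 for iq in iqs if iq < m)      # how many fall below the mean

print(f"mean = {m}, median = {median(iqs)}")          # mean 100, median 90
print(f"{below} of {len(iqs)} fall below the mean")   # 2 of 3
```

The mean and median differ whenever the distribution is skewed, which is why Menninger's "mathematically inevitable conclusion" holds only for the median, not the mean.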

### What is a standard deviation? What is variance?

The *standard deviation* is one way of describing the "scatter''
of a distribution of scores—the degree to which the scores vary from
the central tendency. These measures of scatter or dispersion are,
like the concepts of central tendency described in the previous section,
essential to understanding the T scales on which the MMPI instruments
are based.

The statistical formula for the standard deviation is somewhat complicated.
The mean is subtracted from each score to produce a deviation from the
mean. Each of these deviations is squared. (A number is squared when
the number is multiplied by itself. The square of 2—that is to say, 2
times 2—is 4; the square of 3—that is, 3 times 3—is 9; the square of
4 is 16.) These squared deviations are then added together into a total
sum of squares. The total sum of squares is then divided by the number
of scores. [**Footnote**: This is the formula for
determining the variance or standard deviation in *descriptive* statistics.
In *inferential* statistics, the sum of squares is divided not
by the number of scores but rather by the number of scores minus one.
In descriptive statistics, one is simply trying to describe the scores
or numbers that are available (e.g., the IQ scores of the children in
one sixth-grade classroom). In inferential statistics, one is trying
to use the scores or numbers that are available—called the *sample*—as
a basis for drawing inferences about a wider group of scores or numbers—called
the *population* (e.g., attempting to use the IQ scores of the
sample of children in one sixth-grade classroom to infer or estimate
the IQ scores for the population of all sixth-grade students in the school
system).]

This total sum of squares divided by the number of scores (or the number
of scores minus one) is the *variance* (i.e., the degree to which
the scores vary from or vary around the mean). The *standard deviation* is
the square root of the variance. The larger the standard deviation, the
farther the scores tend to fall from the mean.
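The two versions of the calculation described above (dividing the sum of squares by the number of scores for descriptive statistics, and by the number of scores minus one for inferential statistics) can be illustrated with Python's statistics module, reusing the seven hypothetical IQ scores from the earlier section:

```python
from statistics import pvariance, pstdev, variance, stdev

scores = [98, 100, 102, 102, 103, 103, 103]

# Descriptive statistics: sum of squared deviations divided by N
print(pvariance(scores), pstdev(scores))

# Inferential statistics: sum of squared deviations divided by N - 1
print(variance(scores), stdev(scores))
```

The "p" functions treat the scores as the whole population (descriptive); the unprefixed functions treat them as a sample used to estimate a wider population (inferential), so they yield slightly larger values.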

### What is a T score, and what are its psychometric properties?

Understanding the nature of the *T* score is essential to understanding
the MMPI instruments (see chapter 2). Both the original MMPI and the
revised versions (MMPI-2 and MMPI-A) are based on *T* scales,
although there are significant differences between the original and later
versions that will serve as the focus of subsequent questions.

The raw scores (e.g., of the content scales) of the MMPI-based measures
are translated—through statistical methods—into a *T*-score distribution.
A *T scale* is a distribution of scores in which the mean, as
previously described, is 50 and the standard deviation, described in
the previous section, is 10.
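The linear transformation from raw score to *T* score can be sketched as follows; the norm-group mean and standard deviation below are invented for illustration and are not actual MMPI-2 norms.

```python
def linear_t(raw, norm_mean, norm_sd):
    """Linear T score: mean 50, standard deviation 10 in the norm group."""
    return 50 + 10 * (raw - norm_mean) / norm_sd

# Hypothetical norm-group parameters (not actual MMPI-2 norms)
norm_mean, norm_sd = 12.0, 4.0

print(linear_t(12, norm_mean, norm_sd))  # 50.0 -- exactly at the norm mean
print(linear_t(20, norm_mean, norm_sd))  # 70.0 -- two SDs above the mean
print(linear_t(8,  norm_mean, norm_sd))  # 40.0 -- one SD below the mean
```

A raw score at the norm-group mean always maps to T = 50, and each standard deviation of raw-score distance adds or subtracts 10 T-score points.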

If the *T* scale describes a normal distribution, the distribution
is said to fall into a bell-shaped curve. In the normal distribution,
68% of the scores fall within one standard deviation of the mean; 95%
fall within two standard deviations of the mean; and 99% fall within
three standard deviations of the mean. These percentages apply only to
a normal or normalized *T* scale and not necessarily to a linear *T* scale
or a uniform *T* scale (for information about the *T* scale
and its various forms, see chapter 2 and the Glossary).

Most of the original MMPI validity and clinical scales were derived according
to the formula for linear *T* scores (Dahlstrom et al., 1972)
except *L* and *F,* for which the mean values were arbitrarily
set. Each of these scales was separately derived, and each has a slightly
different skew. Thus, the distributions are neither *uniform* nor
normal. That is to say, a particular *T* score does not
fall at the same percentile rank across all scales (Colligan et al.,
1983).

The original MMPI's lack of uniformity among the clinical scales has
been somewhat problematic (e.g., when comparing scores on different scales).
In MMPI-2 and MMPI-A, however, this lack of uniformity was resolved by
developing linear *T* scores that did possess uniformity across
given percentile values. This scale norming, referred to as uniform *T* scores,
is described in the MMPI-2 manual (Butcher et al., 1989) and is discussed
extensively by psychologists Auke Tellegen and Yossef Ben-Porath (1992b).