The vast majority of personality tests use Likert scales. You read a statement — "I enjoy being the centre of attention" — and rate your agreement from 1 (strongly disagree) to 5 (strongly agree). This format is intuitive, quick, and has accumulated decades of psychometric research behind it.
It also has a well-known problem: it is easy to game. More than that, even people who are genuinely trying to be honest tend to respond in systematically biased ways. Forced-choice personality assessment is an alternative format designed to address these problems. Understanding what it is, how it works, and why it produces more accurate data is essential for evaluating any personality instrument — including Cèrcol's Witness tool.
Why Traditional Likert-Scale Personality Tests Have a Bias Problem
Likert-scale personality tests dominate both research and applied assessment. The Big Five Inventory, the NEO-PI-R, the IPIP scales, and hundreds of proprietary instruments all use variations of the same format: rate yourself on a set of statements from "disagree" to "agree."
The strengths of this format are real. It is intuitive for respondents, easy to score, and produces normative data — meaning scores can be compared across individuals on an absolute scale. A score of 4.2 on Conscientiousness is directly comparable across different people who took the same test.
But Likert scales have two structural weaknesses that cannot be fully addressed through careful item writing or instructions to "answer honestly."
The first is acquiescence bias: the tendency to agree rather than disagree, regardless of content. Across cultures and populations, people endorse statements more often than the content alone would warrant, because saying "agree" is the path of least resistance. On scales whose items are mostly positively keyed, this inflates scores on every trait at once.
The second is social desirability bias: the tendency to endorse statements that present a favourable self-image. When the socially valued answer is obvious (and on most personality items it is), motivated self-presenters can maximise their scores on valued dimensions without any constraint. For a full explanation of how much this distorts Big Five profiles, see social desirability bias in personality tests.
These two biases combine to produce scores that are a mixture of genuine trait levels and response style — and that mixture is difficult to disentangle after the fact. For teams wondering whether their DISC or 16Personalities scores are inflated by this effect, the answer is almost certainly yes. See DISC vs Big Five: why four styles aren't enough for a broader discussion of what gets lost when measurement design does not address bias.
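The acquiescence effect described above can be illustrated with a small sketch. It assumes a 1 to 5 scale, all items positively keyed, and a respondent who drifts one point toward "agree" on every item; the answers and the size of the shift are invented for illustration, not taken from any study.

```python
def trait_score(responses):
    """Mean of 1-5 Likert responses for one trait (all items positively keyed)."""
    return sum(responses) / len(responses)


def add_acquiescence(responses, shift=1, low=1, high=5):
    """Move every answer `shift` points toward 'agree', clipped to the scale."""
    return [min(high, max(low, r + shift)) for r in responses]


# Hypothetical honest answers for two unrelated traits.
honest = {
    "Conscientiousness": [4, 3, 4, 3],
    "Extraversion": [2, 3, 2, 3],
}

for trait, answers in honest.items():
    before = trait_score(answers)
    after = trait_score(add_acquiescence(answers))
    print(f"{trait}: honest {before:.1f}, with acquiescence {after:.1f}")
```

Both traits rise by the same amount even though nothing about the person has changed, which is why the response style and the trait level are hard to separate afterwards.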
What Forced-Choice Personality Assessment Actually Is
Forced-choice personality assessment (see Forced choice) presents items differently. Instead of rating each statement independently, respondents see pairs (or triplets) of statements or adjectives and choose the one that better describes them.
For example, instead of separately rating "I am talkative" and "I am thorough," a forced-choice item might present both together and ask: "Which of these words better describes you?" The respondent must choose one. They cannot simultaneously endorse both at a high level.
This simple structural change has important consequences:
- Acquiescence becomes impossible: you cannot agree with both options. Every choice reveals a preference between two traits.
- Social desirability is reduced: when both options are positively valenced (as in well-designed forced-choice instruments), there is no obviously "good" answer. Choosing "warm" over "precise" does not make you look better or worse — it just reveals relative priorities.
"Forced-choice formats eliminate acquiescence responding and substantially reduce social desirability inflation by requiring respondents to allocate fixed amounts of endorsement across competing trait descriptions."
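Adapted from Stark, S., Chernyshenko, O. S., & Drasgow, F. (2005). An IRT approach to constructing and scoring pairwise preference items involving stimuli on different dimensions: The multi-unidimensional pairwise-preference model. Applied Psychological Measurement, 29(3), 184–203. See also Hofstee, de Raad, & Goldberg (1992), doi:10.1037/0022-3514.63.1.146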
Why forced-choice works: Standard Likert scales allow respondents to rate themselves 'highly' on every trait. Forced-choice formats require trade-offs between equally desirable options — forcing respondents to reveal relative priorities rather than absolute ideals. Research shows forced-choice reduces social desirability bias by 40–60% compared to Likert scales.
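A minimal sketch of how pairwise forced-choice answers might be tallied. The items, the adjective-to-dimension assignments, and the one-point-per-choice scoring rule are all assumptions made for illustration; they are not Cèrcol's actual items or scoring algorithm, and triplets are not covered.

```python
from collections import Counter

# Hypothetical items: each option is (adjective, dimension it is assumed to indicate).
ITEMS = [
    (("talkative", "Presence"), ("thorough", "Discipline")),
    (("warm", "Bond"), ("precise", "Discipline")),
    (("curious", "Vision"), ("talkative", "Presence")),
]


def score_choices(items, choices):
    """Tally one point for the dimension of each chosen adjective.

    choices[i] is 0 or 1: the index of the option picked for item i.
    """
    tally = Counter()
    for options, pick in zip(items, choices):
        if pick not in (0, 1):
            raise ValueError("exactly one option must be chosen per item")
        _, dimension = options[pick]
        tally[dimension] += 1
    return dict(tally)


scores = score_choices(ITEMS, [0, 0, 1])
print(scores)
print(sum(scores.values()) == len(ITEMS))
```

Because each item hands out exactly one point, the total is fixed by the number of items. A respondent cannot raise every dimension at once, which is the structural reason acquiescence has nowhere to go.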
Ipsative Scoring: What It Means for Big Five Score Interpretation
Forced-choice instruments produce what is called ipsative data. An ipsative score represents a person's standing on a trait relative to their own other traits, not relative to a population norm. If your profile shows high Presence and low Depth, this means that Presence (the Extraversion dimension) is more prominent in your profile than Depth (the Neuroticism dimension). It does not necessarily tell you whether you are more extraverted than the average person.
This is a genuine limitation. Ipsative data cannot be used for all the same purposes as normative data. In particular, comparing two people's profiles directly (person A's Presence score vs person B's) is methodologically complicated with ipsative data, because both profiles are internally referenced. For a full treatment of normative vs ipsative scoring, see how personality test scores are calculated. Research on how to handle ipsative data appropriately, and on approaches that produce more normative estimates from forced-choice designs (such as IRT-based scoring), is ongoing.
Cèrcol's approach acknowledges this limitation. The Witness instrument is designed primarily to reveal relative priorities and blind spots — where is this person perceived as stronger or weaker relative to their own overall profile, and how does that compare to their self-perception? This is a valid and valuable use of ipsative data, even if absolute cross-person comparisons require additional methodological care.
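The ipsative constraint can be made concrete with a short sketch. The two profiles below are invented, and they assume a simple scoring rule in which 20 choices are distributed across the five dimensions.

```python
def ipsatize(profile):
    """Express each score as a deviation from the person's own mean."""
    own_mean = sum(profile.values()) / len(profile)
    return {dim: score - own_mean for dim, score in profile.items()}


# Two hypothetical people, each with 20 choices spread over five dimensions.
person_a = {"Presence": 8, "Bond": 4, "Vision": 4, "Discipline": 3, "Depth": 1}
person_b = {"Presence": 6, "Bond": 2, "Vision": 2, "Discipline": 5, "Depth": 5}

for name, profile in (("A", person_a), ("B", person_b)):
    print(name, "total:", sum(profile.values()), "relative:", ipsatize(profile))
```

Both totals are 20 by construction, so each profile only describes how that person's choices were divided up. Person A's Presence of 8 against person B's 6 shows that Presence takes a larger share of A's profile; it does not show that A is more extraverted than B in absolute terms.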
The AB5C Circumplex: How Adjectives Map onto Big Five Intersections
Cèrcol's Witness instrument is grounded in the Abridged Big Five Circumplex (AB5C), developed by Hofstee, de Raad, and Goldberg (1992). The AB5C is a systematic framework for mapping personality adjectives onto the Big Five dimensions — not as pure, single-factor indicators, but as weighted combinations of two factors.
In the AB5C framework, a word like "assertive" is not simply an Extraversion adjective — it loads on both Extraversion and (low) Agreeableness. A word like "creative" loads on both Openness and (low) Conscientiousness. By mapping adjectives onto these intersections, the AB5C captures the rich, overlapping structure of personality language more accurately than a simple factor-by-factor approach. For the broader context of how personality language was systematically analysed to produce the Big Five, see history of the Big Five from Allport to Goldberg.
This matters for forced-choice design because it allows pairs to be constructed that are genuinely psychometrically informative — each choice between a pair of adjectives provides information about a respondent's position on the relevant Big Five dimensions. The pairs are not arbitrary; they are principled.
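As a rough illustration of the two-factor idea, the sketch below labels an adjective by its two largest absolute loadings, which is the general principle behind an AB5C facet. The loading values are made up for the example and are not taken from Hofstee, de Raad, and Goldberg (1992).

```python
def signed(name, loading):
    """Append '+' or '-' to a factor name according to the loading's sign."""
    return f"{name}{'+' if loading >= 0 else '-'}"


def ab5c_facet(loadings):
    """Label an adjective by its primary and secondary factor loadings."""
    ranked = sorted(loadings.items(), key=lambda kv: abs(kv[1]), reverse=True)
    (primary, p_load), (secondary, s_load) = ranked[0], ranked[1]
    return f"{signed(primary, p_load)}/{signed(secondary, s_load)}"


# Hypothetical loadings, chosen only to mirror the examples in the text.
assertive = {"Extraversion": 0.58, "Agreeableness": -0.31,
             "Conscientiousness": 0.12, "Neuroticism": -0.10, "Openness": 0.08}
creative = {"Extraversion": 0.15, "Agreeableness": 0.05,
            "Conscientiousness": -0.28, "Neuroticism": -0.02, "Openness": 0.62}

print(ab5c_facet(assertive))
print(ab5c_facet(creative))
```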
In Cèrcol's Witness instrument, adjective pairs are selected to maximise discriminative information across the five dimensions (Presence, Bond, Vision, Discipline, Depth) while keeping social desirability values as equal as possible within each pair. This ensures that choices reveal genuine personality differentiation rather than differential social desirability.
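One way to picture desirability matching is the sketch below. It is not Cèrcol's selection algorithm: the adjective pool, the dimension assignments, the 1 to 9 desirability ratings, and the `max_gap` threshold are all hypothetical, and "discriminative information" is reduced to the crude requirement that the two adjectives indicate different dimensions.

```python
from itertools import combinations

# Hypothetical pool: (adjective, assumed primary dimension, desirability rating 1-9).
POOL = [
    ("talkative", "Presence", 6.1),
    ("thorough", "Discipline", 6.3),
    ("warm", "Bond", 7.4),
    ("precise", "Discipline", 7.2),
    ("curious", "Vision", 7.0),
    ("sensitive", "Depth", 6.5),
]


def matched_pairs(pool, max_gap=0.3):
    """Return cross-dimension pairs whose desirability ratings differ by at most max_gap."""
    pairs = []
    for (word_a, dim_a, des_a), (word_b, dim_b, des_b) in combinations(pool, 2):
        gap = abs(des_a - des_b)
        if dim_a != dim_b and gap <= max_gap:
            pairs.append((word_a, word_b, round(gap, 2)))
    return pairs


pairs = matched_pairs(POOL)
for word_a, word_b, gap in pairs:
    print(f"{word_a} vs {word_b} (desirability gap {gap})")
```

A pair such as "thorough" and "precise" is excluded because both words point at the same dimension, and "talkative" and "warm" are excluded because one is clearly the more flattering choice. A real instrument would need empirically estimated desirability ratings and item parameters rather than the invented numbers used here.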
How Cèrcol's Witness Applies Forced-Choice to Peer Assessment
The Witness instrument is the peer-assessment component of Cèrcol. Rather than asking Witnesses (peer assessors) to rate the target person on behavioural statements, Witnesses are presented with adjective pairs and asked to choose which word better describes the person they know.
The instrument is built on the IPIP tradition — the open-science alternative to commercially controlled personality instruments. All item development is transparent and documented. The source code, scoring algorithm, and psychometric documentation are available under an open-source licence at cercol.team/science.
A typical Witness session takes 8–12 minutes. The resulting profile shows the target person's scores on each of the five Cèrcol dimensions as perceived by that Witness, and the aggregate across all Witnesses provides the peer composite. This composite is then compared to the target's self-report to identify alignment and gaps. For the full rationale for why this peer layer matters, see why self-assessment alone isn't enough: peer personality feedback. The question of anonymity in peer ratings is addressed in anonymity in personality assessment: why it matters.
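The aggregation and comparison steps could look something like the following sketch. The text does not say how Witness profiles are combined, so the simple mean used here is an assumption, as are the profile values and the premise that self-report and peer scores sit on the same scale.

```python
from statistics import mean

DIMENSIONS = ["Presence", "Bond", "Vision", "Discipline", "Depth"]


def peer_composite(witness_profiles):
    """Average each dimension across all Witness profiles (assumed aggregation rule)."""
    return {dim: mean(p[dim] for p in witness_profiles) for dim in DIMENSIONS}


def perception_gaps(self_profile, composite):
    """Positive values: peers see more of the dimension than the person does."""
    return {dim: composite[dim] - self_profile[dim] for dim in DIMENSIONS}


# Hypothetical data for one target person and three Witnesses.
witnesses = [
    {"Presence": 8, "Bond": 4, "Vision": 4, "Discipline": 3, "Depth": 1},
    {"Presence": 6, "Bond": 5, "Vision": 3, "Discipline": 4, "Depth": 2},
    {"Presence": 7, "Bond": 3, "Vision": 5, "Discipline": 2, "Depth": 3},
]
self_report = {"Presence": 4, "Bond": 5, "Vision": 5, "Discipline": 4, "Depth": 2}

composite = peer_composite(witnesses)
gaps = perception_gaps(self_report, composite)
print(composite)
print(gaps)
```

In this invented case the largest gap is on Presence: the three Witnesses agree that the person comes across as more extraverted than the self-report suggests, which is the kind of blind spot the comparison is meant to surface.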
Likert Scale vs Forced-Choice: Full Methodology Comparison
| Dimension | Likert Scale | Forced-Choice (Cèrcol Witness) |
|---|---|---|
| Acquiescence bias | High structural risk | Eliminated by design |
| Social desirability | High, especially for valued traits | Substantially reduced (matched valence pairs) |
| Score interpretation | Normative (absolute level) | Ipsative (relative priorities) |
| Cross-person comparison | Straightforward | Requires methodological care |
| Faking resistance | Low for transparent items | Higher — both options typically positive |
| Theoretical grounding | Factor-by-factor | AB5C circumplex |
| Cognitive demand | Low | Moderate — genuine deliberation required |
Honest Limitations of Forced-Choice Personality Assessment
Forced-choice is not a panacea. Several limitations deserve honest acknowledgement.
First, there is the ipsative scoring problem described above. While IRT-based approaches (Thurstonian IRT, as developed by Brown and Maydeu-Olivares) can recover more normative-like estimates from forced-choice data, these methods are computationally demanding and require substantial sample sizes to calibrate accurately. Simpler forced-choice scoring remains somewhat ipsative.
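For readers who want the general shape of the Thurstonian IRT model, the response function for a pair of items measuring different traits can be written roughly as follows. This follows the general form in Brown and Maydeu-Olivares (2011); the notation is illustrative and is not a description of how Cèrcol scores the Witness instrument.

```latex
P(y_{ik} = 1 \mid \eta_a, \eta_b)
  = \Phi\!\left(
      \frac{-\gamma_{ik} + \lambda_i \eta_a - \lambda_k \eta_b}
           {\sqrt{\psi_i^2 + \psi_k^2}}
    \right)
```

Here $y_{ik} = 1$ means item $i$ was preferred over item $k$, $\eta_a$ and $\eta_b$ are the latent traits the two items measure, $\lambda_i$ and $\lambda_k$ are factor loadings, $\gamma_{ik}$ is a threshold, $\psi_i^2$ and $\psi_k^2$ are the unique variances of the item utilities, and $\Phi$ is the standard normal distribution function. Because the traits enter as latent variables on their own scale, the estimated trait levels are not forced to sum to a constant, which is what allows more normative-like comparisons. The loadings, thresholds, and variances all have to be estimated from data, which is where the sample-size and computational demands come from.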
Second, forced-choice assessments are cognitively more demanding. Respondents must genuinely compare two options and decide which fits better, rather than simply rating each item independently. This can slow completion times and may be frustrating for respondents who feel that "both apply equally." The inability to say "both" is by design, but it can feel unnatural.
Third, forced-choice does not eliminate all strategic responding. A determined self-enhancer who knows which adjectives map onto which valued dimensions can still systematically choose the "right" adjectives. For the full literature on what motivated faking actually does to personality test scores, see can you fake a personality test?. Forced-choice raises the cognitive cost of strategic responding, but it does not make it impossible — particularly for respondents with prior exposure to personality theory.
Despite these limitations, the weight of psychometric evidence is clear: forced-choice instruments produce less biased, more differentiating data than Likert scales in high-stakes assessment contexts. For the Witness use case — peer assessment where social incentives to rate favourably are real — forced-choice design is the methodologically superior choice. And if you are evaluating this against the broader landscape of what tools are available, see the best free personality tests for teams in 2026.
Forced-Choice vs Likert: Which Produces More Honest Big Five Data?
Forced-choice personality assessment eliminates acquiescence bias by construction and substantially reduces social desirability bias through matched-valence item pairs. The AB5C circumplex provides the theoretical grounding for psychometrically principled adjective pair selection. Cèrcol's Witness instrument applies these principles in an open-source, IPIP-grounded peer assessment tool designed to produce the most honest, most differentiating peer personality data available. The limitation — ipsative scoring — is real and acknowledged, and interpretation is designed accordingly.
Try a forced-choice Big Five assessment: Cèrcol's Witness instrument
Most personality assessments — DISC, 16Personalities, even Likert-scale Big Five tools — are vulnerable to the same structural problem: when the socially desirable answer is visible, motivated respondents (and even honest ones trying to be accurate) will skew their responses toward it. Forced-choice design is the most evidence-backed solution available.
Cèrcol's Witness peer assessment is a forced-choice instrument built on the AB5C circumplex framework and grounded in the public-domain IPIP item tradition. Witnesses choose between adjective pairs matched for social desirability — making it structurally hard to be uniformly positive about the person they are rating. The result is peer personality data that reflects how the person is genuinely experienced, not just how much the Witness likes them.
The self-assessment at cercol.team is free. Adding Witness assessments takes each peer 8–12 minutes. Read the full scientific rationale to understand how the forced-choice design and AB5C grounding work together to deliver more honest Big Five data.
References
Hofstee, W. K. B., de Raad, B., & Goldberg, L. R. (1992). Integration of the Big Five and circumplex approaches to trait structure. Journal of Personality and Social Psychology, 63(1), 146–163. doi:10.1037/0022-3514.63.1.146
Brown, A., & Maydeu-Olivares, A. (2011). Item response modeling of forced-choice questionnaires. Educational and Psychological Measurement, 71(3), 460–502.
Further reading
- Social desirability bias in personality tests: how big is the problem?
- Can you fake a personality test? What the research actually shows
- Why self-assessment alone isn't enough: the case for peer personality feedback
- What is reliability and validity in personality testing?
- Why 120 items is better than 10: the trade-off in personality test length
- Anonymity in personality assessment: why it matters for honest data