Is Cèrcol based on the Big Five?

Yes. Cèrcol measures personality using the OCEAN model (Big Five) via the IPIP public-domain item pool (Goldberg et al. 2006). The 12 team roles are derived from the AB5C circumplex (Hofstee et al. 1992) and team composition research (Bell 2007; Neuman & Wright 1999).

What makes Cèrcol different from Belbin or DISC?

Cèrcol's roles are grounded in the Big Five (OCEAN) personality model using the IPIP public-domain item pool. The scoring pipeline is fully open source and auditable. Witness Cèrcol uses forced-choice adjective selection, not Likert scales, to reduce social desirability bias in peer assessment. Unlike Belbin or DISC, all items are public domain and the entire methodology is published and citable.

Is the personality assessment free?

The New Moon Cèrcol (10 items, Big Five snapshot) and First Quarter Cèrcol (60 items, IPIP-NEO-60, 30 facets) are always free — no account required. The Full Moon Cèrcol (120 items, IPIP-NEO-120, Witness peer assessment, cognitive ability measure) requires a one-time payment.

What is Witness Cèrcol?

Witness Cèrcol is a peer personality assessment in which someone who knows you well rates you using forced-choice adjective selection: in each round, they pick the best-fitting and worst-fitting adjective from a set covering all five OCEAN dimensions. Forced choice reduces the social desirability bias that affects standard Likert-scale peer ratings. Dimensions where your self-rating and peer ratings diverge by more than 0.8 standard deviations are flagged as potential blind spots.
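The sketch below illustrates the blind-spot rule. It compares a self-report profile with the average of one or more peer profiles, dimension by dimension, and flags any gap larger than 0.8 standard deviations. It assumes both sets of scores have already been converted to standard-deviation units (z-scores). This page does not specify how Cèrcol standardises and combines Witness ratings, and the function name and example values are hypothetical.

```python
from statistics import fmean

BLIND_SPOT_THRESHOLD_SD = 0.8  # threshold stated in the FAQ

def flag_blind_spots(self_z, peer_z_list, threshold=BLIND_SPOT_THRESHOLD_SD):
    """Return {dimension: self-minus-peer gap} for gaps larger than `threshold`.

    Hypothetical helper. Assumes every score is already a z-score.
    self_z: dict mapping dimension -> self-report z-score
    peer_z_list: list of dicts, one per peer rater
    """
    flagged = {}
    for dim, self_score in self_z.items():
        ratings = [peer[dim] for peer in peer_z_list if dim in peer]
        if not ratings:
            continue  # no peer data for this dimension
        gap = self_score - fmean(ratings)  # positive: you rate yourself higher than peers do
        if abs(gap) > threshold:
            flagged[dim] = round(gap, 2)
    return flagged

# Illustrative values only
self_z = {"Presence": 1.2, "Bond": 0.3, "Discipline": 0.9, "Depth": -0.5}
peers = [
    {"Presence": 0.0, "Bond": 0.5, "Discipline": 0.0, "Depth": 0.4},
    {"Presence": 0.2, "Bond": 0.5, "Discipline": 0.0, "Depth": 0.4},
]
print(flag_blind_spots(self_z, peers))
```

In this example, Presence, Discipline and Depth are flagged and Bond is not. The sign of each gap shows whether the person sees themselves higher or lower than their peers do.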

How are the 12 team roles derived?

The 12 roles are derived from the AB5C circumplex (Hofstee, De Raad & Goldberg 1992), covering all six intersections of the three team balance dimensions (Presence/Extraversion × Bond/Agreeableness × Vision/Openness) at both poles. The selection of these three dimensions as requiring team-level balance is grounded in Bell (2007) and Neuman & Wright (1999). Discipline (Conscientiousness) and Depth (Neuroticism) modulate role expression but do not define team balance.

No account is required for any instrument. During assessment, no personal data is collected — only anonymous scores are logged. Data is stored on our own servers (Hetzner Online GmbH). No third-party analytics. No data is shared with or sold to third parties.

Is Cèrcol based on the Big Five (OCEAN)?

Yes. Cèrcol measures personality using the OCEAN model (Big Five) via the IPIP — the International Personality Item Pool, a public-domain collection validated in thousands of published studies. The five dimensions are Presence (Extraversion), Bond (Agreeableness), Vision (Openness), Discipline (Conscientiousness), and Depth (Neuroticism). Because the IPIP is public domain there are no licence restrictions: the full item pool and scoring logic are open and citable.

How is Cèrcol different from Belbin, DISC, or StrengthsFinder?

Three things set Cèrcol apart. First, it measures the Big Five (OCEAN), the most replicated personality model in academic research, rather than a proprietary framework. Second, the full item pool (IPIP) is public domain and the scoring pipeline is open and auditable; there is no black box. Third, the Witness peer assessment uses forced-choice adjective selection instead of Likert scales, which reduces the social desirability bias that affects most 360-degree feedback tools. Belbin and DISC use closed, proprietary methodologies.

What are blind spots in team personality assessment?

A blind spot is a personality dimension where how you see yourself and how others see you diverge significantly — more than 0.8 standard deviations apart. Cèrcol's Witness peer assessment detects blind spots by comparing your self-report with forced-choice adjective ratings from people who know you. Blind spots are neither good nor bad: they show where your self-perception and others' experience of you don't match, which is often more actionable than the score itself.

Why 120 items is better than 10: the trade-off in personality test length

The Spearman-Brown Formula: Why Test Length Predicts Big Five Reliability

The mathematical relationship between test length and reliability was formalised over a century ago by Charles Spearman and William Brown working independently. The Spearman-Brown prophecy formula predicts how reliability changes when you change the number of items in a test, assuming the new items are of similar quality to the original ones.
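For reference, this is the standard form of the Spearman-Brown prophecy formula. Here ρ is the reliability of the original test, n is the factor by which its length changes (new item count divided by old item count), and ρ* is the predicted reliability of the changed test:

```latex
\rho^{*} = \frac{n\,\rho}{1 + (n - 1)\,\rho}
```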

The formula has a specific implication: reliability gains from adding items follow a curve of diminishing returns. Going from 2 items to 10 items produces a large reliability gain. Going from 80 items to 120 items produces a much smaller one. The first few items do the most work; each additional item adds less than the one before.
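Both effects are easy to check numerically: the diminishing-returns curve, and the fact that no fixed multiplier "doubles" reliability. The sketch below implements the Spearman-Brown formula and its inverse. It assumes parallel items and a hypothetical single-item reliability of 0.30, an illustrative value rather than a property of any specific instrument.

```python
import math

def spearman_brown(rho: float, n: float) -> float:
    """Predicted reliability after changing test length by a factor of n.

    rho: reliability of the original test (strictly between 0 and 1)
    n:   new length / original length; assumes the items are parallel
    """
    if not 0 < rho < 1:
        raise ValueError("rho must lie strictly between 0 and 1")
    return n * rho / (1 + (n - 1) * rho)

def lengthening_factor(rho: float, target: float) -> float:
    """Inverse of Spearman-Brown: how many times longer the test must be to reach `target`."""
    if not (0 < rho < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1 - rho) / (rho * (1 - target))

SINGLE_ITEM_RHO = 0.30  # hypothetical reliability of one item

# Diminishing returns: each extra item adds less than the one before.
for k in (2, 10, 24, 80, 120):
    print(f"{k:>3} items: predicted reliability {spearman_brown(SINGLE_ITEM_RHO, k):.3f}")

# "Doubling" reliability needs a different multiplier from each starting point.
for start in (0.2, 0.3, 0.4):
    print(f"{start:.1f} -> {2 * start:.1f}: lengthen x{lengthening_factor(start, 2 * start):.2f}")

# Error is what scales cleanly: the error SD of an averaged score falls
# with 1 / sqrt(n), so four times the items halves it.
print(1 / math.sqrt(4))  # 0.5
```

Under these assumptions, going from 2 to 10 items raises predicted reliability by about 0.35, while going from 80 to 120 items adds less than 0.01. Doubling reliability from 0.2, 0.3 or 0.4 requires roughly 2.7, 3.5 or 6 times as many items, respectively.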

This is why the choice of test length is a genuine engineering decision rather than a simple "more is always better" conclusion. At some point, the burden on respondents exceeds the reliability gain. The practical question is where that point lies for the use case in question. For a complete treatment of how reliability is defined and measured, see what is reliability and validity in personality testing.

"The Spearman-Brown formula makes the reliability-length relationship precise: the ratio of true-score variance to error variance grows in direct proportion to test length, so halving measurement error relative to the spread of true scores takes roughly four times as many items."

What 10-Item Big Five Tests Miss That Longer Instruments Capture

The TIPI (Ten-Item Personality Inventory; Gosling, Rentfrow & Swann 2003) uses two items per dimension, which by construction cannot capture facet-level variation within each Big Five dimension. As described in what is a facet in personality psychology, each Big Five dimension contains six facets: narrow sub-traits that can point in different directions for people with the same overall dimension score.

A two-item Conscientiousness scale might successfully classify whether a person is broadly high or low on the dimension. It cannot distinguish between someone whose Conscientiousness is driven by Order and Dutifulness versus someone whose profile is dominated by Achievement Striving and Self-Discipline — which is precisely the distinction most relevant for role fit and development.

The same limitation applies to all dimensions. A two-item Openness scale cannot separate intellectual curiosity from aesthetic sensitivity. A two-item Neuroticism scale cannot distinguish anxiety-driven reactivity from anger-driven reactivity.

Short tests are also least dependable for people near the middle of the distribution, which is where most people score on most dimensions. For clearly extreme scorers (very high or very low), two items may be enough to classify them reasonably, because measurement error rarely pushes their scores across a category boundary. For the majority who score in the moderate range, the measurement error of a two-item scale is large enough to produce different classifications on retest. For the statistical explanation of why this matters, see how personality test scores are calculated.
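A small simulation makes the middle-of-the-distribution problem concrete. The sketch uses the classical test theory model (observed score = true score + random error), with normally distributed true scores. It applies a simple high/low split at the population average and runs two independent administrations per person. The model, the cut-off and the simulated population are all illustrative assumptions, not how any particular test assigns categories.

```python
import random

def retest_agreement(reliability, true_scores, rng, cutoff=0.0):
    """Share of people placed on the same side of `cutoff` on two administrations.

    Classical test theory: observed = true + error, with true scores in
    standard units (variance 1) and error SD chosen so that
    var(true) / var(observed) equals `reliability`.
    """
    error_sd = ((1 - reliability) / reliability) ** 0.5
    same = 0
    for t in true_scores:
        first = t + rng.gauss(0, error_sd)
        second = t + rng.gauss(0, error_sd)
        same += (first > cutoff) == (second > cutoff)
    return same / len(true_scores)

rng = random.Random(42)
population = [rng.gauss(0, 1) for _ in range(20_000)]
moderate = [t for t in population if abs(t) < 0.5]  # near the middle
extreme = [t for t in population if abs(t) > 1.5]   # clearly high or low

for rel in (0.55, 0.90):
    print(f"alpha={rel:.2f}  moderate: {retest_agreement(rel, moderate, rng):.2f}  "
          f"extreme: {retest_agreement(rel, extreme, rng):.2f}")
```

Under these assumptions, moderate scorers flip categories between administrations far more often than extreme scorers. Higher reliability narrows the gap but does not remove it.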

TIPI vs IPIP-NEO-120: Reliability Trade-Offs Side by Side

The IPIP-NEO-120 is a 120-item, freely available instrument that measures all five Big Five dimensions and all thirty facets. It was developed specifically as an open-access alternative to the proprietary NEO PI-R, and its validity properties have been documented in peer-reviewed research.

The comparison with the TIPI illustrates the reliability-length trade-off directly:

Test length	Example instrument	Items per dimension	Facet measurement	Reliability estimate (α)	Appropriate use case
10 items	TIPI	2	None	~0.45–0.65 per dimension	Large-scale population research; screening when brevity is essential; low-stakes self-exploration
44 items	BFI (Big Five Inventory)	~8–9	None	~0.75–0.85 per dimension	Academic research requiring balance of brevity and reliability; group-level studies
60 items	IPIP-NEO-60	12	Partial	~0.80–0.87 per dimension	Applied research; moderate-stakes development contexts
100–120 items	Cèrcol / IPIP-NEO-120	20–24	Full (30 facets)	~0.87–0.93 per dimension	Individual development; team profiling; coaching; high-stakes assessment
240 items	NEO PI-R (full)	48	Full (30 facets)	~0.90–0.95 per dimension	Clinical assessment; research requiring maximum precision; high-stakes selection

When a Short Personality Test Is Actually Appropriate

The case for short personality tests is real and should not be dismissed. In certain contexts, a 10-item instrument is the right choice.

Large-scale population research requires completion from thousands of respondents. A 10-minute completion time produces noticeably higher dropout than a 2-minute one, and because the people who drop out are rarely a random subset, the remaining sample becomes biased. When the research question concerns population-level trends rather than individual profiles, the TIPI's weaker reliability is acceptable: random measurement error tends to cancel out when scores are averaged across large samples.
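Population-level research can tolerate a noisy instrument because random error has a mean of zero. Its contribution to a group average therefore shrinks with the square root of the sample size. The sketch below shows this under classical test theory, with scores expressed in observed standard-deviation units (an assumption). It covers measurement error only, not ordinary sampling error.

```python
import math

def individual_error_sd(reliability: float) -> float:
    """Standard error of measurement for one person's score, in observed-SD units."""
    return math.sqrt(1 - reliability)

def group_mean_error_sd(reliability: float, n_people: int) -> float:
    """Measurement-error SD of a group average: independent errors shrink by sqrt(n)."""
    return individual_error_sd(reliability) / math.sqrt(n_people)

for n in (1, 100, 10_000):
    print(f"n={n:>6}: error SD of the mean = {group_mean_error_sd(0.55, n):.4f}")
```

At a TIPI-like reliability of 0.55, one person's score carries an error SD of about two-thirds of a standard deviation. The mean of 10,000 people carries less than a hundredth.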

Screening contexts — where the goal is to identify who might benefit from a more thorough assessment — can appropriately use short instruments. If a 10-item screen identifies candidates in the upper or lower quartile of a dimension for further assessment, the brevity is a reasonable trade-off.

Repeated measurement presents a different problem. If you want to track personality change over time — or across multiple development interventions — administering a 120-item instrument every quarter is burdensome. A validated short form used consistently over time can produce more actionable longitudinal data than an infrequent full-length administration.

Low-stakes self-exploration — where the user is simply curious about their personality rather than using the data for a consequential decision — can appropriately use shorter instruments. The cost of measurement error is lower when the stakes are lower. For a comparison of which free assessments are appropriate for which stakes, see the best free personality tests for teams in 2026.

When Test Length Matters: Individual Development and Team Profiling

The case for longer instruments becomes stronger as the stakes and the specificity requirements of the use case increase.

Individual development requires facet-level data. A 10-item instrument cannot tell a coach or manager why someone's Conscientiousness score is what it is — which facets are driving it, and which development interventions are most likely to be effective. A 120-item instrument with facet-level scoring provides the specificity that development conversations require.

Team profiling requires reliable individual scores as inputs to team-level analysis. If individual scores have high measurement error, the team profile inherits that error. A team map built on TIPI scores will show greater random variation between profiles than one built on longer instruments — which reduces the map's usefulness for deliberate team design. See Cèrcol's 12 team roles for how facet-level profiles translate into team role insights.

Peer assessment strengthens the argument further. Cèrcol's Witness model asks observers to assess someone else's personality across multiple dimensions and facets. With a short instrument, the self-versus-observer discrepancies that are the most informative data in the report would become unreliable, because a difference between two scores inherits the measurement error of both. The Witness methodology is explained in detail at what the Cèrcol Witness instrument measures.
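Classical test theory gives a standard formula for the reliability of a difference score. It shows why self-versus-observer gaps are so sensitive to measurement quality. The sketch below assumes both scores are standardised to equal variance. The self-peer correlation of 0.5 is a hypothetical value chosen for illustration, and the actual reliabilities of Cèrcol's self-report and Witness scores may differ from those used here.

```python
def difference_score_reliability(rel_self: float, rel_peer: float, r_self_peer: float) -> float:
    """Reliability of (self - peer) when both scores have equal variance.

    r_D = ((rel_self + rel_peer) / 2 - r_self_peer) / (1 - r_self_peer)
    """
    if not -1 <= r_self_peer < 1:
        raise ValueError("r_self_peer must be in [-1, 1)")
    return ((rel_self + rel_peer) / 2 - r_self_peer) / (1 - r_self_peer)

R_SELF_PEER = 0.5  # hypothetical self-observer agreement

for rel in (0.55, 0.90):
    r_d = difference_score_reliability(rel, rel, R_SELF_PEER)
    print(f"both scores alpha={rel:.2f} -> difference score reliability {r_d:.2f}")
```

With these assumptions, inputs at α = 0.55 leave a discrepancy score with a reliability of only about 0.10, while inputs at α = 0.90 give about 0.80. The more self and observer agree overall, the less reliable their differences become. Discrepancy scores therefore need reliable inputs.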

High-stakes decisions, such as performance assessment, role redesign or selection for leadership programmes, require data reliable enough to act on. Under classical test theory, α = 0.55 (typical of the TIPI) implies that about 45% of observed score variance is measurement error, while α = 0.90 implies only about 10%. In practice, the 95% confidence band around an individual's score is roughly twice as wide at α = 0.55 as at α = 0.90, so many scores cannot be confidently placed above or below average.
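The standard error of measurement turns these reliabilities into score ranges. The sketch below uses the T-score metric (mean 50, SD 10) and a simple observed-score 95% interval. Both are illustrative choices, not Cèrcol's reporting scale.

```python
import math

def standard_error_of_measurement(reliability: float, sd: float = 10.0) -> float:
    """SEM = SD * sqrt(1 - reliability) (classical test theory)."""
    return sd * math.sqrt(1 - reliability)

def ci95(observed: float, reliability: float, sd: float = 10.0) -> tuple[float, float]:
    """Approximate 95% band around an observed score."""
    half_width = 1.96 * standard_error_of_measurement(reliability, sd)
    return observed - half_width, observed + half_width

for alpha in (0.55, 0.90):
    low, high = ci95(60.0, alpha)
    print(f"alpha={alpha:.2f}: T=60 could plausibly lie between {low:.1f} and {high:.1f}")
```

At α = 0.55, the band around a T-score of 60 reaches below the average of 50, so the person cannot confidently be called above average. At α = 0.90, the band stays above 50.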

Why Cèrcol Uses 120 Items to Balance Reliability and Completion Time

Cèrcol's instrument uses 120 items — 24 per Big Five dimension — providing facet-level measurement while staying substantially shorter than the full 240-item NEO PI-R. The design reflects a deliberate trade-off: retain facet resolution and reliability above 0.87 per dimension while keeping completion time to approximately 15 minutes.

This length is supported by the reliability and validity evidence for IPIP-based instruments at this item count. It also reflects the practical reality that team profiling and individual development require facet-level data, which shorter instruments structurally cannot provide. Longer instruments also leave room for more reverse-keyed items, which help control acquiescence (the tendency to agree with statements regardless of their content). For the science behind why this matters, see personality testing: open source vs commercial and social desirability bias in personality tests.
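Reverse-keyed items flip the response scale before items are averaged. As a result, a respondent who agrees with everything regardless of content lands near the midpoint instead of at the top. The sketch below assumes a 1-5 response scale and a hypothetical four-item scale with two reverse-keyed items.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point response scale

def item_score(response: int, reverse_keyed: bool) -> int:
    """Score one response, flipping it (1<->5, 2<->4) if the item is reverse-keyed."""
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"response {response} outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - response if reverse_keyed else response

def scale_mean(responses: list[int], reverse_keys: list[bool]) -> float:
    """Average item score for one scale."""
    if len(responses) != len(reverse_keys):
        raise ValueError("each response needs a matching key")
    scored = [item_score(r, k) for r, k in zip(responses, reverse_keys)]
    return sum(scored) / len(scored)

always_agree = [5, 5, 5, 5]  # content-blind "strongly agree" pattern
print(scale_mean(always_agree, [False, False, False, False]))  # 5.0: all items keyed the same way
print(scale_mean(always_agree, [False, True, False, True]))    # 3.0: balanced keying
```

The same blanket agreement produces a maximal score on a one-directional scale but only the midpoint on a balanced one, which is why balanced keying exposes this response style.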

The appropriate length for a personality instrument is not determined by convention or by what feels convenient. It is determined by the use case, the required reliability, and the level of specificity the data needs to provide. For individual and team development, the evidence consistently supports instruments in the 100–120 item range as the practical optimum.

Why Cèrcol uses 120 items instead of 10

A 10-item personality test is better than no test. But for the purposes most teams care about (role fit, development planning, conflict prediction, coaching), 10 items, two per dimension, are not enough. Two items cannot distinguish between facets, cannot reliably classify people in the middle of the distribution, and produce measurement error large enough to change conclusions on retest.

Cèrcol uses 120 items because that is the shortest instrument length that delivers full facet resolution and internal-consistency reliability (α) above 0.87 across all five Big Five dimensions. The items are drawn from the public-domain IPIP item bank, the same scientific source used in hundreds of peer-reviewed studies. Completion takes about 15 minutes.

If you want to see what facet-level Big Five data actually looks like for your team, the assessment is free at cercol.team. The Witness peer assessment adds observer-rated profiles for each person — a second perspective that no self-report instrument, however long, can substitute for. Read the full measurement rationale at cercol.team/science.

Further reading: What reliability and validity mean in personality testing · The science behind Cèrcol
