Is Cèrcol based on the Big Five?

Yes. Cèrcol measures personality using the OCEAN model (Big Five) via the IPIP public-domain item pool (Goldberg et al. 2006). The 12 team roles are derived from the AB5C circumplex (Hofstee et al. 1992) and team composition research (Bell 2007; Neuman & Wright 1999).

What makes Cèrcol different from Belbin or DISC?

Cèrcol's roles are grounded in the Big Five (OCEAN) personality model using the IPIP public-domain item pool. The scoring pipeline is fully open source and auditable. Witness Cèrcol uses forced-choice adjective selection rather than Likert scales to reduce social desirability bias in peer assessment. Unlike Belbin or DISC, all items are public domain and the entire methodology is published and citable.

Is the personality assessment free?

The New Moon Cèrcol (10 items, Big Five snapshot) and First Quarter Cèrcol (60 items, IPIP-NEO-60, 30 facets) are always free — no account required. The Full Moon Cèrcol (120 items, IPIP-NEO-120, Witness peer assessment, cognitive ability measure) requires a one-time payment.

What is Witness Cèrcol?

Witness Cèrcol is a peer personality assessment in which someone who knows you well rates you using forced-choice adjective selection: in each round they pick the best-fitting and worst-fitting adjective from a set covering all five OCEAN dimensions. Forced choice reduces the social desirability bias that affects standard Likert-scale peer ratings. Dimensions where your self-rating and peer ratings diverge by more than 0.8 standard deviations are flagged as potential blind spots.
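The article does not publish how Witness responses are scored. As a rough illustration only, a common way to handle best/worst forced-choice data is to add the keyed direction of the best-fit adjective to its dimension and subtract the keyed direction of the worst-fit adjective, then sum per dimension. The adjective list, its keying, and `tally_forced_choice` below are hypothetical, not Cèrcol's items or code.

```python
from collections import defaultdict

# Hypothetical keying: adjective -> (dimension, direction).
# +1 marks the high pole of the dimension, -1 the low pole.
KEYS = {
    "talkative": ("Presence", +1),
    "reserved": ("Presence", -1),
    "warm": ("Bond", +1),
    "critical": ("Bond", -1),
    "imaginative": ("Vision", +1),
    "conventional": ("Vision", -1),
    "organised": ("Discipline", +1),
    "careless": ("Discipline", -1),
    "anxious": ("Depth", +1),
    "relaxed": ("Depth", -1),
}


def tally_forced_choice(rounds):
    """Sum best/worst picks per dimension.

    Each round is a (best, worst) pair of adjectives chosen from that
    round's set. A best pick adds the adjective's direction to its
    dimension; a worst pick subtracts it.
    """
    scores = defaultdict(int)
    for best, worst in rounds:
        dim, sign = KEYS[best]
        scores[dim] += sign
        dim, sign = KEYS[worst]
        scores[dim] -= sign
    return dict(scores)


# Three illustrative rounds from one hypothetical Witness.
rounds = [
    ("talkative", "anxious"),
    ("warm", "conventional"),
    ("organised", "reserved"),
]
print(tally_forced_choice(rounds))
```

Picking "reserved" as worst fit raises Presence just as picking "talkative" as best fit does. Simple counts like these are at least partly ipsative (dimensions trade off against one another within a round), which is one reason IRT-based scoring models are often preferred for forced-choice data.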

How are the 12 team roles derived?

The 12 roles are derived from the AB5C circumplex (Hofstee, De Raad & Goldberg 1992), covering all six intersections of the three team balance dimensions (Presence/Extraversion × Bond/Agreeableness × Vision/Openness) at both poles. The selection of these three dimensions as requiring team-level balance is grounded in Bell (2007) and Neuman & Wright (1999). Discipline (Conscientiousness) and Depth (Neuroticism) modulate role expression but do not define team balance.
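The text does not spell out the arithmetic behind the number twelve. One reading, assumed here and not confirmed by the source, is that each role pairs one pole of a team-balance dimension with one pole of a different dimension: three dimension pairs, each with four high/low combinations. The sketch below enumerates those pairings; the tuples are placeholders, not Cèrcol's role names.

```python
from itertools import combinations, product

# The three team-balance dimensions named in the text.
DIMENSIONS = ["Presence", "Bond", "Vision"]
POLES = ["high", "low"]


def pole_pairings(dimensions):
    """Pair every two distinct dimensions at each combination of poles.

    This is an assumed reading of how twelve roles could arise; the
    resulting tuples are placeholders, not actual role names.
    """
    pairings = []
    for dim_a, dim_b in combinations(dimensions, 2):
        for pole_a, pole_b in product(POLES, repeat=2):
            pairings.append(((dim_a, pole_a), (dim_b, pole_b)))
    return pairings


roles = pole_pairings(DIMENSIONS)
print(len(roles))  # 3 dimension pairs x 4 pole combinations = 12
for role in roles[:4]:
    print(role)
```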

No account is required for any instrument. During assessment, no personal data is collected — only anonymous scores are logged. Data is stored on our own servers (Hetzner Online GmbH). No third-party analytics. No data is shared with or sold to third parties.

Is Cèrcol based on the Big Five (OCEAN)?

Yes. Cèrcol measures personality using the OCEAN model (Big Five) via the IPIP — the International Personality Item Pool, a public-domain collection validated in thousands of published studies. The five dimensions are Presence (Extraversion), Bond (Agreeableness), Vision (Openness), Discipline (Conscientiousness), and Depth (Neuroticism). Because the IPIP is public domain there are no licence restrictions: the full item pool and scoring logic are open and citable.

How is Cèrcol different from Belbin, DISC, or StrengthsFinder?

Three things set Cèrcol apart. First, the items are grounded in the Big Five (OCEAN), the most replicated personality model in academic research, rather than in a proprietary framework. Second, the item pool (IPIP) is public domain and the scoring pipeline is open source and auditable; there is no black box. Third, the Witness peer assessment uses forced-choice adjective selection instead of Likert scales, which reduces the social desirability bias that affects most 360-degree feedback tools. Belbin and DISC use closed, proprietary methodologies.

What are blind spots in team personality assessment?

A blind spot is a personality dimension where how you see yourself and how others see you diverge significantly — more than 0.8 standard deviations apart. Cèrcol's Witness peer assessment detects blind spots by comparing your self-report with forced-choice adjective ratings from people who know you. Blind spots are neither good nor bad: they show where your self-perception and others' experience of you don't match, which is often more actionable than the score itself.
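A minimal sketch of the 0.8 SD rule, assuming that self-report scores and Witness composite scores for each dimension have already been converted to standard-deviation units (z-scores) on a common scale. The function name and inputs are hypothetical; the article does not describe how Cèrcol standardises scores.

```python
# Divergence threshold in standard-deviation units, as stated in the text.
BLIND_SPOT_THRESHOLD = 0.8


def find_blind_spots(self_z, witness_z, threshold=BLIND_SPOT_THRESHOLD):
    """Return dimensions where self and Witness z-scores differ by more than threshold.

    self_z and witness_z map dimension names to scores already expressed
    in SD units. The returned gap is Witness minus self, so a negative gap
    means others see less of that trait than you report.
    """
    spots = {}
    for dim, self_score in self_z.items():
        gap = witness_z[dim] - self_score
        if abs(gap) > threshold:
            spots[dim] = round(gap, 2)
    return spots


# Illustrative scores for one person (made-up values).
self_z = {"Presence": 1.2, "Bond": 0.1, "Vision": 0.5, "Discipline": -0.3, "Depth": 0.4}
witness_z = {"Presence": 0.2, "Bond": 0.3, "Vision": 0.6, "Discipline": 0.7, "Depth": -0.6}
print(find_blind_spots(self_z, witness_z))
```

Here Presence and Depth are flagged because Witnesses rate them about one SD lower than the self-report, and Discipline because Witnesses rate it about one SD higher.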

How many peer assessors do you need for reliable personality data?

Why a Single Peer Rating Is Too Unreliable to Trust

When researchers examine the reliability of individual peer ratings — either by looking at consistency across occasions, or by correlating two independent peers' ratings of the same target — they consistently find correlations in the range of .30–.40 for Big Five dimensions.
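Besides correlating two peers directly, single-rater reliability is often estimated with an intraclass correlation (ICC), as described by Shrout & Fleiss (1979) in the references. The sketch below computes the one-way random-effects ICC(1,1) and its k-rater average ICC(1,k) from a targets × raters table using only the standard library. The function and data are illustrative, not Cèrcol's code, and it assumes every target is rated by the same number of peers.

```python
def one_way_icc(ratings):
    """One-way random-effects ICC(1,1) and ICC(1,k) (Shrout & Fleiss, 1979).

    ratings: list of rows, one per target; each row holds k ratings of
    that target from k different raters. Assumes a balanced design.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-targets mean square: how much targets differ from each other.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-target mean square: how much raters disagree about the same target.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_average = (ms_between - ms_within) / ms_between
    return icc_single, icc_average


# Made-up data: 4 targets, each rated by 3 peers on one dimension.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
single, average = one_way_icc(ratings)
print(f"ICC(1,1) = {single:.2f}, ICC(1,3) = {average:.2f}")
```

These toy ratings are far more consistent than real peer data, which typically lands near the .30 to .40 range above. ICC(1,k) is exactly the Spearman-Brown prediction applied to ICC(1,1) with k raters, which ties this estimate to the formula discussed below.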

This is not particularly high. A correlation of .35 means that only about 12% of the variance in one peer's rating is shared with another peer's rating of the same person. That leaves 88% of variance unaccounted for — some of which is genuine measurement error, some of which reflects different relationship contexts (a work colleague sees different behaviours than a close friend), and some of which reflects genuine disagreement about the target's personality.

For the purposes of individual-level assessment — making meaningful statements about a specific person's personality based on peer data — a single Witness rating is insufficient. It is suggestive at best.

"Inter-rater reliability for personality ratings by acquaintances typically falls around .35–.45, indicating that substantial aggregation is needed to achieve reliable composite estimates."
— See: Inter-rater reliability; and Connelly, B. S., & Ones, D. S. (2010). An other perspective on personality. Psychological Bulletin, 136(6), 1092–1122.

The Spearman-Brown Formula: How More Witnesses Raise Reliability

The psychometric principle that governs how reliability increases with the number of raters is the Spearman-Brown prediction formula. It states that if you know the reliability of a single rater, you can predict the reliability of the average of k raters:

r_k = (k × r_1) / (1 + (k − 1) × r_1)

Where r_1 is the inter-rater reliability with a single rater and k is the number of raters.
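As a worked check of the formula, take the single-rater value used below (r_1 = .35) and k = 3 Witnesses:

```latex
r_3 = \frac{3 \times 0.35}{1 + (3 - 1) \times 0.35}
    = \frac{1.05}{1.70}
    \approx 0.62
```

This matches the three-Witness row of the reliability table.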

This formula predicts diminishing returns: adding your first additional Witness adds more reliability than adding your tenth. The curve flattens as you add more raters, and beyond a certain point, additional Witnesses contribute negligibly to the reliability of the composite.

Starting from a typical single-rater reliability of r = .35, the Spearman-Brown formula gives us the following predictions:

Figure: Composite reliability (Spearman-Brown) versus number of peer raters, starting from a single-rater r of .30. The curve rises steeply from 1 to 3 raters, then flattens; a dashed red line marks 3 raters, the practical minimum threshold for meaningful interpretation.

Reliability by Number of Witnesses: From 3 to 12+

Number of Witnesses	Expected Composite Reliability (r)	Practical Interpretation
1	.35	Too noisy for individual conclusions; treat as weak signal
2	.52	Moderate — useful for identifying strong patterns only
3	.62	Acceptable — meaningful at the level of major tendencies
5	.73	Good — reliable enough for development use
7	.79	Good to very good — meaningful for most applied purposes
10	.84	Very good — solid for high-stakes development contexts
12	.87	Excellent — approaching ceiling of useful improvement
15	.89	Marginal gain over 12; rarely worth the additional effort
20	.92	Diminishing returns fully in effect
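The reliability column can be reproduced directly from the Spearman-Brown formula. A minimal Python sketch, assuming the single-rater reliability of .35 used for the table:

```python
def spearman_brown(r1, k):
    """Predicted reliability of the mean of k raters, given single-rater reliability r1."""
    return (k * r1) / (1 + (k - 1) * r1)


SINGLE_RATER_R = 0.35  # single-rater value assumed for the table

for k in [1, 2, 3, 5, 7, 10, 12, 15, 20]:
    print(f"{k:>2} Witnesses: r = {spearman_brown(SINGLE_RATER_R, k):.2f}")
```

Setting `SINGLE_RATER_R` to .30, the lower end of the range reported earlier, lowers every row slightly; the shape of the curve, steep at first and then flat, stays the same.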

The practical message from this table is clear: three to five Witnesses produce a composite that is meaningfully more reliable than a single rating, and five to twelve Witnesses are sufficient for most development and coaching applications. Beyond twelve, the marginal gain per additional Witness is small enough that it rarely justifies the burden on Witnesses or the administrative complexity.

What 'Reliable' Actually Means for Peer Personality Data

A reliability of .73 (five Witnesses) means that about 73% of the variance in the composite peer rating is systematic — it reflects something real about the target person — while 27% is noise. For a development context, where the goal is to identify broad patterns and areas for reflection rather than to make high-stakes selection decisions, this is sufficient. You can reasonably interpret a composite that shows clearly higher Presence than Depth as reflecting a genuine pattern in how the target is perceived.

A reliability of .84 (ten Witnesses) approaches the reliability of many well-validated self-report measures. At this level, you can make more refined comparisons — for example, a small gap between Bond and Discipline is more likely to be meaningful than with five Witnesses, where such small differences might be noise.

Below three Witnesses, interpret the composite with significant caution. Two Witnesses at .52 reliability means nearly half the composite variance is noise. This does not mean the data is worthless — a strong, consistent pattern across two Witnesses is still informative — but it should be treated as hypothesis-generating rather than definitive.

For a broader treatment of what reliability and validity mean in personality testing generally, see the related article What reliability and validity mean in personality testing.

Getting the Most from Just 2–3 Witness Assessments

In practice, collecting ten or more Witness ratings is not always feasible. People have limited networks of close colleagues, social norms around requesting peer feedback vary, and many Cèrcol users will complete their First Quarter assessment with only two or three available Witnesses.

When you have limited Witnesses, the right approach is to adjust your interpretation accordingly:

Focus on strong signals, not small differences. With two or three Witnesses, only substantial differences in the composite profile — a full standard deviation or more between dimensions — are likely to be reliable. Small gaps between dimensions should be treated as noise.
Look for consistency across Witnesses. If both (or all three) of your Witnesses independently score you similarly on a dimension, that convergence is informative even with a small sample. Divergence between Witnesses — one rates you high on Presence, another rates you low — is a signal to explore, not to average away.
Compare to self-report, not to norms. With limited Witnesses, the most meaningful comparison is between your self-report profile and your Witness composite. Where do they agree? Where do they diverge? Even with noisy Witness data, consistent divergences from self-ratings are worth exploring. The dimension-by-dimension pattern of these gaps is described in the related article on self-other agreement by Big Five dimension.
Add Witnesses over time. Cèrcol is designed to support longitudinal use. A First Quarter assessment with three Witnesses, repeated with three different Witnesses six months later, gives you a richer picture than a single snapshot, even if neither snapshot alone is highly reliable.
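To see why only large gaps are trustworthy with few Witnesses, one can use the classical standard error of measurement, SEM = SD × √(1 − r). The sketch below assumes scores in SD units (SD = 1), the .35 single-rater value from the table, and a normal-theory 95% interval; the article does not state how Cèrcol computes its own confidence intervals, and the function names are illustrative.

```python
import math


def spearman_brown(r1, k):
    """Predicted reliability of the mean of k raters."""
    return (k * r1) / (1 + (k - 1) * r1)


def ci_half_width(r1, k, z=1.96):
    """Half-width of a 95% interval around a composite score, in SD units.

    Uses the classical SEM = SD * sqrt(1 - reliability) with SD = 1.
    """
    reliability = spearman_brown(r1, k)
    return z * math.sqrt(1 - reliability)


for k in [1, 2, 3, 5, 10]:
    print(f"{k:>2} Witnesses: ±{ci_half_width(0.35, k):.2f} SD")
```

With three Witnesses the interval around a single dimension score is still about ±1.2 SD, which is why small gaps between dimensions are hard to distinguish from noise; with ten it narrows to under ±0.8 SD.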

Why Witness Relationship Diversity Matters as Much as Count

The Spearman-Brown formula assumes that additional raters are independent and roughly equivalent in their perspective. In practice, the diversity of relationships matters as much as the number of Witnesses.

Five close friends who all know you in similar social contexts will produce a more redundant composite than five Witnesses who know you in different contexts: a manager, a peer, a direct report, a close friend, and a family member. Contextual diversity adds information that simple aggregation of similar relationships does not capture.

For the purposes of reliability calculation, this is a complication — Witnesses from different contexts may show lower inter-rater reliability precisely because they are seeing genuinely different aspects of the target's personality expression. This is a feature, not a bug: it means the composite captures a richer, more context-spanning picture. But it also means that simple reliability estimates may understate the value of contextually diverse Witness panels.

Cèrcol's peer feedback framework encourages users to select Witnesses from multiple relationship types for this reason. Once you have your Witness data in hand, the article Using Cèrcol for team development: a practical guide walks through how to use it in a facilitated team context.

Cèrcol Witness: Practical Recommendations by Context

Based on the psychometric literature and the practical constraints of typical Cèrcol users, the following guidance applies:

Minimum for meaningful use: 3 Witnesses. Below this, results are too noisy for confident interpretation.
Target for standard development use: 5–7 Witnesses. This produces composite reliability of .73–.79, sufficient for identifying genuine patterns in how you are perceived.
High-stakes or coaching contexts: 8–12 Witnesses. For leadership development, executive coaching, or any context where the personality data will be used to make significant development decisions, ten or more Witnesses produce the most reliable composite.
Beyond 12: diminishing returns. The incremental reliability gain from additional Witnesses beyond 12 is small enough that the additional burden on Witnesses is rarely justified.
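The recommendations above can also be read in reverse: rearranging the Spearman-Brown formula gives the number of raters needed to reach a target reliability, k = r_target(1 − r_1) / (r_1(1 − r_target)). A small sketch, again assuming r_1 = .35; the function name is illustrative.

```python
import math


def witnesses_needed(r1, target):
    """Smallest whole number of raters whose composite reaches the target reliability.

    Rearranged Spearman-Brown formula: k = target * (1 - r1) / (r1 * (1 - target)).
    """
    if not (0 < r1 < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    k = target * (1 - r1) / (r1 * (1 - target))
    # Guard against floating-point error nudging an exact integer just above it.
    return max(1, math.ceil(k - 1e-9))


for target in [0.60, 0.70, 0.80, 0.85]:
    print(f"target r = {target:.2f}: {witnesses_needed(0.35, target)} Witnesses")
```

With r_1 = .35, a target of .60 needs 3 Witnesses, .70 needs 5, and .80 needs 8, in line with the thresholds listed above.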

Summary: The Right Number of Witnesses for Your Use Case

A single peer personality rating has an inter-rater reliability of approximately .35, too low for confident individual-level interpretation. The Spearman-Brown prediction formula shows how composite reliability rises with additional Witnesses: acceptable levels (.62) at three raters, good levels (.73) at five, and very good levels (.84) at ten. Beyond twelve Witnesses, returns diminish sharply. In practice, three Witnesses is the minimum for meaningful use, five to seven is the practical target, and ten or more is ideal for high-stakes applications. When Witness numbers are limited, focus on strong signals, look for cross-Witness consistency, and use the self-other comparison as the primary interpretive lens.

References
Connelly, B. S., & Ones, D. S. (2010). An other perspective on personality: Meta-analytic integration of observers' accuracy and predictive validity. Psychological Bulletin, 136(6), 1092–1122.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.

Take the Cèrcol assessment now — free

Everything described in this article applies to your own Witness data. Go to cercol.team, take the free personality assessment, and invite at least three colleagues to act as Witnesses. Cèrcol displays confidence intervals that widen with fewer Witnesses, so you can see exactly how the reliability changes with each additional rater. The Witness instrument is free, takes colleagues under five minutes to complete, and the composite is available as soon as your third Witness responds.


Related articles

Anonymity in personality assessment: why it matters more than you think

What reliability and validity mean in personality testing — explained plainly

Why self-assessment alone isn't enough: the case for peer personality feedback