How many peer assessors do you need for reliable personality data?

Three Witness assessors reach .62 reliability; five reach .73. The Spearman-Brown formula shows exactly when adding more Witnesses stops improving your data.

Miquel Matoses·12 min read

Peer personality assessment has a fundamental limitation that is easy to overlook: any single Witness's rating of another person is quite noisy. Individual people perceive each other imperfectly, see behaviour in limited contexts, bring their own biases and blind spots, and are influenced by how much they like the person they are rating. A single peer rating is valuable — but not as valuable as it might appear.

The question of how many Witnesses you need before the composite rating becomes reliably informative is one of the most practically important questions in personality assessment design. The answer comes from psychometric theory and from decades of empirical research on inter-rater reliability. Understanding it will help you use Cèrcol's Witness data appropriately and set realistic expectations about what different numbers of Witnesses can tell you.

For context on why peer data matters at all, see what the Cèrcol Witness instrument measures — and for the specific dimension-by-dimension picture of where self-other gaps are largest, see self-other agreement by Big Five dimension.

Why a Single Peer Rating Is Too Unreliable to Trust

When researchers examine the reliability of individual peer ratings — either by looking at consistency across occasions, or by correlating two independent peers' ratings of the same target — they consistently find correlations in the range of .30–.40 for Big Five dimensions.

This is not particularly high. A correlation of .35 means that only about 12% of the variance in one peer's rating is shared with another peer's rating of the same person. That leaves 88% of variance unaccounted for — some of which is genuine measurement error, some of which reflects different relationship contexts (a work colleague sees different behaviours than a close friend), and some of which reflects genuine disagreement about the target's personality.

For the purposes of individual-level assessment — making meaningful statements about a specific person's personality based on peer data — a single Witness rating is insufficient. It is suggestive at best.

"Inter-rater reliability for personality ratings by acquaintances typically falls around .35–.45, indicating that substantial aggregation is needed to achieve reliable composite estimates."
— See: Inter-rater reliability; and Connelly, B. S., & Ones, D. S. (2010). An other perspective on personality. Psychological Bulletin, 136(6), 1092–1122.

The Spearman-Brown Formula: How More Witnesses Raise Reliability

The psychometric principle that governs how reliability increases with the number of raters is the Spearman-Brown prediction formula. It states that if you know the reliability of a single rater, you can predict the reliability of the average of k raters:

r_k = (k × r_1) / (1 + (k − 1) × r_1)

Where r_1 is the inter-rater reliability with a single rater and k is the number of raters.

This formula predicts diminishing returns: adding your first additional Witness adds more reliability than adding your tenth. The curve flattens as you add more raters, and beyond a certain point, additional Witnesses contribute negligibly to the reliability of the composite.
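As a minimal sketch, the formula translates directly into a few lines of Python. The function name `spearman_brown` and the starting value of .35 (taken from the .30–.40 range reported above) are illustrative choices, not part of any Cèrcol API.

```python
def spearman_brown(r1: float, k: int) -> float:
    """Predicted reliability of the average of k raters.

    r1 is the reliability of a single rater; k is the number of raters.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return (k * r1) / (1 + (k - 1) * r1)


# Assumed single-rater reliability, chosen from the .30-.40 range above.
R1 = 0.35

for k in (1, 2, 3, 5, 10):
    print(k, round(spearman_brown(R1, k), 2))
# prints:
# 1 0.35
# 2 0.52
# 3 0.62
# 5 0.73
# 10 0.84
```

Changing `R1` shows how sensitive the whole curve is to the single-rater starting point.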

Starting from a typical single-rater reliability of r = .35, the Spearman-Brown formula gives us the following predictions:

[Figure: line chart. X-axis: number of raters (1 to 8). Y-axis: reliability (r), 0.0 to 0.9. Labelled points: r=0.30, r=0.62, r=0.75, r=0.83. A dashed line marks the minimum threshold.]
Composite reliability (Spearman-Brown) vs number of peer raters, starting from single-rater r = 0.30. The curve rises steeply from 1 to 3 raters, then flattens. The dashed red line marks 3 raters, the practical minimum threshold for meaningful interpretation.

Reliability by Number of Witnesses: From 3 to 12+

Number of Witnesses | Expected composite reliability (r) | Practical interpretation
--------------------|------------------------------------|-------------------------
1                   | .35                                | Too noisy for individual conclusions; treat as weak signal
2                   | .52                                | Moderate: useful for identifying strong patterns only
3                   | .62                                | Acceptable: meaningful at the level of major tendencies
5                   | .73                                | Good: reliable enough for development use
7                   | .79                                | Good to very good: meaningful for most applied purposes
10                  | .84                                | Very good: solid for high-stakes development contexts
12                  | .87                                | Excellent: approaching ceiling of useful improvement
15                  | .89                                | Marginal gain over 12; rarely worth the additional effort
20                  | .92                                | Diminishing returns fully in effect

The practical message from this table is clear: three to five Witnesses produce a composite that is meaningfully more reliable than a single rating, and five to twelve Witnesses are sufficient for most development and coaching applications. Beyond twelve, the marginal gain per additional Witness is small enough that it rarely justifies the burden on Witnesses or the administrative complexity.
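The diminishing returns can be made concrete by computing how much each additional Witness adds to the composite. The sketch below assumes the same single-rater reliability of .35; the helper name `marginal_gains` is hypothetical.

```python
def spearman_brown(r1: float, k: int) -> float:
    """Predicted reliability of the average of k raters."""
    return (k * r1) / (1 + (k - 1) * r1)


def marginal_gains(r1: float, max_k: int) -> list[tuple[int, float]]:
    """Reliability gained by adding the k-th rater, for k = 2..max_k."""
    return [
        (k, spearman_brown(r1, k) - spearman_brown(r1, k - 1))
        for k in range(2, max_k + 1)
    ]


# Assumed single-rater reliability of .35, as in the table above.
for k, gain in marginal_gains(0.35, 13):
    print(f"Witness {k:>2}: +{gain:.3f}")
```

Under this assumption the second Witness adds roughly .17 to the composite reliability, while the thirteenth adds less than .01.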

What 'Reliable' Actually Means for Peer Personality Data

A reliability of .73 (five Witnesses) means that about 73% of the variance in the composite peer rating is systematic — it reflects something real about the target person — while 27% is noise. For a development context, where the goal is to identify broad patterns and areas for reflection rather than to make high-stakes selection decisions, this is sufficient. You can reasonably interpret a composite that shows clearly higher Presence than Depth as reflecting a genuine pattern in how the target is perceived.

A reliability of .84 (ten Witnesses) approaches the reliability of many well-validated self-report measures. At this level, you can make more refined comparisons — for example, a small gap between Bond and Discipline is more likely to be meaningful than with five Witnesses, where such small differences might be noise.

Below three Witnesses, interpret the composite with significant caution. Two Witnesses at .52 reliability means nearly half the composite variance is noise. This does not mean the data is worthless — a strong, consistent pattern across two Witnesses is still informative — but it should be treated as hypothesis-generating rather than definitive.
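One way to picture what these reliabilities imply is the standard error of measurement from classical test theory, SEM = SD × sqrt(1 − reliability). The sketch below expresses it in standard-deviation units for different Witness counts. It assumes a single-rater reliability of .35 and a normal-theory 95% band; it is an illustration of the principle, not a description of how Cèrcol computes its own intervals.

```python
import math


def spearman_brown(r1: float, k: int) -> float:
    """Predicted reliability of the average of k raters."""
    return (k * r1) / (1 + (k - 1) * r1)


def sem_in_sd_units(reliability: float) -> float:
    """Classical-test-theory standard error of measurement, in score-SD units."""
    return math.sqrt(1.0 - reliability)


# Assumed single-rater reliability of .35.
for k in (2, 5, 10):
    rel = spearman_brown(0.35, k)
    sem = sem_in_sd_units(rel)
    print(f"{k:>2} Witnesses: reliability {rel:.2f}, "
          f"SEM {sem:.2f} SD, 95% band +/-{1.96 * sem:.2f} SD")
```

With these inputs the SEM falls from about 0.69 SD at two Witnesses to about 0.52 SD at five and 0.40 SD at ten, which is why small gaps between dimensions only become interpretable with larger panels.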

For a broader treatment of what reliability and validity mean in personality testing generally, see what is reliability and validity in personality testing?

Getting the Most from Just 2–3 Witness Assessments

In practice, collecting ten or more Witness ratings is not always feasible. People have limited networks of close colleagues, social norms around requesting peer feedback vary, and many users of Cèrcol will conduct their first-quarter assessment with only two or three available Witnesses.

When you have limited Witnesses, the right approach is to adjust your interpretation accordingly:

  • Focus on strong signals, not small differences. With two or three Witnesses, only substantial differences in the composite profile — a full standard deviation or more between dimensions — are likely to be reliable. Small gaps between dimensions should be treated as noise.
  • Look for consistency across Witnesses. If both (or all three) of your Witnesses independently score you similarly on a dimension, that convergence is informative even with a small sample. Divergence between Witnesses — one rates you high on Presence, another rates you low — is a signal to explore, not to average away.
  • Compare to self-report, not to norms. With limited Witnesses, the most meaningful comparison is between your self-report profile and your Witness composite. Where do they agree? Where do they diverge? Even with noisy Witness data, consistent divergences from self-ratings are worth exploring. The dimension-by-dimension pattern of these gaps is described in self-other agreement by Big Five dimension.
  • Add Witnesses over time. Cèrcol is designed to support longitudinal use. A first-quarter assessment with three Witnesses, repeated with three different Witnesses six months later, gives you a richer picture than a single snapshot — even if neither snapshot alone is highly reliable.
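The first three points above can be sketched as a small check on a handful of ratings. Everything in this example is a hypothetical illustration: the scores, the standardised (z-score) scale, the 1.0 SD spread threshold, and the function name `summarise` are assumptions, not Cèrcol's data format or scoring rules. Only the four dimension names mentioned in this article are used.

```python
from statistics import mean

# Hypothetical z-score ratings from three Witnesses, per dimension.
witness_ratings = {
    "Presence": [0.9, 1.1, -0.6],
    "Depth": [-0.2, 0.1, 0.0],
    "Bond": [0.5, 0.7, 0.6],
    "Discipline": [-1.2, -0.9, -1.0],
}
# Hypothetical self-report scores on the same scale.
self_report = {"Presence": 1.0, "Depth": 0.9, "Bond": 0.5, "Discipline": -0.2}


def summarise(ratings, self_scores, spread_threshold=1.0):
    """Per dimension: composite, Witness spread, divergence flag, self-other gap."""
    summary = {}
    for dim, scores in ratings.items():
        composite = mean(scores)
        spread = max(scores) - min(scores)
        summary[dim] = {
            "composite": composite,
            "spread": spread,
            # Large spread: Witnesses disagree, so explore rather than average away.
            "divergent": spread >= spread_threshold,
            "self_other_gap": self_scores[dim] - composite,
        }
    return summary


for dim, row in summarise(witness_ratings, self_report).items():
    flag = "explore divergence" if row["divergent"] else "consistent"
    print(f"{dim:<10} composite {row['composite']:+.2f}  "
          f"self-other gap {row['self_other_gap']:+.2f}  {flag}")
```

In this made-up data, Presence is flagged because one Witness rates it far lower than the other two, while Discipline shows consistent Witness ratings that sit well below the self-report.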

Why Witness Relationship Diversity Matters as Much as Count

The Spearman-Brown formula assumes that additional raters are independent and roughly equivalent in their perspective. In practice, the diversity of relationships matters as much as the number of Witnesses.

Five close friends who all know you in similar social contexts will produce a more redundant composite than five Witnesses who know you in different contexts: a manager, a peer, a direct report, a close friend, and a family member. Contextual diversity adds information that simple aggregation of similar relationships does not capture.

For the purposes of reliability calculation, this is a complication — Witnesses from different contexts may show lower inter-rater reliability precisely because they are seeing genuinely different aspects of the target's personality expression. This is a feature, not a bug: it means the composite captures a richer, more context-spanning picture. But it also means that simple reliability estimates may understate the value of contextually diverse Witness panels.

Cèrcol's peer feedback framework encourages users to select Witnesses from multiple relationship types for this reason. Once you have your Witness data in hand, the article "Using Cèrcol for team development: a practical guide" walks through how to use it in a facilitated team context.

Cèrcol Witness: Practical Recommendations by Context

Based on the psychometric literature and the practical constraints of typical Cèrcol users, the following guidance applies:

  • Minimum for meaningful use: 3 Witnesses. Below this, results are too noisy for confident interpretation.
  • Target for standard development use: 5–7 Witnesses. This produces composite reliability of .73–.79, sufficient for identifying genuine patterns in how you are perceived.
  • High-stakes or coaching contexts: 8–12 Witnesses. For leadership development, executive coaching, or any context where the personality data will be used to make significant development decisions, aim for the upper end of this range: ten or more Witnesses produce the most reliable composite.
  • Beyond 12: diminishing returns. The incremental reliability gain from additional Witnesses beyond 12 is small enough that the additional burden on Witnesses is rarely justified.
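These thresholds can also be approached from the other direction: solving the Spearman-Brown formula for k gives the number of raters needed to reach a target reliability, k = r_target × (1 − r_1) / (r_1 × (1 − r_target)). The sketch below rounds up to a whole number of Witnesses. The function name `raters_needed` and the .35 starting value are assumptions for illustration.

```python
import math


def raters_needed(r1: float, target: float) -> int:
    """Smallest number of raters whose average reaches the target reliability.

    Obtained by rearranging the Spearman-Brown formula for k.
    """
    if not (0 < r1 < 1 and 0 < target < 1):
        raise ValueError("r1 and target must lie strictly between 0 and 1")
    k = target * (1 - r1) / (r1 * (1 - target))
    # Small tolerance so floating-point error does not add a spurious rater.
    return max(1, math.ceil(k - 1e-9))


# Assumed single-rater reliability of .35.
for target in (0.70, 0.80, 0.90):
    print(f"target {target:.2f}: {raters_needed(0.35, target)} Witnesses")
# prints:
# target 0.70: 5 Witnesses
# target 0.80: 8 Witnesses
# target 0.90: 17 Witnesses
```

Under this assumption, a target of .80 lands at the lower edge of the 8–12 band, and .90 requires more Witnesses than most users can realistically recruit.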

Summary: The Right Number of Witnesses for Your Use Case

A single peer personality rating has inter-rater reliability of approximately .35, which is too low for confident individual-level interpretation. The Spearman-Brown prediction formula shows how composite reliability increases with additional Witnesses: about .62 (acceptable) at three raters, .73 (good) at five, and .84 (very good) at ten. Beyond twelve Witnesses, returns diminish sharply. In practice, three Witnesses is the minimum for meaningful use; five to seven is the practical target; ten or more is ideal for high-stakes applications. When Witness numbers are limited, focus on strong signals, look for cross-Witness consistency, and use the self-other comparison as the primary interpretive lens.


References
Connelly, B. S., & Ones, D. S. (2010). An other perspective on personality: Meta-analytic integration of observers' accuracy and predictive validity. Psychological Bulletin, 136(6), 1092–1122.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.

Take the Cèrcol assessment now — free

Everything described in this article applies to your own Witness data. Go to cercol.team, take the free personality assessment, and invite at least three colleagues to act as Witnesses. Cèrcol displays confidence intervals that widen with fewer Witnesses, so you can see exactly how the reliability changes with each additional rater. The Witness instrument is free, takes colleagues under five minutes to complete, and the composite is available as soon as your third Witness responds.
