Is Cèrcol based on the Big Five?

Yes. Cèrcol measures personality using the OCEAN model (Big Five) via the IPIP public-domain item pool (Goldberg et al. 2006). The 12 team roles are derived from the AB5C circumplex (Hofstee et al. 1992) and team composition research (Bell 2007; Neuman & Wright 1999).

What makes Cèrcol different from Belbin or DISC?

Cèrcol's roles are grounded in the Big Five (OCEAN) personality model using the IPIP public-domain item pool. The scoring pipeline is fully open source and auditable. Witness Cèrcol uses forced-choice adjective selection, not Likert scales, to reduce social desirability bias in peer assessment. Unlike Belbin or DISC, all items are public domain and the entire methodology is published and citable.

Is the personality assessment free?

The New Moon Cèrcol (10 items, Big Five snapshot) and First Quarter Cèrcol (60 items, IPIP-NEO-60, 30 facets) are always free — no account required. The Full Moon Cèrcol (120 items, IPIP-NEO-120, Witness peer assessment, cognitive ability measure) requires a one-time payment.

What is Witness Cèrcol?

Witness Cèrcol is a peer personality assessment in which someone who knows you well rates you by forced-choice adjective selection: in each round they pick the best-fit and worst-fit adjective from a set covering all five OCEAN dimensions. Forced choice reduces the social desirability bias that affects standard Likert-scale peer ratings. Dimensions where your self-rating and peer ratings diverge by more than 0.8 standard deviations are flagged as potential blind spots.
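A minimal sketch of how best-fit/worst-fit choices might be turned into per-dimension scores. The adjective list, its mapping to dimensions, and the scoring rule are all hypothetical illustrations, not Cèrcol's published scoring key. Under the assumed rule, the best-fit adjective's dimension gains 1 point and the worst-fit adjective's dimension loses 1 point.

```python
# Hypothetical adjective-to-dimension key (illustration only, not Cèrcol's item set).
ADJECTIVE_DIMENSION = {
    "talkative": "Presence",
    "warm": "Bond",
    "imaginative": "Vision",
    "organised": "Discipline",
    "anxious": "Depth",
}


def score_forced_choice(rounds):
    """Tally (best_fit, worst_fit) adjective pairs into per-dimension scores.

    Assumed rule: in each round the best-fit adjective's dimension gains
    1 point and the worst-fit adjective's dimension loses 1 point.
    """
    scores = {dim: 0 for dim in ADJECTIVE_DIMENSION.values()}
    for best, worst in rounds:
        if best == worst:
            raise ValueError("best-fit and worst-fit must differ within a round")
        scores[ADJECTIVE_DIMENSION[best]] += 1
        scores[ADJECTIVE_DIMENSION[worst]] -= 1
    return scores


rounds = [("organised", "talkative"), ("warm", "anxious"), ("organised", "imaginative")]
print(score_forced_choice(rounds))
# {'Presence': -1, 'Bond': 1, 'Vision': -1, 'Discipline': 2, 'Depth': -1}
```

Each round adds and removes exactly one point, so the scores always sum to zero. The result is ipsative: it ranks the dimensions against each other for the person being rated. This is why a rater cannot simply endorse every flattering adjective.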

How are the 12 team roles derived?

The 12 roles are derived from the AB5C circumplex (Hofstee, De Raad & Goldberg 1992), covering all six intersections of the three team balance dimensions (Presence/Extraversion × Bond/Agreeableness × Vision/Openness) at both poles. The selection of these three dimensions as requiring team-level balance is grounded in Bell (2007) and Neuman & Wright (1999). Discipline (Conscientiousness) and Depth (Neuroticism) modulate role expression but do not define team balance.

No account is required for any instrument. During assessment, no personal data is collected — only anonymous scores are logged. Data is stored on our own servers (Hetzner Online GmbH). No third-party analytics. No data is shared with or sold to third parties.

Is Cèrcol based on the Big Five (OCEAN)?

Yes. Cèrcol measures personality using the OCEAN model (Big Five) via the IPIP (International Personality Item Pool), a public-domain collection validated in thousands of published studies. The five dimensions are Presence (Extraversion), Bond (Agreeableness), Vision (Openness), Discipline (Conscientiousness), and Depth (Neuroticism). Because the IPIP is public domain, there are no licence restrictions: the full item pool and scoring logic are open and citable.

How is Cèrcol different from Belbin, DISC, or StrengthsFinder?

Three things set Cèrcol apart. First, the items measure the Big Five (OCEAN), the most replicated personality model in academic research, rather than a proprietary framework. Second, the item pool (IPIP) is public domain and the scoring pipeline is open and auditable; there is no black box. Third, the Witness peer assessment uses forced-choice adjective selection instead of Likert scales, which reduces the social desirability bias that affects most 360-degree feedback tools. Belbin and DISC use closed, proprietary methodologies.

What are blind spots in team personality assessment?

A blind spot is a personality dimension where your self-view and others' view of you diverge by more than 0.8 standard deviations. Cèrcol's Witness peer assessment detects blind spots by comparing your self-report with forced-choice adjective ratings from people who know you. Blind spots are neither good nor bad. They show where your self-perception and others' experience of you don't match, which is often more actionable than the score itself.
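The 0.8 SD rule can be sketched as a comparison of standardised scores. The sketch makes two assumptions that are not specified here:

- raw scores are converted to z-scores against a reference norm (the norm means and SDs below are placeholders);
- ratings from several witnesses are averaged.

```python
from statistics import mean

BLIND_SPOT_THRESHOLD = 0.8  # standard deviations, per the rule described above


def z_score(raw, norm_mean, norm_sd):
    """Standardise a raw score against a (hypothetical) norm group."""
    return (raw - norm_mean) / norm_sd


def find_blind_spots(self_raw, peer_raws, norms):
    """Return {dimension: self_z - peer_z} for gaps larger than the threshold.

    self_raw:  {dimension: raw self-report score}
    peer_raws: list of {dimension: raw peer score}, one per witness (assumed averaged)
    norms:     {dimension: (mean, sd)} placeholder norm values
    """
    flagged = {}
    for dim, (m, sd) in norms.items():
        self_z = z_score(self_raw[dim], m, sd)
        peer_z = z_score(mean(p[dim] for p in peer_raws), m, sd)
        gap = self_z - peer_z
        if abs(gap) > BLIND_SPOT_THRESHOLD:
            flagged[dim] = round(gap, 2)
    return flagged


# Hypothetical norms and scores on a 1-5 scale.
norms = {"Presence": (3.0, 0.7), "Bond": (3.6, 0.6), "Discipline": (3.4, 0.7)}
self_raw = {"Presence": 4.2, "Bond": 3.7, "Discipline": 3.5}
peers = [
    {"Presence": 3.3, "Bond": 3.5, "Discipline": 2.8},
    {"Presence": 3.5, "Bond": 3.8, "Discipline": 3.0},
]
print(find_blind_spots(self_raw, peers, norms))
```

Keeping the signed gap preserves direction. A positive value means the person rates themselves higher than their witnesses do; a negative value means lower.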

What reliability and validity mean in personality testing

Reliability in Personality Testing: What It Means and What Scores to Demand

Reliability refers to the consistency of a measurement. A test is reliable if it produces the same, or closely similar, results under conditions where the underlying trait has not changed. Two types matter most for personality tests: test-retest reliability and internal consistency.

Test-retest reliability

Test-retest reliability asks: if the same person takes the same test twice, a few weeks apart, how similar are the results? Scores can differ between administrations for two reasons: genuine change in the underlying trait, or measurement error. A reliable test minimises measurement error, so that score changes between administrations primarily reflect real change rather than noise.

The standard threshold for acceptable test-retest reliability is a correlation of approximately 0.70 or above over a two-to-four-week interval. Well-validated Big Five instruments typically achieve 0.80 or higher for domain-level scores. The MBTI's test-retest reliability is lower — studies have found that approximately 50 percent of respondents receive a different four-letter type classification when retested five weeks later, which is the statistical signature of high measurement error. See MBTI vs Big Five for the full comparison.

Internal consistency

Internal consistency reliability asks whether the items within a scale are measuring the same underlying construct. If a Conscientiousness scale contains items about organisation, diligence, and reliability, those items should correlate with each other — because they are all tapping the same underlying disposition. The standard statistic is Cronbach's alpha, where values above 0.70 are generally considered acceptable and above 0.80 are good.
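Cronbach's alpha compares the variance of individual items with the variance of the total score. When items move together, the total varies much more than the items' summed variances, and alpha rises toward 1. A minimal sketch with hypothetical responses:

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Each row holds one respondent's k item scores; reverse-keyed items are
    assumed to be recoded already.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Five hypothetical respondents answering four Conscientiousness items (1-5 scale).
data = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(data), 2))  # 0.96
```

With these made-up answers alpha is well above the 0.80 "good" benchmark, because respondents who score high on one item score high on the others.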

Low internal consistency means items within a scale are measuring different things — which makes the total scale score difficult to interpret. A Conscientiousness score derived from items that barely correlate with each other is not a coherent measurement. For an explanation of how scale length interacts with internal consistency, see why 120 items is better than 10.
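Scale length and reliability are linked by the Spearman-Brown prophecy formula. It predicts how reliability changes when a scale is lengthened or shortened by a factor n, assuming the added items behave like the existing ones. The starting reliability below is hypothetical. Two and 24 items per domain correspond to 10-item and 120-item five-domain instruments.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after multiplying scale length by `length_factor`.

    rho_new = n * rho / (1 + (n - 1) * rho)
    Assumes added (or removed) items are parallel to the existing ones.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie in [0, 1]")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    n, rho = length_factor, reliability
    return n * rho / (1 + (n - 1) * rho)


# Hypothetical: a 2-item domain scale with reliability 0.55, lengthened to 24 items (n = 12).
print(round(spearman_brown(0.55, 12), 2))  # 0.94
```

The formula shows diminishing returns: each additional parallel item adds less reliability than the one before.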

Validity in Personality Testing: Four Types Every Buyer Should Understand

Validity addresses a different question: is the test actually measuring what it purports to measure? A test can be perfectly consistent (reliable) while measuring the wrong thing entirely. The main forms of validity evidence each address a different aspect of this question.

Convergent validity

Convergent validity asks whether the test correlates with other established measures of the same construct. A new Extraversion scale should correlate positively with existing validated Extraversion measures — because if both are measuring Extraversion, they should agree on who has more and less of it.

This sounds obvious but is surprisingly often neglected. Many proprietary instruments report no convergent validity data, which makes it impossible to evaluate whether they are measuring the same constructs as the academic literature. The IPIP item bank was built precisely to enable this kind of public comparison.

Criterion validity

Criterion validity — the most practically important form — asks whether the test predicts outcomes that the trait should theoretically predict. If a Conscientiousness measure is valid, it should predict job performance, academic achievement, and goal attainment, because Conscientiousness is the trait most consistently linked to these outcomes in the literature. If a test claims to measure Conscientiousness but shows no correlation with job performance, something is wrong with the claim.

Predictive validity is a specific subtype: does the test predict future outcomes? Concurrent validity asks whether the test correlates with outcomes assessed at the same time. Both matter, but predictive validity is the gold standard for instruments used in personnel selection. For the implications for hiring specifically, see personality testing in hiring: what is legal and what is ethical.

Discriminant validity

Discriminant validity asks whether the test correlates too highly with measures of different constructs. If a scale purporting to measure Agreeableness correlates as strongly with Conscientiousness as it does with other Agreeableness measures, it may not be measuring Agreeableness distinctly — the two scales may be measuring much the same thing, which means the information is partially redundant. Understanding what each Big Five facet uniquely measures helps here — see what is a facet in personality psychology.

Face validity vs statistical validity

Face validity is the appearance of measuring what a test claims. An item that reads "I am an organised person" has high face validity for Conscientiousness — it looks like it is measuring organisation. But face validity is not the same as statistical validity, and conflating them is one of the most common errors in personality test evaluation.

Many popular instruments have high face validity and modest to poor statistical validity. The content looks relevant; the predictions are weak. For a breakdown of which popular tests fall into this trap, see the best free personality tests for teams in 2026.

Psychometric concept	What it measures	Good threshold	Big Five instruments	MBTI
Test-retest reliability	Consistency of scores over time	r ≥ 0.70 over 2–4 weeks	Typically 0.80–0.90	~50% receive a different type at retest (~5 weeks)
Internal consistency (Cronbach's α)	Item coherence within a scale	α ≥ 0.70	Typically 0.80–0.90	Moderate; varies by scale
Convergent validity	Agreement with other measures of same trait	r ≥ 0.50 with established measure	Well-documented in peer review	Limited cross-instrument data published
Criterion validity	Prediction of real-world outcomes	Varies; d ≥ 0.20 considered meaningful	Conscientiousness predicts job performance robustly	Weak prediction of job performance
Discriminant validity	Independence from measures of different traits	Low r with conceptually distinct scales	Generally supported	Dimensions not clearly independent of each other

Five Questions to Evaluate Any Personality Test Validity Claim

When a vendor or researcher claims that a personality instrument is "valid and reliable," the following questions produce a fast quality assessment.

Question 1: Is the validity evidence published in peer-reviewed journals? Proprietary technical reports, white papers, and website copy do not count. Peer review subjects validity claims to independent scrutiny. If the only validity evidence is the publisher's own documentation, that is a red flag. The broader implications for how personality science handles replication are addressed in personality science: the replication crisis.

Question 2: What is the test-retest reliability over an appropriate retest interval? A gap of a few weeks (typically two to six) is standard. If this number is not reported or is below 0.70, the measurement is noisy.

Question 3: What outcomes does the instrument predict? Criterion validity evidence should include real-world outcomes, not just correlations with other self-report measures. For work-relevant instruments, job performance is the key criterion.

Question 4: Have independent research groups replicated the validity findings? A single study by the instrument's own developers is insufficient. Replication by researchers with no commercial interest in the outcome is the meaningful standard.

Question 5: Is the scoring transparent? If the scoring algorithm is proprietary, the validity claims cannot be independently verified. Open-science instruments — including the IPIP on which Cèrcol is built — allow anyone to check the claims against the data. See personality testing: open source vs commercial for the full comparison.

Why Peer Assessment Adds Validity That Self-Report Cannot Provide

One underappreciated source of validity in personality assessment is the use of observer ratings alongside self-report. Personality measured by people who know the subject — colleagues, managers, direct reports — typically shows higher criterion validity than self-report alone, particularly for predicting work performance.

This is because self-report is subject to impression management (consciously or unconsciously scoring oneself more favourably) and to limited self-knowledge (people are often unaware of how they come across to others). Observer ratings are not free of bias, but they are affected by different biases — which means combining self and observer data produces more accurate personality estimates than either alone. For the full argument, see why self-assessment alone isn't enough: peer personality feedback.

Cèrcol's Witness model is designed around this principle. The history of the Big Five and the science page provide further context on the validity evidence underpinning Cèrcol's design choices.

"Reliability and validity are not marketing claims. They are specific statistical properties with established thresholds, measurable through standard methods, and verifiable through published data. An instrument that cannot provide peer-reviewed evidence for both should be evaluated with proportionate scepticism."

How Cèrcol meets the reliability and validity bar

Cèrcol's instrument is built on the IPIP item bank, the same public-domain items whose psychometric properties have been independently documented by Goldberg and colleagues across decades of published research. Domain-level test-retest reliability for IPIP-based Big Five scales typically sits above r = 0.80 over four-week intervals. Internal consistency (Cronbach's α) for the full-length per-dimension scales Cèrcol uses (24 items per dimension in the 120-item IPIP-NEO-120) is consistently above 0.87.

Criterion validity is inherited from the broader Big Five literature: Conscientiousness (Discipline) predicts job performance across all major occupational categories (Barrick & Mount, 1991, doi: 10.1111/j.1744-6570.1991.tb00688.x). Neuroticism (Depth) predicts stress response and wellbeing outcomes. Openness (Vision) predicts creative performance.

The Witness peer assessment adds observer-rated scores on the same five dimensions using a forced-choice format that reduces social desirability bias — see social desirability bias in personality tests for the full methodology. Take the free assessment at cercol.team and review the full validity documentation at cercol.team/science.

Further reading: The history of the Big Five: from Allport to Goldberg · The science behind Cèrcol
