Is Cèrcol based on the Big Five?

Yes. Cèrcol measures personality using the OCEAN model (Big Five) via the IPIP public-domain item pool (Goldberg et al. 2006). The 12 team roles are derived from the AB5C circumplex (Hofstee et al. 1992) and team composition research (Bell 2007; Neuman & Wright 1999).

What makes Cèrcol different from Belbin or DISC?

Cèrcol's roles are grounded in the Big Five (OCEAN) personality model using the IPIP public-domain item pool. The scoring pipeline is fully open source and auditable. Witness Cèrcol uses forced-choice adjective selection, not Likert scales, to reduce social desirability bias in peer assessment. Unlike Belbin or DISC, all items are public domain and the entire methodology is published and citable.

Is the personality assessment free?

The New Moon Cèrcol (10 items, Big Five snapshot) and First Quarter Cèrcol (60 items, IPIP-NEO-60, 30 facets) are always free — no account required. The Full Moon Cèrcol (120 items, IPIP-NEO-120, Witness peer assessment, cognitive ability measure) requires a one-time payment.

What is Witness Cèrcol?

Witness Cèrcol is a peer personality assessment where someone who knows you well rates you using a forced-choice adjective selection method: in each round they pick the best-fit and worst-fit adjective from a set covering all five OCEAN dimensions. Forced choice reduces the social desirability bias that affects standard Likert-scale peer ratings. Dimensions where your self-rating and peer ratings diverge by more than 0.8 standard deviations are flagged as potential blind spots.

How are the 12 team roles derived?

The 12 roles are derived from the AB5C circumplex (Hofstee, De Raad & Goldberg 1992), covering all six intersections of the three team balance dimensions (Presence/Extraversion × Bond/Agreeableness × Vision/Openness) at both poles. The selection of these three dimensions as requiring team-level balance is grounded in Bell (2007) and Neuman & Wright (1999). Discipline (Conscientiousness) and Depth (Neuroticism) modulate role expression but do not define team balance.

No account is required for any instrument. During assessment, no personal data is collected — only anonymous scores are logged. Data is stored on our own servers (Hetzner Online GmbH). No third-party analytics. No data is shared with or sold to third parties.

Is Cèrcol based on the Big Five (OCEAN)?

Yes. Cèrcol measures personality using the OCEAN model (Big Five) via the IPIP — the International Personality Item Pool, a public-domain collection validated in thousands of published studies. The five dimensions are Presence (Extraversion), Bond (Agreeableness), Vision (Openness), Discipline (Conscientiousness), and Depth (Neuroticism). Because the IPIP is public domain there are no licence restrictions: the full item pool and scoring logic are open and citable.

How is Cèrcol different from Belbin, DISC, or StrengthsFinder?

Three things set Cèrcol apart. First, the items come from the Big Five (OCEAN), the most replicated personality model in academic research, not a proprietary framework. Second, the full item pool (IPIP) is public domain and the scoring pipeline is open source and auditable; there is no black box. Third, the Witness peer assessment uses forced-choice adjective selection instead of Likert scales, which reduces the social desirability bias that affects most 360-feedback tools. Belbin and DISC use closed, proprietary methodologies.

What are blind spots in team personality assessment?

A blind spot is a personality dimension where how you see yourself and how others see you diverge significantly — more than 0.8 standard deviations apart. Cèrcol's Witness peer assessment detects blind spots by comparing your self-report with forced-choice adjective ratings from people who know you. Blind spots are neither good nor bad: they show where your self-perception and others' experience of you don't match, which is often more actionable than the score itself.

How personality test scores are calculated: from items to dimensions

Step 1: How Big Five Item Response Formats Shape Your Score

The raw material of a personality score is the response to individual items. The most common format in Big Five assessment is the Likert scale: respondents rate their agreement with a statement, typically on a five- or seven-point scale with anchors such as "Strongly Disagree / Disagree / Neutral / Agree / Strongly Agree". See the Wikipedia article on the Likert scale for the full statistical background.

Likert formats have several psychometric advantages. They are sensitive to gradations of agreement rather than forcing a binary yes/no response, which increases score variance and therefore reliability. They are familiar to most respondents, reducing the cognitive overhead of the response task. And they produce interval-like data that can be subjected to standard statistical analysis.

Alternative formats exist and each makes different assumptions:

Forced-choice formats present pairs or groups of trait-relevant statements and ask the respondent to choose which is more like them. This design was developed to reduce the impact of social desirability responding — the tendency to endorse statements that seem positively valued regardless of whether they are accurate. Forced choice makes it harder to present an idealised self-image because choosing one positive statement necessarily means rejecting another. The trade-off is ipsative measurement, discussed below. For a full treatment, see forced-choice personality assessment: why it produces more honest data.

Adjective rating formats present single personality-relevant words ("organised," "spontaneous," "anxious") and ask how well each describes the respondent. These are faster to administer than full-sentence items and show reasonable validity, but they tend to have lower reliability than full-statement Likert scales — partly because single words are more ambiguous than full sentences.

Step 2: Why Reverse-Scored Items Protect Big Five Scale Validity

A well-designed personality scale includes both positively keyed and negatively keyed items — that is, some items where agreement indicates the high end of the trait, and others where agreement indicates the low end. An item like "I keep my belongings neatly organised" is positively keyed for Conscientiousness; "I often leave tasks unfinished" is negatively keyed.

Negatively keyed items serve as a guard against acquiescence bias: the tendency for some respondents to agree with statements regardless of their content. If every item in a Conscientiousness scale is worded in the same direction, a person who says "agree" to everything will appear highly conscientious even if their actual behaviour is not. With a balanced mix of positively and negatively keyed items, indiscriminate agreement produces a middling score rather than a falsely high one. For a detailed explanation of how acquiescence and social desirability distort scores, see social desirability bias in personality tests.

Before items are aggregated into a dimension score, negatively keyed items are reverse scored: a response of 5 on a 1–5 scale is recoded as 1, a 4 becomes a 2, a 3 stays at 3, and so on. After reverse scoring, all items point in the same direction, and simple summation or averaging produces a coherent scale score.
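
The recoding rule above can be sketched in a few lines of Python. This is a minimal illustration, not Cèrcol's actual implementation: the function names and the four-item scale are hypothetical, and a 1–5 response scale is assumed.

```python
def reverse_score(response: int, low: int = 1, high: int = 5) -> int:
    """Recode a response on a low..high scale so the two ends swap places."""
    if not low <= response <= high:
        raise ValueError(f"response {response} outside {low}..{high}")
    return low + high - response


def score_scale(responses: list[int], reverse_keyed: list[bool]) -> float:
    """Average the items after reverse scoring the negatively keyed ones."""
    if len(responses) != len(reverse_keyed):
        raise ValueError("responses and reverse_keyed must have equal length")
    aligned = [
        reverse_score(r) if is_reversed else r
        for r, is_reversed in zip(responses, reverse_keyed)
    ]
    return sum(aligned) / len(aligned)


# Hypothetical four-item scale: items 2 and 4 are negatively keyed.
keying = [False, True, False, True]

# Someone who answers "Agree" (4) to every item ends up at the scale midpoint.
acquiescent = score_scale([4, 4, 4, 4], keying)

# Someone who agrees with positive items and disagrees with negative ones scores high.
consistent = score_scale([5, 1, 4, 2], keying)
```

With balanced keying, the all-agree respondent averages 3.0 while the consistent respondent averages 4.5, which is the safeguard against acquiescence described in this step.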

"Reverse scoring is not a trick. It is a measurement safeguard — a design feature that protects the validity of scale scores against systematic responding styles that would otherwise produce misleading results. An instrument without negatively keyed items should be treated with caution."

Step 3: Sum Scoring vs Item Response Theory in Big Five Assessment

Once items are scored in the same direction, they must be combined into a dimension score. The two main approaches are classical test theory (CTT) sum scoring and item response theory (IRT).

Sum scoring is exactly what it sounds like: add up (or average) the item scores. If a Conscientiousness scale contains 20 items rated 1–5, the sum can range from 20 to 100. This raw sum is then typically standardised against a normative sample to produce a percentile or standardised score. Sum scoring is easy to implement, easy to explain, and adequate for most purposes.
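
As a small sketch of the arithmetic, assuming items have already been reverse scored where needed, sum scoring looks like this. The helper names and the response pattern are invented for illustration.

```python
def sum_score(aligned_items: list[int]) -> int:
    """Raw scale score: the plain sum of items that all point the same way."""
    return sum(aligned_items)


def score_range(n_items: int, low: int = 1, high: int = 5) -> tuple[int, int]:
    """Smallest and largest possible raw sums for a scale of n_items."""
    return n_items * low, n_items * high


# Hypothetical responses to a 20-item Conscientiousness scale rated 1-5.
responses = [4] * 10 + [3] * 6 + [5] * 4

raw = sum_score(responses)                 # raw sum for this respondent
possible = score_range(len(responses))     # bounds of the raw sum
mean_item = raw / len(responses)           # same information on the 1-5 metric
```

The raw sum here is 78 out of a possible 20 to 100. On its own that number says little; it only becomes interpretable once it is compared with a normative sample.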

Item response theory (IRT) takes a more sophisticated approach. IRT models the probability of each response option as a function of the respondent's latent trait level. Items are not treated as equivalent — some items are more discriminating (better at distinguishing between people at different trait levels), and some items are more informative at different points on the trait distribution. IRT scoring weights items by their discriminating power and can produce more precise estimates at the extremes of the distribution, where sum scoring tends to be less reliable.
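
To make the idea of discrimination concrete, here is a minimal sketch of one common IRT model for ordered Likert responses, the graded response model. The source does not say which IRT model any particular instrument uses, so the model choice, the parameter values and the function names below are all assumptions made for illustration.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(theta: float, discrimination: float,
                           thresholds: list[float]) -> list[float]:
    """Graded response model: probability of each ordered response category.

    thresholds must be increasing; k thresholds define k + 1 categories.
    """
    # P(response at or above each category boundary), bracketed by 1 and 0.
    cumulative = (
        [1.0]
        + [logistic(discrimination * (theta - b)) for b in thresholds]
        + [0.0]
    )
    return [cumulative[j] - cumulative[j + 1] for j in range(len(cumulative) - 1)]


def expected_response(theta: float, discrimination: float,
                      thresholds: list[float]) -> float:
    """Expected response on a 1..(k + 1) scale at trait level theta."""
    probs = category_probabilities(theta, discrimination, thresholds)
    return sum((j + 1) * p for j, p in enumerate(probs))


# Hypothetical five-category item boundaries on the latent trait scale.
thresholds = [-1.5, -0.5, 0.5, 1.5]

# How far apart two respondents (theta = -1 and theta = +1) land on each item.
gap_weak = expected_response(1.0, 0.5, thresholds) - expected_response(-1.0, 0.5, thresholds)
gap_strong = expected_response(1.0, 2.0, thresholds) - expected_response(-1.0, 2.0, thresholds)
```

The item with the higher discrimination parameter separates the two respondents by roughly twice as much as the weaker item. That difference is what IRT scoring exploits and what sum scoring ignores by weighting every item equally.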

For most applied purposes — team development, individual coaching, self-understanding — the practical difference between CTT sum scoring and IRT is small. Where IRT offers a clear advantage is in adaptive testing (selecting which items to administer based on earlier responses, which allows shorter tests with equivalent precision) and in high-stakes contexts where measurement precision at the extremes of the distribution matters. For more on why test length interacts with these calculations, see why 120 items is better than 10: personality test length.

Step 4: Normative vs Ipsative Scoring — and Why It Changes Everything

This is perhaps the least understood distinction in personality test scoring — and one of the most consequential.

Normative scoring compares each respondent's score to a reference population (the normative sample). A raw sum of 78 on a Conscientiousness scale means nothing until you know that the average person in the normative sample scores 65 and the standard deviation is 12. A score of 78 is then 13 points, or a little over one standard deviation, above the mean (z ≈ 1.08), which is roughly the 86th percentile if scores are approximately normally distributed. Normative scores answer the question: how does this person compare to others?
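
The conversion in this example can be checked with Python's standard library. The sketch assumes that scores in the normative sample are approximately normally distributed, which the text does not state explicitly; the function names are illustrative.

```python
from statistics import NormalDist


def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Distance of a raw score from the normative mean, in SD units."""
    return (raw - norm_mean) / norm_sd


def percentile_from_z(z: float) -> float:
    """Percentile rank under a standard normal distribution (assumption)."""
    return 100 * NormalDist().cdf(z)


# Figures from the example above: raw 78, normative mean 65, SD 12.
z = z_score(78, norm_mean=65, norm_sd=12)
pct = percentile_from_z(z)
```

Here z comes out at about 1.08 and the percentile lands in the mid-80s. If the normative distribution is noticeably skewed, an empirical percentile table built from the sample itself would be the safer choice.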

Ipsative scoring produces relative scores — comparisons of the respondent's own standing on different traits to each other, rather than comparisons to other people. Forced-choice formats naturally produce ipsative data: if a respondent consistently chose Conscientiousness-relevant statements over Agreeableness-relevant ones, they will end up with a relatively high Conscientiousness score and a relatively low Agreeableness score — but the scores are defined relative to each other, not relative to a population.
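
A toy tally shows why forced-choice scores are relative. In this sketch each round is assumed to yield exactly one chosen statement, tagged with the trait it measures; the three-trait setup and the choice data are hypothetical and far simpler than a real instrument.

```python
from collections import Counter

TRAITS = ("Conscientiousness", "Agreeableness", "Extraversion")


def ipsative_scores(choices: list[str]) -> dict[str, int]:
    """Count how often each trait's statement was chosen across rounds."""
    unknown = set(choices) - set(TRAITS)
    if unknown:
        raise ValueError(f"unknown traits: {sorted(unknown)}")
    counts = Counter(choices)
    return {trait: counts.get(trait, 0) for trait in TRAITS}


# Two hypothetical respondents, six forced-choice rounds each.
respondent_a = ipsative_scores(["Conscientiousness"] * 4 + ["Extraversion"] * 2)
respondent_b = ipsative_scores(
    ["Conscientiousness"] * 2 + ["Agreeableness"] * 2 + ["Extraversion"] * 2
)

# Every profile sums to the number of rounds, whoever the respondent is.
total_a = sum(respondent_a.values())
total_b = sum(respondent_b.values())
```

Because both totals are fixed at six, a higher score on one trait can only come at the expense of another. Respondent A's Conscientiousness count of 4 shows that A prioritises it over the other traits; it does not show that A is more conscientious than respondent B in absolute terms.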

The psychometric literature is clear that ipsative scores are appropriate for understanding within-person priority orderings but are inappropriate for comparing people to each other or for predicting outcomes in criterion validity studies. Using ipsative scores to compare candidates in a hiring decision is a methodological error — because a candidate who scores high on Conscientiousness ipsatively might have lower absolute Conscientiousness than another candidate whose ipsative score is middling. For the hiring-specific implications, see personality testing in hiring: what is legal and what is ethical.

Scoring method	How it works	Pros	Cons
Likert sum/average (CTT)	Sum or average item scores after reverse scoring	Simple, transparent, well-understood	Treats all items as equally informative
Item Response Theory (IRT)	Models probability of each response as a function of latent trait	More precise at distribution extremes; enables adaptive testing	More complex to implement and explain
Normative scoring	Compares raw score to reference population	Enables comparison across individuals; meaningful percentile ranks	Quality depends heavily on normative sample representativeness
Ipsative scoring	Ranks traits relative to each other within a person	Reduces social desirability responding; reveals within-person priorities	Invalid for between-person comparisons; cannot be used in criterion validity studies

Step 5: Why the Normative Database Shapes Your Big Five Percentile

A normative score is only as meaningful as the normative sample it is derived from. If the reference population used to produce a percentile score is systematically different from the person being assessed — different age, occupation, culture, education level — the percentile may be misleading.

A Conscientiousness score at the 75th percentile of a general adult population sample might translate to the 55th percentile of a highly educated professional population, where mean Conscientiousness tends to be higher. Using the wrong normative base produces scores that systematically misrepresent where a person stands relative to the comparison population that actually matters for the decision at hand.
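
The effect of switching norm groups can be illustrated with two invented reference distributions. The means and standard deviations below are hypothetical, chosen only so that the result lands near the figures in the example above; they are not published norms, and normality is assumed.

```python
from statistics import NormalDist

# Hypothetical Conscientiousness norms (raw-sum metric); not real published values.
NORMS = {
    "general_adult": NormalDist(mu=65.0, sigma=12.0),
    "educated_professional": NormalDist(mu=71.6, sigma=12.0),
}


def percentile(raw: float, norm_group: str) -> float:
    """Percentile of a raw score within the chosen normative group."""
    return 100 * NORMS[norm_group].cdf(raw)


# Raw score sitting exactly at the 75th percentile of the general sample.
raw = NORMS["general_adult"].inv_cdf(0.75)

general_pct = percentile(raw, "general_adult")
professional_pct = percentile(raw, "educated_professional")
```

The same raw score of about 73 sits at the 75th percentile against the general norms and at roughly the 55th against the higher-scoring professional norms, so the choice of reference group has to be made deliberately and reported with the score.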

Well-designed assessment platforms maintain separate normative samples for different populations — by occupation, by country, by age group — and apply the relevant norm to each assessment. Cèrcol uses normative scoring derived from IPIP validation samples, with ongoing data collection to develop norms relevant to the specific populations using the platform. For the full discussion of what reliability and validity mean in this context, see what is reliability and validity in personality testing.

How Cèrcol Scores Its Big Five Instrument

Cèrcol's instrument uses Likert-format items with mixed positive and negative keying, CTT sum scoring after reverse coding, and normative comparison against published IPIP validation samples. Dimension scores are standardised as percentile-equivalents, and facet scores are reported as standardised scores within each dimension. For a deep dive into what facets add to the picture that domain scores alone cannot provide, see what is a facet in personality psychology.

The Witness assessment applies the same scoring algorithm to observer responses, producing comparable dimension and facet scores that can be directly overlaid with self-report data. Score discrepancies between self and Witness are flagged in reports as potential blind spots — areas where self-perception and external perception diverge meaningfully. To understand why this peer layer matters, see why self-assessment alone isn't enough: peer personality feedback.
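
The flagging step might look something like the sketch below. It uses the threshold of 0.8 standard deviations that the FAQ gives for blind spots; everything else is an assumption made for illustration, including the function name, the use of z-scores as input, and the choice to average several Witness ratings before comparing them with the self-report.

```python
from statistics import mean

BLIND_SPOT_THRESHOLD_SD = 0.8  # threshold stated in the FAQ


def blind_spots(self_z: dict[str, float],
                witness_z: dict[str, list[float]],
                threshold: float = BLIND_SPOT_THRESHOLD_SD) -> dict[str, float]:
    """Return dimensions where self and mean Witness scores differ by more than threshold.

    Positive gaps mean the self-rating is higher than the peers' view.
    """
    flagged = {}
    for dimension, own in self_z.items():
        gap = own - mean(witness_z[dimension])
        if abs(gap) > threshold:
            flagged[dimension] = round(gap, 2)
    return flagged


# Hypothetical standardised scores for one person and two Witnesses.
self_scores = {"Presence": 0.9, "Bond": 0.2, "Vision": -0.1,
               "Discipline": 1.2, "Depth": -0.3}
witness_scores = {"Presence": [0.7, 1.0], "Bond": [-0.8, -0.6], "Vision": [0.0, 0.2],
                  "Discipline": [0.1, 0.3], "Depth": [-0.2, -0.5]}

flagged = blind_spots(self_scores, witness_scores)
```

In this made-up profile, Bond and Discipline are flagged because the self-ratings sit about 0.9 and 1.0 standard deviations above the Witness average, while the other three dimensions agree closely.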

Understanding the scoring process does not change what the scores mean in practice. But it makes clear that personality scores are not mysterious outputs from an opaque machine. They are the result of explicit, auditable methodological choices — choices that, in Cèrcol's case, are grounded in published psychometric research and available for inspection in the science documentation.

For context on what these scores are based on and how to use them well, see what reliability and validity mean in personality testing and forced-choice personality assessment and why it produces more honest data.

How Cèrcol calculates your Big Five scores

Cèrcol's scoring is entirely transparent: Likert-format items, reverse coding where needed, CTT sum aggregation, and normative percentile conversion using published IPIP samples. There are no proprietary black-box algorithms. The Witness peer assessment layer applies the same logic to observer-rated adjective pairs and overlays the result on your self-report profile — surfacing the blind spots that no self-report instrument, however carefully scored, can detect on its own.

If you want to see this methodology in action, the full Big Five assessment is free at cercol.team. The Witness instrument adds peer perspectives using a forced-choice design that sidesteps the acquiescence and social desirability inflation that affects standard Likert scales. The science documentation details every scoring decision with references to the published psychometric literature.

Further reading: What reliability and validity mean in personality testing · Forced-choice personality assessment: more honest data

Related articles

What reliability and validity mean in personality testing — explained plainly

Why 120 items is better than 10: the trade-off in personality test length

What is a facet? The 30 Big Five facets