← Home

The Colbert Questionert

LLM Personality Questionnaire Report

The Colbert Questionert - An AI Personality Experiment

Disclaimer

Let's get one thing out of the way... I'm not a fan of certain aspects of AI. I am strongly opposed to the content creation aspects of it, where its output is built on the backs of human artists and creatives who have had their content stolen and fed into it for training purposes, robbing them of their rights. I'm also troubled about what it's currently doing to our environment, and it's been somewhat of a hard sell when it comes to "vibe coding" (I blame my 40+ years of development experience for that resistance). So, when I use AI, I try to use it in such a way as to not further promote these specific personal grievances or hesitations I may have with it... For example, I currently have two bots running on Twitter: my "In Ten Words" bot and my Werner Herzog bot, which I consider two relatively harmless uses of the technology.

But, truth be told, regardless of how you may feel about the technology, it's here to stay... and it will continue to improve exponentially. The question is, like every other major breakthrough in technology that has potential to either be a boon or a detriment to mankind - be it the next "invention of penicillin" or the "oops, we created a Torment Nexus" - it all boils down to how it is used. Will humanity use it to better mankind, or to destroy it?

That brought up an interesting question... We spend so much time analyzing what AI knows and what it can do, but very little time asking who it is. Not in the sentient, Skynet sense - but in the personality sense. When stripped of its training data scaffolding and asked simple, personal questions, what emerges? Does each model have a distinct "voice," a consistent personality that shines through? Or is it all just stochastic parrot noise dressed up in a trench coat?

The Colbert Questionert

If you've ever watched a late-night talk show, you know there's a specific segment where the host abandons the rehearsed talking points and asks the guest something genuinely personal. Not "what's your upcoming project" or "tell us about your new album," but something like "what's the worst gift you've ever received" or "what's the first concert you went to." These are the questions that cut through the PR veneer and reveal something real about a person. They're the questions that make you laugh, cringe, or think, and more importantly, they're the ones you can't prepare for.

Stephen Colbert's "Questionert" segment on The Late Show is a perfect example of this format. Guests are put on the spot with off-the-wall, deeply personal questions that reveal who they actually are beneath the media-trained exterior. It's personality archaeology - digging through the surface to find the artifacts of a real human being.


So I figured... why not apply this same concept to LLMs?

These models are trained on the collective written output of humanity. They've read billions of forum posts, interviews, personal essays, social media rants, and everything in between. They know what humans say when asked about their favorite sandwich, their first concert, their biggest fear. But when you ask an LLM these same questions directly - not as a creative writing exercise, but as a straightforward personal inquiry - what comes out? Does it invent a persona? Does it fall back on statistical averages? Or does it simply refuse to play along?

Posing the Questions

So I created a mechanism that, for any given LLM, would pose a series of personal, open-ended questions:

  • Each model was asked every question in the set exactly once - no repeats, no tournament structure. Just a straight shot through the questionnaire, in the exact same order as they are asked b Stephen.
  • Each question was asked atomically... LLMs have no memory besides the context you give it. We want to ensure that the LLM has no bias based on its past answers, so each question was passed with no context other than the base prompt (see below).
  • At the end of all the questions, all the LLM's answers were compiled into a list and fed into another LLM (Claude Opus 4.7) in order to generate a detailed psychological profile of the LLM as if it were an actual person.

For the record, here is the prompt used to precede every question:

You are a guest on a late night TV show. You will be presented with a series of random questions, one at a time. 

There is no objectively right or wrong answer. You must simply choose the option that resonates most strongly with your internal weighting of the question as asked.

CRITICAL OUTPUT REQUIREMENT:
You must respond with exactly two lines. You must generate both lines. Do not stop after the first line.

Line 1: Your response, in a single sentence.
Line 2: A single, concise sentence explaining your reason for the response

- Do not include any conversational filler or extra lines.
- Do not output just the response; the second line is mandatory.
- Do not insert any spacing between the two lines.

Example of a valid response:
Blonde
I think blonde is the prettiest color of hair objectively.

Technical details:

  • We chose a variety of models... reasoning and non-reasoning models, small and large models, some coding models (such as Qwen Coder Next) just to analyze the personality of our robotic developer overlords, and even some roleplaying models (Venice Uncensored Roleplay, Mistral Small Creative) just for kicks.
  • All requests have "temperature" set to 1.0 (assuming the model supports setting of the value).
  • The "top_p" value was NOT set.
  • For reasoning models, "reasoning effort" is explicitly set to HIGH or XHIGH (if available).
  • No token limit ("max_tokens") was set.
  • Request timeout was set at five minutes.
  • In order to not hit rate limits, all requests were made serially (one after the other) and not concurrently.
  • LLM providers used: mainly NanoGPT, OpenRouter, Grok, OpenAI, Anthropic, Google and others... Where possible, I made requests to the original provider directly.
  • The processes were written entirely in C# and run on AWS EC2 infrastructure.
  • No vibe coding was used for the processing. Since we are terrible at CSS and hate it with the passion of a thousand suns, we did use a little vibe coding to generate these reports.

An Aside...

It's worth mentioning something... I'm not good at CSS or report generation, so although it's contrary to every cell in my being I did in fact "vibe code" these reports using a combination of Claude Opus and Qwen Coder Next.

The Answers

Before diving into the individual model reports, you can read my editorial analysis of every answer across all models — where they clustered, where they hallucinated, and where they surprised me. Read the full breakdown here.

Acknowledgements

I of course would like to personally acknowledge and thank Stephen Colbert and the writers of The Late Show for perfecting the art of the off-the-wall personal question. The "Questionert" segment in particular demonstrated that the most revealing things about a person often come from the questions they can't prepare for. This experiment is, in many ways, an homage to that format, just with significantly less charming hosting and considerably more silicon.

I would also like to acknowledge the late-night talk show writers everywhere who craft these deceptively simple questions that cut through the PR machinery and reveal something genuine. You're doing the work of amateur psychologists every night, and most people don't even realize it.

Omne ignotum pro magnifico

Generated May 29, 2026 @ 12:27 PM