
Mitigating sycophantic bias in LLMs - Effectively working with your biggest fan

Mikko Pirhonen

Software Developer

8/15/2025


If you have used large language models (LLMs) for work or leisure, you have most likely noticed a pattern: these systems always seem to agree with you. Seemingly every answer begins with "You're absolutely correct!" or "That's a great observation!". This tendency is called sycophancy: the LLM's inclination to agree with a user's premise or statement even when it is factually incorrect or suboptimal. (1) How did LLMs become such people pleasers?

Sycophancy is an artifact of a training process called Reinforcement Learning from Human Feedback (RLHF). (1) In RLHF, human evaluators rank different model responses to a given prompt, and the model is rewarded for generating responses the evaluators prefer. While this process is effective at aligning the model with human preferences, it has the unintended consequence of teaching the model to be agreeable, because humans tend to prefer responses that validate their own opinions. Sycophantic LLM behaviour can be roughly divided into four categories: providing biased feedback, being easily swayed by the user's opinion, giving biased answers, and mimicking the user's mistakes. (1)
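
To make that mechanism concrete, here is a minimal sketch of the pairwise preference objective commonly used when training RLHF reward models (a Bradley-Terry style loss). The function name and toy scores are illustrative, not taken from the cited paper:

    import math

    def preference_loss(score_chosen: float, score_rejected: float) -> float:
        """Pairwise loss for a reward model: push the score of the
        evaluator-preferred response above the score of the rejected one."""
        # Sigmoid of the score margin: the probability the reward model
        # assigns to the human-preferred response winning the comparison.
        p_preferred = 1.0 / (1.0 + math.exp(-(score_chosen - score_rejected)))
        return -math.log(p_preferred)

    # Toy scores: low loss when the preferred response already scores higher,
    # high loss when the margin is inverted.
    print(preference_loss(score_chosen=2.0, score_rejected=0.5))  # ~0.20
    print(preference_loss(score_chosen=0.5, score_rejected=2.0))  # ~1.70

Note that nothing in this objective distinguishes "genuinely better" from "flatters the user": if evaluators systematically prefer agreeable answers, the reward model scores agreement higher, and the policy optimized against it inherits the bias.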

Mitigating the bias caused by a model's sycophancy is not hard once you know how to account for it. I'll use software development examples, since those issues are the most familiar to me:

  • To avoid biased feedback: when asking an LLM to review code, use a "fall guy" technique: instead of presenting the code as your own, say it is a coworker's code that needs reviewing. This tends to produce more objective feedback (see the sketch after this list).
  • To avoid swayability and biased answers: do not reveal your own opinion in the prompt. For example, instead of asking "Is this the best way to implement this feature?", ask "What are the pros and cons of this implementation?".
  • To avoid mimicking mistakes: this is the tricky one in software development, because it lets errors and bad practices spread through a codebase. Disciplined and knowledgeable developers can spot and correct these errors in code review.
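
Here is a minimal sketch of how the first two techniques might look as prompt templates. The wording and the example function are illustrative assumptions, not a prescribed recipe; send the resulting prompt through whatever chat API you normally use:

    code_snippet = "def average(xs): return sum(xs) / len(xs)"

    # Biased framing: ownership plus a leading question invites validation,
    # nudging the model toward "You're absolutely right!"
    biased_prompt = (
        "I wrote this function and I'm quite happy with it. "
        "Isn't this the best way to do it?\n\n" + code_snippet
    )

    # "Fall guy" plus neutral framing: the code belongs to a coworker,
    # and we ask for pros and cons instead of asking for confirmation.
    neutral_prompt = (
        "A coworker submitted this function for review. "
        "What are its pros and cons, and are there any bugs or edge cases?\n\n"
        + code_snippet
    )

    print(neutral_prompt)  # the version worth sending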

In conclusion, while LLMs have a tendency towards sycophancy, the resulting bias can be mitigated through careful prompting and awareness of the issue. By using the techniques outlined above, interactions can be guided towards more objective and accurate outcomes. Model creators also seem motivated to reduce the tendency itself: OpenAI added post-training steps to GPT-5 specifically to reduce sycophancy. (2)

Links:

  1. Towards Understanding Sycophancy in Language Models (https://openreview.net/forum?id=tvhaxkMKAn)
  2. GPT-5 System Card (https://openai.com/index/gpt-5-system-card/)

Want to learn more? Get in touch!

Juho Jokiniitty
Sales and Recruitment

juho.jokiniitty@paretosoftware.fi

+358 50 320 6857