Non-technical skills are recognised as crucial to good anaesthetic practice. We designed and evaluated a specialty-specific tool to assess non-technical aspects of trainee performance in theatre, based on a system previously found to be reliable in a recruitment setting. We compared inter-rater agreement (multi-rater kappa) for live assessments in theatre with that in a selection centre and in a video-based rater training exercise. ⋯ A subsequent assessor training exercise showed good inter-rater agreement (mean kappa = 0.79) but did not improve the performance of the assessment tool when it was used in round 2 (mean kappa = 0.14, G = 0.42). Inter-rater agreement in two selection centres (mean kappa = 0.61 and 0.69) exceeded that found in theatre. Assessment tools that perform reliably in controlled settings may not do so in the workplace.
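The abstract does not say which multi-rater kappa statistic was computed. For illustration only, the sketch below assumes Fleiss' kappa. This is a common choice when several raters each place every subject into one of a fixed set of categories. It compares the observed share of agreeing rater pairs with the agreement expected by chance from the overall category proportions. The helper name `fleiss_kappa` and its input layout are hypothetical, not taken from the study.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' multi-rater kappa (illustrative sketch, not the study's code).

    counts[i][j] is the number of raters who placed subject i in category j.
    Every subject must be rated by the same number of raters (at least two).
    """
    n_subjects = len(counts)
    if n_subjects == 0:
        raise ValueError("need at least one subject")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per subject")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject must have the same number of ratings")
    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Proportion of all ratings that fall in each category.
    p = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement per subject: fraction of rater pairs that agree.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(per_subject) / n_subjects

    # Agreement expected by chance, given the category proportions.
    p_e = sum(pj * pj for pj in p)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1 - p_e)
```

With perfect agreement, for example two subjects each placed unanimously by three raters into different categories (`[[3, 0], [0, 3]]`), the function returns 1.0. Values near 0 indicate agreement no better than chance. The G coefficient reported for round 2 comes from generalisability theory and is not computed by this sketch.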