A very important part of question bank development is validating the questions. AI is known to hallucinate, so even though I often give Claude an article to draw questions from, and Claude connects with PubMed, there’s still a chance of hallucination. In medicine, that’s not acceptable.
After the questions are created, the first thing I do is run all of the questions and answers through Claude in a separate “validation” project, where I have given instructions on how to evaluate and review them. Claude reviews for level appropriateness (rheumatology fellows) and medical accuracy, and confirms the references.
I then answer the questions myself to gauge how easy or hard they are. Some are too intense for the fellow level and some are far too easy, but most fall in a reasonable range. I then run a number of questions through Open Evidence and, depending on the question, review the literature myself.
As for common mistakes with this method, I initially saw a lot of issues with the distractors (the incorrect answer choices). The correct answer was easy to guess even without the knowledge, because the answer choices varied so much. The connection with PubMed helped that problem immensely.
Another thing I frequently see is too much certainty. Rheumatology is often fraught with uncertainty, yet there were claims of a finding being “pathognomonic” for a disease when it clearly wasn’t. Claude will also pull exact numbers out of an article. For example, “tocilizumab decreases statin levels by 61%” - that might have been found in one paper, but there is a range, and these exact numbers aren’t useful to learners.
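For anyone who wants a quick automated first pass before the human review, here is a minimal sketch in Python of how overconfident wording could be flagged. This is not part of my actual workflow. The function name `flag_overconfidence` and the word list are assumptions made up for illustration, and a flag only means "look at this again", not "this is wrong".

```python
import re

# Hypothetical list of absolute terms; adjust it to your own specialty.
ABSOLUTE_TERMS = ("pathognomonic", "always", "never", "definitively")

# Matches an exact percentage such as "61%" or "12.5 %".
PERCENT_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\s?%")


def flag_overconfidence(text):
    """Return phrases in `text` that deserve a second look from a reviewer."""
    flags = []
    lowered = text.lower()
    # Whole-word match so that "never" does not fire on "nevertheless".
    for term in ABSOLUTE_TERMS:
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            flags.append(term)
    # Collect every exact percentage quoted in the text.
    flags.extend(match.group(0) for match in PERCENT_PATTERN.finditer(text))
    return flags


if __name__ == "__main__":
    print(flag_overconfidence("Tocilizumab decreases statin levels by 61%"))
```

A check like this cannot judge whether a claim is accurate. It only surfaces the sentences where certainty or a single exact number has crept in, so they can be checked against the literature.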
And one last, hysterical reason to do an extensive validation process: you never know what you might find when working with AI. I have asked Claude to avoid medical buzzwords. Though I think it’s important to know the buzzwords, I don’t want learners to rely on them, in case a finding is described slightly differently. In one instance, Claude described “velcro crackles” as “dry inspiratory sounds resembling hook-and-loop fasteners”. Claude was quite proud of avoiding medical buzzwords, and I hope it amuses you as much as it did me.
Status update:
Question bank: 433 validated
Flashcards: 165 validated
New categories added yesterday: Pediatrics; Osteoarthritis & Regional MSK
Happy Learning!