This is very funny, because Matt Kaeberlein in his recent podcast made various claims about ChatGPT in the context of interpreting your blood tests and various biomarkers and diagnostic results. He said that until recently it wasn’t great, but these days it’s “very good”. I thought about this quite a bit, reflected on my recent experiences, and was not impressed. The caveat is that I was using Gemini and not the paid version of ChatGPT 5, so maybe things are different at more rarefied levels of “reasoning”, but Gemini struck me as exceptionally poor. Despite my specifically prompting it to use PubMed studies in its analysis, it frequently missed pivotal studies that directly addressed the question. My questions were mostly pretty simple, like likely interactions between certain drugs (telmisartan, empagliflozin, pitavastatin, pioglitazone), and it was really terrible. I know that because I have been researching these interactions for months, so I had the background to see where the AI went wrong, all despite a lot of prompt modification and guiding. I came away from the experience profoundly disappointed and deeply skeptical about the utility and reliability of AI in healthcare, at least at present, summer-fall of 2025. No doubt there are areas where it’s extremely good, such as image interpretation, finding things in x-rays and the like, but when it comes to interpreting what is happening in the human body at a biochemical level, it’s completely hopeless. Perhaps it’ll get better one day, but for now I’ve got to disagree with MK: it’s not “very good”, it’s “very, VERY bad”. I wouldn’t dream of relying on AI for medical decisions atm. YMMV.

1 Like

I can put your questions into GPT5 if you wish.

I might take you up on that, stay tuned! Thank you, and much obliged.

If you want to do it off line and only share your conclusions we can do this via email and/or a zoom call.

1 Like

For programming according to George Hotz:

It’s not precise in specifying things. The only reason it works for many common programming workflows is because they are common. The minute you try to do new things, you need to be as verbose as the underlying language.

I guess applied to healthcare, you need to be verbose in instructions and prompts, such that you’ve almost done the work for it yourself; otherwise you get a common list with a common answer, the way it is now. So right now it could be an amplifier of one’s own capabilities, in a way.

The PubMed studies issue might be a search-tool issue: I think it literally outputs something like “search(pioglitazone pubmed)”, then uses some of the search results in producing the output shown to you.
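
If so, that step probably amounts to a plain keyword query against a literature index, which would explain why pivotal studies get missed when the query terms don’t match. A minimal sketch of what such a lookup could look like, assuming nothing about the actual tool (the function name, query string and result handling here are purely illustrative), using NCBI’s public E-utilities endpoint for PubMed:

```python
import requests

# Illustrative only: a guess at what a "search(<term> pubmed)" tool call might
# reduce to, i.e. a keyword query against NCBI's public E-utilities API.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def pubmed_search(term: str, max_results: int = 5) -> list[str]:
    """Return titles of the top PubMed hits for a keyword query."""
    # Step 1: esearch returns the PubMed IDs matching the query term.
    ids = requests.get(
        f"{BASE}/esearch.fcgi",
        params={"db": "pubmed", "term": term,
                "retmax": max_results, "retmode": "json"},
        timeout=30,
    ).json()["esearchresult"]["idlist"]
    if not ids:
        return []
    # Step 2: esummary fetches the metadata (including titles) for those IDs.
    summaries = requests.get(
        f"{BASE}/esummary.fcgi",
        params={"db": "pubmed", "id": ",".join(ids), "retmode": "json"},
        timeout=30,
    ).json()["result"]
    return [summaries[i]["title"] for i in ids]

# A drug-interaction question becomes whatever keywords the model chooses:
print(pubmed_search("pioglitazone empagliflozin interaction"))
```

If the key study doesn’t happen to match the keywords the model picked, it simply never enters the context, no matter how good the reasoning is afterwards.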

2 Likes

Just today I used GROK, the free version anchored to an X account, and it did search the PubMed database, outlining a recent (2024) article, a meta-analysis on protein requirements.
The same prompt fed to GPT5 resulted in an answer without a PubMed citation, more anchored to the classic guidelines. The answers were pretty different: GPT5 was more balanced and traditional, whereas Grok was more attentive to current web opinions but also examined the literature, with a more ‘rebellious’ nuance, to cite Musk himself.

To be fair, GPT would have required a different prompt, since it is said to be very susceptible to the input, so specific rules of prompt engineering should be applied, as per the OpenAI ‘cookbook’.

1 Like

You all need to start using ChatGPT 5 (I have Pro) with the “thinking” mode. Mine gives me citations and generally does a great job. However, it still has a tendency to try and please me by telling me what I want to hear, and to always relate things to my specific situation. For example, it knows I lift weights etc., so that would influence an answer about protein intake. So in the prompt I have to tell it to just give objective answers.

If people are just using AI models like google and typing very short prompts, they’re going to have a miserable time.

2 Likes

Gemini flash 2.5 (free) is pretty good, and it has capabilities like ‘tools’ where you write your preferences and these act as an overall master prompt. But sometimes GPT5, when correctly prompted and in deep thinking mode, is formidable. Not without some annoyances like the ones you described.

Every AI has its nuances and probably it’s not a bad idea to consult 2 or 3 of’em for the same issue, if considered important enough.

1 Like

Stanford Medicine magazine reports on chronic disease prevention, diagnostics, care

Full issue: Stanford Medicine magazine reports on chronic disease prevention, diagnostics, care

  • Paging Dr. Algorithm: Stanford University’s medical school is revamping its curriculum to incorporate lessons on how AI works and how to use it. It’s also providing AI-based apps so students can practice interacting with patients and making diagnoses.
1 Like

The current discourse around AI progress and a supposed “bubble” reminds me a lot of the early weeks of the Covid-19 pandemic. Long after the timing and scale of the coming global pandemic were obvious from extrapolating the exponential trends, politicians, journalists and most public commentators kept treating it as a remote possibility or a localized phenomenon.

Something similarly bizarre is happening with AI capabilities and further progress. People notice that while AI can now write programs, design websites, etc., it still often makes mistakes or goes in the wrong direction, and then they somehow jump to the conclusion that AI will never be able to do these tasks at human levels, or will only have a minor impact. Yet just a few years ago, having AI do these things was complete science fiction! Or they see two consecutive model releases, don’t notice much difference in their conversations, and conclude that AI is plateauing and scaling is over.

There are disputes and arguments - you can do that endlessly. Fortunately, there’s a much simpler resolution: let nature take its course and in due time it will be revealed who was right, the optimists or the pessimists. Shouldn’t take long according to the optimists, so there’s at least that, we won’t have to wait much. I’m getting my popcorn ready.

3 Likes

I generally think the future is very bright in terms of applying GPT-5+ type models to healthcare and life extension, but every once in a while I read something from various AI models that makes me angry. E.g., they sometimes misread sources, confusing mortality rates for treated versus untreated cases. This is why it’s good to go to sources like:

If untreated, a brain abscess is almost always deadly. With treatment, the death rate is about 10% to 30%. The earlier treatment is received, the better.

Some people may have long-term brain or nerve damage after a brain abscess or surgery.

1 Like

The latest serious mistakes I saw were when asking for opinions about a PDF with my recent blood analysis. The AIs (I consulted GPT5 and Gemini) read one or two values very incorrectly. Then they read some values which weren’t there. I’ll point out that the print was very clear.
A healthy dose of skepticism must always be applied. Cross-checking figures is necessary. The accuracy of reading attachments, especially details like figures, is not predictable a priori. Sometimes it’s excellent, sometimes much less so.

2 Likes

I don’t know. What confidence can you have in anything these AI platforms come up with, when they make such fundamental errors? These are just the things you caught; what about all the stuff you didn’t? Do you really think their analyses are worth a damn? I don’t.

1 Like

I understand your skepticism, but in serious matters, some cross-checking with sources is very advisable. Also, the answers sometimes include very interesting novel aspects and details that we didn’t know.
However, what I found to be the foremost principle is that you should already have some knowledge of the topics under discussion. If you know nothing, it will be very hard to judge the reliability of the answers. If you know the issue, then you get a real brainstorming session.

Chatbots powered by large language models are being promoted as a way to fill gaps in health care, especially where doctors are scarce.

But our new research has found that while these AI chatbots like ERNIE Bot, ChatGPT, and DeepSeek show promise, they also pose significant risks—ranging from overtreatment to reinforcing inequality. The findings are published in the journal npj Digital Medicine.

Accuracy meets overuse and inequality

All three AI chatbots (ERNIE Bot, ChatGPT, and DeepSeek) were highly accurate at reaching a correct diagnosis, outperforming human doctors.

But, AI chatbots were far more likely than doctors to suggest unnecessary tests and medications.

In fact, they recommended unnecessary tests in more than 90% of cases and prescribed inappropriate medications in more than half.

For example, when presented with a patient wheezing from asthma, the chatbot sometimes recommended antibiotics or ordered expensive CT scans—neither of which are supported by clinical guidelines.

And AI performance varied by patient background.

For example, older and wealthier patients were more likely to receive extra tests and prescriptions.

Our findings show that while AI chatbots could help expand health care access, especially in countries where many people lack reliable primary care, without oversight, they could also drive up costs, expose patients to harm and make inequality worse.

Health care systems need to design safeguards—like equity checks, clear audit trails and mandatory human oversight for high-stakes decisions—before these tools are widely adopted.

Our research is timely, given the global excitement—and concern—around AI.

While chatbots could help fill critical gaps in health care, especially in low and middle-income countries, we need to carefully balance innovation with safety and fairness.

2 Likes

AI might be our best hope to fix health care

Health care remains one of the most stubborn failures of American society. Costs keep climbing at unsustainable rates. More than 27 million people remain uninsured and more than 100 million lack a primary care provider. While some are fortunate to receive state-of-the-art care, as many as 200,000 patients die each year from preventable medical errors.

Smart people have been grasping for ways to fix these problems for generations. They’ve tinkered with payment models and tried desperately to expand the industry’s workforce. Nothing has come close to solving the industry’s deficiencies.

Now, however, the country has a new reason for hope: artificial intelligence. That’s the big idea in health informaticist Charlotte Blease’s new book, “Dr. Bot: Why Doctors Can Fail Us — and How AI Can Save Lives.”

Read the full opinion article: AI might be our best hope to fix health care (WaPo)

Related:

https://www.amazon.co.uk/Dr-Bot-Doctors-Us_and-Could/dp/0300247141

Table of Contents: https://www.jstor.org/stable/jj.33193139

https://www.hifa.org/dgroups-rss/dr-bot-why-doctors-can-fail-us-and-how-ai-could-save-lives-2

The A.I. Will See You Now: Why Your Doctor’s Days Are Numbered

2 Likes

Why Doctors Say OpenEvidence Is A ‘Game Changer’ (Bloomberg Television)

4 Likes

Kurzgesagt goes into detail as to how AI falsifications are becoming facts, which is dangerous. In essence, an AI will make up 20% of the information. This then gets used by a journalist or YouTuber, and suddenly that made-up information becomes a legitimate source. This is then used and reinforced by other AIs, further codifying and legitimizing the falsifications.

3 Likes

Yes, this is definitely a problem. I’m an Associate Editor at a couple of (decent, but not S-tier) journals. We are inundated with papers which are clearly written by LLMs. Of course some will be accepted, either by our journals, or others, and go on to be the “truth” used to inform other models and other people.

There is also a big problem in science with dogma taking over. For example, people got very hyped over stem cell therapy and it’s gone absolutely nowhere. People are very hyped over nanomedicine, and it’s gone nowhere. Same now for exosomes, and the same claims repeated again and again, but backed up by almost no evidence. But those things get written into papers, which are then digested by AI models and regurgitated to people who then see a claim with supporting published evidence.

I’ve also directly seen this in my own work. My suggestion is that you really interrogate your AI model of choice on a topic where you are extremely knowledgeable. Then you see the gaps. They also really want to please you, so they’ll tell you what you want to hear. They might even kiss your ass by complimenting your insightful question and saying “you’re absolutely right to be sceptical”, etc.

2 Likes