Benchmark accuracy vs real-world accuracy
Several published studies show AI models matching or outperforming dermatologists on curated datasets such as the ISIC archive. Those numbers describe model behavior on high-quality dermoscopic images of well-defined lesion classes, evaluated under controlled conditions.
Real-world use looks different. A home user takes one photo from one angle with whatever lighting they have. The lesion sits on hair-bearing skin or on a curved body surface. The dermatoscope head may not be pressed evenly. The model sees a single image, not a clinical context, not a patient history, and not the rest of the skin for ugly-duckling comparison.
Even a benchmark-leading model can lose several percentage points of sensitivity when it moves from a curated dataset to in-the-wild home capture. That is less a flaw of the model than a limit of single-image, no-context screening.
Where AI helps most
AI screening adds the most value when it does things human clinicians are bad at or have no time for.
Documentation: AI-assisted apps capture a structured timeline of every mole with consistent metadata, so a five-year evolution is easy to retrieve at the next dermatology visit. No clinician can scale this without software help.
Triage: a phone-based AI screener can sort 30 lesions by risk score and surface the few that deserve a closer human read first.
Education: AI-assisted feedback on image quality (focus, lighting, framing) trains home users to take better dermatoscope photos.
These uses are about documentation and prioritization, not diagnosis.
- Structured timeline of moles over years
- Prioritization of higher-risk lesions for clinical follow-up
- Image quality feedback during capture
- Reminder cadence for repeat photos
- Printable report a dermatologist can read in seconds
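The triage step can be sketched in a few lines. This is a minimal illustration, not any app's actual implementation: the `Lesion` fields, the `MODERATE` level, and the rule that a recent change breaks ties are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical ordering of risk labels. The MODERATE level is an assumption.
RISK_ORDER = {"LOW": 0, "MODERATE": 1, "HIGH": 2, "VERY HIGH": 3}


@dataclass
class Lesion:
    lesion_id: str
    risk: str       # one of the keys in RISK_ORDER
    changed: bool   # True if the latest photo differs visibly from baseline


def prioritize(lesions, top_n=5):
    """Return the top_n lesions a human should read first.

    Sorts by risk level, highest first; among equal risk levels,
    lesions that recently changed come before stable ones.
    """
    ranked = sorted(
        lesions,
        key=lambda lesion: (RISK_ORDER[lesion.risk], lesion.changed),
        reverse=True,
    )
    return ranked[:top_n]
```

The output is only a reading order for the clinician. It says nothing about what any lesion is.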
Where AI fails most
Failure modes cluster around four areas.
Non-melanocytic lesions: many scoring frameworks used in AI tools (TDS, the 7-point checklist) were designed for melanocytic lesions. When pointed at a basal cell carcinoma, a fibroma, or a seborrheic keratosis, they produce numbers that are either falsely reassuring or falsely alarming.
Pigmented skin: most published datasets over-represent fair skin. AI models often underperform on darker skin tones, particularly for acral melanoma, which is the type most likely to appear on darker skin in the first place.
Image quality: low brightness, motion blur, lens fingerprints, and uneven dermatoscope contact all degrade the input. Models do not always tell the user when the image is bad; they may produce a confident-looking score on noisy pixels.
Stochasticity: vision-language models sampled at non-zero temperature are not deterministic. Running the same image twice can yield different assessments, especially when the lesion is dermoscopically ambiguous.
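The image-quality failure mode is the easiest of the four to guard against before the model ever sees the photo. The sketch below is one possible pre-check, assuming the image has already been converted to a 2D list of grayscale values from 0 to 255. The function names and both thresholds are illustrative assumptions and would need tuning on real dermatoscope photos.

```python
from statistics import mean, pvariance


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over interior pixels.

    Low values mean few sharp edges, which suggests blur.
    Expects an image of at least 3x3 pixels.
    """
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def quality_issues(gray, min_brightness=60, min_sharpness=100.0):
    """Return a list of human-readable problems; empty means 'good enough'.

    Both thresholds are placeholder assumptions, not validated values.
    """
    issues = []
    if mean(value for row in gray for value in row) < min_brightness:
        issues.append("too dark")
    if laplacian_variance(gray) < min_sharpness:
        issues.append("possibly blurred")
    return issues
```

An app that runs a check like this can ask for a retake instead of scoring noisy pixels.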
Why the same photo gives different answers
Modern AI screening systems often use vision-language models with non-zero sampling temperature. That means the model samples among plausible tokens at each step, and two runs of the same input can land on different paths. For a clear melanoma or a clearly benign nevus, both runs usually agree. For an ambiguous lesion (for example, an early sebaceous hyperplasia that looks vaguely like an early BCC), the runs can diverge.
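The effect of temperature can be shown with a toy, single-step example. Real models sample long token sequences, so this is only an illustration of the mechanism. The logit values in the usage note are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into sampling probabilities at a given temperature.

    Lower temperature concentrates probability on the top score;
    higher temperature spreads it across the alternatives.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive; greedy decoding is the limit as it approaches 0")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]
```

With two close hypothetical scores such as `[2.0, 1.8]`, temperature 1.0 gives roughly a 55/45 split, so repeated runs will often disagree. At temperature 0.1 the split is roughly 88/12. With well-separated scores such as `[5.0, 1.0]`, the top option dominates at either setting, which is why clear-cut lesions tend to get the same answer every time.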
A well-designed AI screening app handles this in two ways. First, it lowers the temperature for the first pass (to zero, or close to it), so a routine clear-cut lesion gets a repeatable, effectively deterministic answer. Second, when the first pass comes back alarming, it runs an ensemble of three or more independent reads and reports the consensus plus the agreement percentage. If agreement is high, the user has a confident screening signal. If agreement is low, the model is signaling that the image is ambiguous and that a human read should carry the weight.
DermaTrack uses this two-stage strategy: a deterministic first read, and an ensemble retry whenever the first read returns HIGH or VERY HIGH risk.
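A two-stage strategy of this kind could look roughly like the sketch below. It is not DermaTrack's code: `read_lesion` is a hypothetical wrapper around whatever model is in use, and the ensemble temperature of 0.7 and the majority-vote rule are assumptions chosen for illustration.

```python
from collections import Counter

ALARM_LEVELS = {"HIGH", "VERY HIGH"}


def two_stage_read(image, read_lesion, ensemble_size=3):
    """Deterministic first read, then an ensemble retry on alarming results.

    read_lesion(image, temperature) -> risk label is a hypothetical callable
    supplied by the caller; it is not a real library function.
    """
    first = read_lesion(image, temperature=0.0)
    if first not in ALARM_LEVELS:
        # Routine case: one repeatable read is enough.
        return {"risk": first, "agreement": None, "reads": [first]}

    # Alarming case: several independent sampled reads (0.7 is an assumed value).
    reads = [read_lesion(image, temperature=0.7) for _ in range(ensemble_size)]
    # On a tie, most_common returns the label that was seen first.
    consensus, votes = Counter(reads).most_common(1)[0]
    return {"risk": consensus, "agreement": votes / len(reads), "reads": reads}
```

A low `agreement` value is itself useful output: it tells the user the image is ambiguous and should go to a clinician rather than be rescanned until the answer looks better.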
Triage support, not diagnosis
An AI screening tool is best framed as a triage layer that sits in front of the dermatologist visit, not as a substitute for it. It tells a user: 'this lesion deserves a closer look soon' or 'this lesion looks stable, repeat in three months.' It does not say 'this is melanoma' or 'this is not melanoma.'
Treating the score as a diagnosis is the most common user error. A LOW score does not rule out cancer; new symptoms (bleeding, pain, non-healing), an ugly-duckling lesion, or rapid change should always overrule a reassuring score. A HIGH score does not confirm cancer; many HIGH scores are benign mimics, and only a clinician can decide whether biopsy is warranted.
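The override rule can be written down explicitly. The function below is a hypothetical sketch of that logic, with made-up parameter names and return strings. Note that it returns a next step, never a diagnosis.

```python
def recommended_action(risk, symptoms=(), ugly_duckling=False, rapid_change=False):
    """Map a screening score plus red flags to a next step.

    Any red flag (symptoms such as bleeding, pain, or non-healing; an
    ugly-duckling lesion; rapid change) overrides a reassuring score.
    """
    red_flags = bool(symptoms) or ugly_duckling or rapid_change
    if red_flags or risk in {"HIGH", "VERY HIGH"}:
        return "book a clinician visit"
    return "repeat photo at the next scheduled interval"
```

For example, `recommended_action("LOW", symptoms=("bleeding",))` still routes the user to a clinician, because the score is only one input among several.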
How to use both together
The most useful workflow combines repeat AI-assisted home documentation with periodic in-person dermatology checks.
At home: photograph each mole at consistent intervals, let the app sort lesions by risk score and recent change, and act on flagged lesions within one to two weeks.
At the clinic: bring the printed timeline and the original photos for any lesion the app flagged HIGH or that visibly changed. Ask the dermatologist to perform live dermoscopy on those lesions and, if you have risk factors (family history of melanoma, fair skin with sunburn history, many atypical nevi, immunosuppression, prior skin cancer), to do a full-body skin exam at least annually.
Used this way, the AI app does the documentation and prioritization work that the clinic does not have time for, and the clinic does the clinical reasoning that the model does not have context for.
Frequently asked questions
Is an AI mole check as accurate as a dermatologist?
On curated benchmarks, top models can approach dermatologist sensitivity. In real-world home use with single phone photos, model accuracy drops. AI and dermatologists both miss things, but in different ways, so the best practice is to use them together.
Can AI replace a yearly skin check?
No. AI screening apps document and triage, but a full skin exam, dermatologist judgment, and biopsy capability are not replaceable.
Why did the app give a different answer on a second scan of the same mole?
AI models can be stochastic, especially on ambiguous lesions. A good screening tool runs multiple reads on alarming cases and reports the agreement so you can see when the model is uncertain.
How often should I check moles with an app?
Re-photograph tracked lesions every 1-3 months. Increase the frequency for any lesion that has changed, bled, or hurt. Always book a clinician visit for sudden or symptomatic changes.