How to tell if a peptide is working: what to track, and for how long

Why "do you feel better?" is the worst available signal

Self-reported wellbeing is the most widely available outcome measure in human experimentation and also the least reliable. Three well-documented effects make it nearly useless on a short timescale for a single individual.

The first is recall bias. Asked today how you felt three weeks ago, you reconstruct a memory rather than retrieving one. The reconstruction is influenced by how you feel right now, by what you have read or heard about the compound in the meantime, and by the natural human bias toward narrative coherence. The intervention either "worked" or "didn't work" — the messy middle ground that almost all real biological data actually occupies gets smoothed out.

The second is regression to the mean. People start interventions when something is wrong: they are sleeping badly, in pain, struggling with weight, low on energy. That usually puts them near a local trough, and symptoms that fluctuate tend to drift back toward their average. Statistically, many of them were going to feel somewhat better in the next few weeks even without the intervention. Crediting that recovery to the new compound is a classic attribution error.

The third is the placebo effect, which is real but more limited than is popularly believed. Hróbjartsson and Gøtzsche's landmark 2001 NEJM analysis pooled 114 trials comparing placebo with no treatment and found that placebo produced small but detectable effects on continuous, subjective outcomes (especially pain) and generally negligible effects on objective measures (lab values, weight, blood pressure). The practical implication for peptide self-experimentation: the more subjective the outcome you are tracking, the less you can trust the difference you observe.

The countermeasure is not to abandon subjective outcomes — they are often what you care about — but to pair every subjective measure with at least one objective one, and to commit to the measurement plan before you start, not after.
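Regression to the mean is easier to believe once you have seen it appear in data that contains no treatment at all. The sketch below is a toy simulation under stated assumptions: every person has the same long-run average score, day-to-day scores are independent noise, and people "start a compound" only on a day when their score is unusually low. The function name, the threshold, and all numbers are illustrative, not taken from any study.

```python
import random
import statistics


def simulate_regression_to_mean(n_people=10_000, threshold=-1.0, seed=0):
    """Return (mean score at enrolment, mean score at follow-up).

    Nobody in this simulation receives any intervention. Scores are
    standard-normal noise around a personal average of 0 (an assumption).
    """
    rng = random.Random(seed)
    at_start, at_followup = [], []
    for _ in range(n_people):
        today = rng.gauss(0.0, 1.0)   # score on the day of the decision
        later = rng.gauss(0.0, 1.0)   # score some weeks later, untreated
        if today < threshold:         # only people in a trough "enrol"
            at_start.append(today)
            at_followup.append(later)
    return statistics.mean(at_start), statistics.mean(at_followup)


start_mean, followup_mean = simulate_regression_to_mean()
print(f"mean score at start:     {start_mean:+.2f}")
print(f"mean score at follow-up: {followup_mean:+.2f}")
```

The enrolled group looks much better at follow-up purely because of how it was selected. Real symptoms are more persistent than independent noise, so the real-world effect is smaller than in this toy model, but it points in the same direction.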

The pharmacokinetic floor: when can you possibly see anything?

Before designing any tracking plan, set a minimum window based on pharmacokinetics. A peptide with a long half-life, dosed on a fixed schedule, accumulates until it reaches steady-state plasma concentration, and its full effect is not even theoretically present before then. Steady state takes roughly five half-lives of consistent dosing, by which point exposure is about 97% of its plateau.

A few worked examples:

Semaglutide: terminal half-life ~7 days. Steady state at ~35 days, or 5 weeks. Any judgment about whether semaglutide is "working" before week 5 of a stable dose is premature, because exposure at that dose is still climbing toward its plateau. STEP 1 escalated the dose over 16 weeks before reaching the 2.4 mg maintenance dose, and its weight loss curve keeps descending for most of the 68-week trial. Five weeks is the floor for steady state, not the time to conclusion.
Tirzepatide: half-life ~5 days. Steady state at ~25 days. SURMOUNT-1 reported its primary endpoint at week 72.
CJC-1295 with DAC: half-life ~6–8 days. Steady state at roughly 4–6 weeks (30–40 days) of weekly dosing.
Ipamorelin / CJC-1295 (no DAC): half-lives measured in minutes to hours. Plasma concentrations after each dose are essentially independent. There is no "steady state" in the same sense; the relevant timescale is however long the downstream effect (GH pulse, IGF-1 trajectory) takes to manifest.
BPC-157, TB-500: short half-lives, but the proposed mechanism is a local tissue effect over weeks. The relevant window is the healing timeline of the tissue you are targeting, not plasma kinetics.
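The five-half-lives rule used in the examples above can be checked with a few lines of arithmetic. This is a minimal sketch assuming simple first-order elimination and a constant dosing schedule, which is a textbook simplification rather than a model of any specific product. The function names are illustrative, and the 7-day value for CJC-1295 with DAC is simply the midpoint of the 6–8 day range quoted above.

```python
def fraction_of_steady_state(days_on_dose: float, half_life_days: float) -> float:
    """Fraction of the steady-state plateau reached after `days_on_dose`.

    Assumes first-order elimination and regular dosing, so accumulation
    follows 1 - 0.5 ** (elapsed time / half-life).
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return 1.0 - 0.5 ** (days_on_dose / half_life_days)


def days_to_steady_state(half_life_days: float, n_half_lives: float = 5.0) -> float:
    """Days of consistent dosing needed to cover `n_half_lives` half-lives."""
    return n_half_lives * half_life_days


# Approximate half-lives in days, as quoted in the text (midpoint assumed for CJC-1295 with DAC).
HALF_LIFE_DAYS = {"semaglutide": 7.0, "tirzepatide": 5.0, "CJC-1295 with DAC": 7.0}

for name, t_half in HALF_LIFE_DAYS.items():
    days = days_to_steady_state(t_half)
    reached = fraction_of_steady_state(days, t_half)
    print(f"{name}: ~{days:.0f} days to reach {reached:.1%} of plateau")
```

The same function shows why early impressions are unreliable: after one half-life on a new dose, exposure is only halfway to where it will eventually settle.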

A reasonable working rule: 12 weeks is a defensible minimum window for any judgment about systemic peptides. This is a common duration for published Phase II trials and allows for both kinetic ramp-up and biological response time. Shorter windows are fine for interim checks; just do not draw conclusions from them.

Biomarkers that actually move with what compounds

Different mechanisms produce different signals. Picking the wrong biomarker is the most common reason people conclude a peptide "isn't doing anything" when in fact it is doing something they are not measuring.

GLP-1, GLP-1/GIP, and triple agonists (semaglutide, liraglutide, tirzepatide, retatrutide)

Direct indicators of activity:

Body weight. The primary endpoint in every weight-management trial. STEP 1 reported ~14.9% mean reduction at 68 weeks on semaglutide 2.4 mg; SURMOUNT-1 reported up to 22.5% on tirzepatide 15 mg. These are population means: individual responses range from minimal to >25%.
Waist circumference. Captures visceral fat changes that weight alone can miss.
Fasting glucose and HbA1c. Even in non-diabetics, fasting glucose typically drops by 5–15 mg/dL and HbA1c by 0.2–0.4% over the first 3–6 months.
Resting heart rate. GLP-1 agonists modestly increase resting heart rate (typically +2–4 bpm in trials). A shift that small is only visible in a multi-week average, and not seeing it is at most weak evidence of low exposure.
Blood pressure. Typically drops 3–6 mmHg systolic over 3–6 months as weight falls.

Secondary indicators worth tracking:

Subjective hunger / food noise (low signal but high salience)
Meal size and grazing frequency
Fatigue and nausea (dose-limiting side effects; presence confirms exposure even when efficacy is unclear)
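The trial figures quoted above are percent changes from baseline, so your own numbers are only comparable once they are expressed the same way. A minimal sketch follows; the function name and the example weights are hypothetical.

```python
def percent_change(baseline: float, current: float) -> float:
    """Percent change from baseline; negative values mean a reduction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline * 100.0


# Hypothetical example: 100.0 kg at baseline, 85.1 kg at follow-up.
baseline_kg = 100.0
current_kg = 85.1
print(f"{percent_change(baseline_kg, current_kg):.1f}%")  # prints -14.9%
```

Use an averaged baseline (for example the mean of two weeks of weigh-ins) rather than a single reading, otherwise one unusually heavy or light morning shifts every later percentage.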

GH secretagogues (ipamorelin, CJC-1295, sermorelin, tesamorelin, MK-677)

The single most direct biomarker is IGF-1 (insulin-like growth factor 1). The downstream effect of any GH pulse is hepatic IGF-1 production, which integrates over days and has a much longer half-life than GH itself, making it the cleanest indicator of average GH exposure.

Baseline IGF-1, then a follow-up at 8–12 weeks on a stable dose. The expected response is a meaningful upward shift within the age-appropriate reference range (not above it — supraphysiologic IGF-1 is not the goal and carries its own risks).
Body composition via DEXA scan if accessible, or a high-quality bioimpedance scale, at 12-week intervals. Lean mass changes from GH-axis interventions are slower than fat changes and may not appear before 4–6 months.
Sleep quality, especially deep (slow-wave) sleep duration if measured by Apple Watch or Oura.
Subjective recovery from training and joint comfort.

What does not reliably track GH-axis effects: random GH levels (too pulsatile to interpret from a single blood draw), grip strength on short timescales, and scale weight on its own (lean gain often offsets fat loss).

Healing peptides (BPC-157, TB-500, KPV, GHK-Cu, PDA)

These compounds are studied primarily for localized effects: tendon healing, wound repair, gut barrier integrity, joint function. Systemic biomarkers do not capture this well.

Pain and function scores for the specific tissue. A 0–10 numeric rating recorded at consistent timepoints is more useful than memory.
Range of motion measurements for joints. A goniometer is cheap; a phone protractor app is free.
Time to specific milestones ("could not jog a mile, now can").
Photographs for visible wounds or skin changes (good lighting, same distance, same angle, ideally same time of day).

Systemic biomarkers for healing peptides — hsCRP, ESR — are insensitive to localized healing and rarely move in a useful way.

Melanocortin agonists (PT-141 / bremelanotide, melanotan)

FSFI (Female Sexual Function Index) for women — a 19-item validated instrument with established population norms (Rosen et al. 2000). Free and quick to administer.
IIEF (International Index of Erectile Function) for men, similarly validated.
Subjective ratings of episode-by-episode response on the days of dosing.

Apple Health metrics that carry real signal

Apple Health is a useful aggregator because most of the relevant inputs (weight from a smart scale, HR and HRV from the Watch, sleep from Watch or third-party trackers) end up in the same place. Some of the data carries strong signal; some is mostly noise.

High-signal metrics, in rough order:

Body weight (smart scale, daily). Noisy day-to-day, but a 7-day moving average shows the trend clearly.
Resting heart rate (Apple Watch, automatic). Trends over weeks reflect cardiovascular adaptation and autonomic state. GLP-1 agonists raise it; aerobic training lowers it.
Sleep stages and duration (Watch or third-party). Total sleep is reliable; stage breakdowns are noisier but trends are useful.
Heart rate variability (Watch, automatic). High inter-day variability but a clear weekly average; sensitive to stress, illness, and recovery state.
VO2max estimate (Apple Watch). Updated periodically from outdoor workouts. Slow-moving but a reasonable proxy for cardiovascular fitness.
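Several of the metrics above are only useful once the day-to-day noise is averaged out. One simple way to do that is a trailing moving average, sketched below for a list of daily readings. The function name, the 7-day window, and the sample weights are assumptions for illustration, and the sketch expects exactly one reading per day with no gaps.

```python
from statistics import mean


def trailing_mean(values, window=7):
    """Trailing moving average over `window` readings.

    Returns a list the same length as `values`; entries are None until a
    full window of readings is available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)          # not enough history yet
        else:
            smoothed.append(mean(values[i + 1 - window : i + 1]))
    return smoothed


# Hypothetical daily weigh-ins in kg.
daily_kg = [90.4, 89.8, 90.6, 90.1, 89.7, 90.3, 89.9, 89.5, 90.0, 89.4]
for raw, avg in zip(daily_kg, trailing_mean(daily_kg)):
    print(raw, "-" if avg is None else round(avg, 2))
```

A trailing window lags the true trend by a few days. That is acceptable here because the decisions in question are made over weeks, not days.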

Lower-signal metrics, to use with caution:

Manually entered mood, stress, energy. High recall bias, low precision.
Step count alone (without context). Reflects behavior more than physiology.
Active calories estimated by the Watch. Useful relatively, not absolutely.

The single most useful thing you can do with Apple Health data is overlay the biomarker trace with the dose log. A weight curve with weekly injection markers tells you what you need to know in seconds. A weight curve in isolation is just a curve.
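If you export your own data, the overlay can be approximated numerically by grouping each reading under the most recent dose that preceded it. The sketch below is one way to do that with the standard library; the function name and the sample records are hypothetical, and it says nothing about how any particular app stores or draws this data.

```python
from bisect import bisect_right
from datetime import date
from statistics import mean


def mean_weight_per_dose_interval(weights, dose_dates):
    """Average the readings that fall between one dose and the next.

    weights:    iterable of (date, kg) pairs
    dose_dates: iterable of injection dates
    Returns {dose_date: mean kg from that dose up to the next dose}.
    """
    doses = sorted(dose_dates)
    buckets = {d: [] for d in doses}
    for day, kg in weights:
        idx = bisect_right(doses, day) - 1   # most recent dose on or before `day`
        if idx >= 0:                         # readings before the first dose are skipped
            buckets[doses[idx]].append(kg)
    return {d: mean(v) for d, v in buckets.items() if v}


# Hypothetical log: weekly injections and a handful of weigh-ins.
doses = [date(2025, 1, 1), date(2025, 1, 8), date(2025, 1, 15)]
weights = [
    (date(2025, 1, 2), 90.2), (date(2025, 1, 5), 89.9),
    (date(2025, 1, 9), 89.6), (date(2025, 1, 13), 89.1),
    (date(2025, 1, 16), 88.9),
]
for dose_day, avg_kg in mean_weight_per_dose_interval(weights, doses).items():
    print(dose_day.isoformat(), round(avg_kg, 2))
```

Reading the output as one number per injection makes a stalled or reversed trend obvious without any plotting.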

Designing a measurement window

A measurement plan has four phases.

Baseline (2–4 weeks). Before the first dose, collect every metric you intend to follow. One data point is not a baseline; you need a range. For weight, daily weigh-ins for two weeks. For HR / HRV, the Watch is already doing this — just look back. For IGF-1 and other lab values, a single pre-intervention draw is the floor.
Loading / titration (4–8 weeks). Dose-escalation phase if applicable. Track for safety and side effects, but do not draw efficacy conclusions yet.
Steady-state intervention (8–12 weeks). This is the window in which the biomarkers are interpretable. Continue tracking on the same schedule as baseline.
Optional washout (4–8 weeks). If you stop the intervention, the trajectory after stopping is informative. Weight regain after GLP-1 discontinuation is well documented in the STEP 4 withdrawal trial and the STEP 1 extension; the size of the regain estimates how much of the prior effect was drug-dependent vs behavior-dependent.

A common research minimum is 12 weeks of intervention plus a few weeks on either side, for a total measurement window of around 16–20 weeks. This is the floor, not the ceiling.
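Committing to the plan in advance is easier when the phases are pinned to calendar dates. The sketch below lays the four phases end to end from a chosen start date, using the low end of each range given above (2, 4, 8, and 4 weeks, which totals 18 weeks). The names, the default durations, and the start date are illustrative; adjust them to your own protocol.

```python
from datetime import date, timedelta

# (phase name, duration in weeks): low end of the ranges in the text.
PHASE_WEEKS = [
    ("baseline", 2),
    ("loading / titration", 4),
    ("steady-state intervention", 8),
    ("optional washout", 4),
]


def phase_schedule(start: date, phase_weeks=PHASE_WEEKS):
    """Return [(name, first_day, end_day)] with phases laid end to end.

    `end_day` is exclusive: it is also the first day of the next phase.
    """
    schedule = []
    cursor = start
    for name, weeks in phase_weeks:
        end = cursor + timedelta(weeks=weeks)
        schedule.append((name, cursor, end))
        cursor = end
    return schedule


for name, first_day, end_day in phase_schedule(date(2025, 1, 6)):
    print(f"{name:<28}{first_day.isoformat()} to {end_day.isoformat()}")
```

Writing these dates down before the first dose is what makes the plan a pre-commitment: the review date exists before there is any result to rationalize.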

The lab tests worth running

For a systemic intervention, a small panel of pre/post lab work pays for itself in interpretability. Typical costs at direct-to-consumer labs (US, 2025):

Comprehensive metabolic panel (CMP) — kidney, liver, electrolytes. ~$15–30.
Lipid panel — total, LDL, HDL, triglycerides. ~$15–30.
HbA1c — three-month average glucose. ~$25.
Fasting insulin — pair with glucose to compute HOMA-IR (insulin resistance estimate). ~$25.
hsCRP — systemic inflammation. ~$25.
IGF-1 — GH axis activity. ~$50–80.
Thyroid panel (TSH, free T4, optionally free T3) — covers off-target effects. ~$40.

Drawn at baseline and again at 12 weeks, this is a defensible minimum panel for any systemic intervention. At the prices listed above the whole set runs roughly $195–255 direct-to-consumer, and it answers most of the "is this actually doing something" question more reliably than any subjective scale.
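The HOMA-IR estimate mentioned in the panel is a one-line calculation once you have a paired fasting glucose and fasting insulin from the same draw. The sketch below uses the standard formula for US units (glucose in mg/dL, insulin in µU/mL, divided by 405); with glucose in mmol/L the divisor is 22.5 instead. The function name and the example values are hypothetical.

```python
def homa_ir(fasting_glucose_mg_dl: float, fasting_insulin_uU_ml: float) -> float:
    """HOMA-IR from fasting glucose (mg/dL) and fasting insulin (µU/mL)."""
    if fasting_glucose_mg_dl <= 0 or fasting_insulin_uU_ml <= 0:
        raise ValueError("both inputs must be positive")
    return fasting_glucose_mg_dl * fasting_insulin_uU_ml / 405.0


# Hypothetical pre/post values from the same lab, both drawn fasting.
before = homa_ir(95.0, 12.0)
after = homa_ir(88.0, 7.5)
print(f"HOMA-IR before: {before:.2f}, after: {after:.2f}")
```

The index is most informative as a within-person comparison between two draws taken under the same conditions. Interpretive cutoffs vary by lab and population, so the direction and size of the change matter more than any single threshold.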

Daily vs weekly vs interval

A logging cadence that is sustainable beats one that is comprehensive but abandoned.

Every day: weight (one measurement, same time, same conditions), dose-day check-in (injected? site? any acute side effects?), sleep duration. All of these are passive or near-passive.
Every week: waist circumference, subjective ratings on a 0–10 scale for whatever you care about (energy, hunger, joint pain, sleep quality), weekly average resting HR and HRV from Apple Health.
Every 4 weeks: body composition (smart scale; DEXA, if used, only at the 12-week review), progress photos if relevant, side-effect inventory, dose review.
Every 12 weeks: lab panel, formal review of the trajectory, decision about continuation, dose change, or stopping.

The cadence is structured so that the daily burden is near zero — most of it is automatic from a Watch and a scale — and the heavier interventions (lab draws, decisions) happen rarely enough that they remain feasible.
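The cadence above reduces to a small lookup: given how many days you are into the plan, which tasks are due today? The sketch below encodes the four tiers as day intervals (1, 7, 28, and 84). The task labels are shortened from the text, and treating day 0 as the day everything is collected is an assumption made for the example.

```python
# Interval in days -> tasks due on every multiple of that interval.
CADENCE = {
    1: ["weight", "dose-day check-in", "sleep duration"],
    7: ["waist circumference", "0-10 subjective ratings", "weekly resting HR and HRV"],
    28: ["body composition", "progress photos", "side-effect inventory", "dose review"],
    84: ["lab panel", "trajectory review", "continue / adjust / stop decision"],
}


def tasks_due(day_index: int) -> list[str]:
    """Tasks due on a given day, where day 0 is the first day of tracking."""
    if day_index < 0:
        raise ValueError("day_index must be non-negative")
    due = []
    for every_n_days, tasks in CADENCE.items():
        if day_index % every_n_days == 0:
            due.extend(tasks)
    return due


for day in (1, 7, 28, 84):
    print(day, tasks_due(day))
```

Because 84 is a multiple of both 7 and 28, the 12-week review automatically lands on a day when the weekly and 4-weekly measurements are also taken, so the lab results can be read alongside fresh numbers.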

Statistical reality and the N-of-1 problem

A single person tracking their own data is running an N=1 experiment with no control group, no blinding, and very little statistical power. Useful information can be extracted, but it requires acknowledging what the data cannot do.

What N=1 self-tracking can show: direction of change, approximate magnitude, time course, and side-effect profile.

What it cannot show: causation with confidence. The placebo effect, regression to the mean, seasonal variation, and concurrent life changes (diet, sleep, training) are all confounders that no individual data set can fully untangle. Nor can it show whether the effect would have happened anyway, or whether the magnitude is unusual relative to others on the same intervention.

The honest framing: a 12-week trial of yourself produces evidence about how a compound affects you, not proof. The more pre-committed your measurement plan and the more objective your endpoints, the better that evidence holds up.
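One modest way to put a number on "direction and approximate magnitude" is to compare the steady-state average with the baseline average, scaled by how much the baseline itself bounced around. The sketch below does only that. It is a descriptive summary, not a significance test, and it does nothing to remove the confounders listed above. The function name and sample values are hypothetical.

```python
from statistics import mean, stdev


def change_vs_baseline_noise(baseline, steady_state):
    """Return (shift, shift_in_baseline_sds).

    shift:                 mean(steady_state) - mean(baseline)
    shift_in_baseline_sds: the same shift divided by the baseline standard
                           deviation, i.e. how large the change is relative
                           to ordinary pre-intervention variation.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline readings to estimate noise")
    noise = stdev(baseline)
    if noise == 0:
        raise ValueError("baseline readings are identical; noise cannot be estimated")
    shift = mean(steady_state) - mean(baseline)
    return shift, shift / noise


# Hypothetical weekly-average weights in kg.
baseline_kg = [90.0, 91.0, 89.0, 90.0]
steady_kg = [86.0, 85.0, 87.0, 86.0]
shift, ratio = change_vs_baseline_noise(baseline_kg, steady_kg)
print(f"shift: {shift:+.1f} kg, or {ratio:+.1f} baseline SDs")
```

A shift of less than about one baseline standard deviation is hard to distinguish from ordinary fluctuation. A shift several times larger is at least unlikely to be measurement noise, even though it still cannot say what caused it.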

How Vial puts this together

Vial overlays everything: dose history on one axis, biomarker traces drawn from Apple Health on the other. Each injection appears as a marker on the timeline, so the relationship between dose and outcome is visible at a glance. The app pulls weight, resting heart rate, HRV, sleep, and VO2max automatically; manually entered ratings and lab values slot into the same view. The point is not to produce a verdict — the data does that — but to remove the bookkeeping that usually keeps people from looking at their own data honestly.