The Data Vacuum: What Doctors and Patients Don't Know Before, During, and After Starting HRT

A research paper from the Romi Science Team

The premise

Menopause is one of the most poorly measured major health transitions in modern medicine. A woman entering perimenopause in 2026 has fewer objective markers available to her clinician than she had during pregnancy, during fertility treatment, or even during a routine annual physical. The decisions that shape the next decade of her life, including whether to start hormone therapy, which formulation, what dose, whether it is working, and when to adjust, are made on a foundation of subjective recall, validated questionnaires filled out at clinical visits, and lab values that are known to be unreliable in this exact population.

This report walks through where the data infrastructure breaks down. It covers the diagnostic phase, the symptom characterization phase, the HRT decision, and the HRT monitoring loop. At each phase, the question is the same. What does the clinician have to work with, what is missing, and what kind of objective data could meaningfully change the decision?

Phase 1: Pre-diagnosis. The perimenopause black box.

The defining feature of perimenopause is hormonal variability, not steady decline [1]. Ovarian function during the transition oscillates rather than tapers. Some cycles are ovulatory and others are not. Inhibin B, FSH, and estradiol all fluctuate substantially, sometimes within a single cycle [1, 2]. This is not a clinician failure. It is a biological feature of the transition. But it has a direct consequence for diagnosis. The standard lab tests that women and clinicians most commonly reach for are not reliable.

FSH testing is broadly not recommended for diagnosing perimenopause, and multiple international guidelines converge on this. NICE guideline NG23 in the UK states explicitly that blood tests are rarely required to diagnose perimenopause or menopause in women aged over 45 and should not be used for that purpose. FSH levels fluctuate significantly and do not correlate with the severity or duration of symptoms or with the need for treatment [3]. The Canadian Menopause Society echoes this: hormone levels, including FSH and estradiol, are not reliable during perimenopause because of fluctuating hormones [4]. The International HTA Database review found that FSH and LH assays have no greater diagnostic power than menopausal symptoms themselves [5].

The exceptions are narrow. FSH testing is appropriate for women under 40 with suspected premature ovarian insufficiency (POI), or for women aged 40 to 45 with symptoms suggesting early menopause [6, 4]. For POI, FSH must be elevated on at least two separate occasions about four to six weeks apart [4]. Outside these specific clinical scenarios, the test produces noise.

Anti-Müllerian hormone (AMH) does not solve this either. AMH reflects ovarian reserve and declines as women approach menopause, but two AMH measurements predict age at menopause only to within about four years, marginally better than menstrual cyclicity alone [2]. Clinical society guidance, including from ASRM and NICE, advises against using AMH to diagnose perimenopause or to time menopause precisely in routine care [1].

The result is diagnosis by exclusion and self-report. For women aged 45 and older, the formal diagnostic criterion for menopause is twelve consecutive months without menstruation [4]. Perimenopause is diagnosed clinically based on symptom history and menstrual irregularity. There is no biomarker that confirms it.
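Because the twelve-month criterion is a rule on dates, it can be written down directly. The sketch below is a minimal illustration, assuming bleeding is logged as a list of calendar dates; the function names are hypothetical, and the point is only to show why the diagnosis is necessarily retrospective.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the target month's last day."""
    month_index = d.month - 1 + months
    year, month = d.year + month_index // 12, month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]  # (weekday of day 1, days in month)
    return date(year, month, min(d.day, last_day))


def meets_twelve_month_criterion(bleed_dates: list[date], today: date, age_years: int) -> bool:
    """Hypothetical check of the 12-consecutive-months-without-menstruation rule.

    Returns False when the rule does not apply (under 45, per the text above),
    when there is no bleeding history to anchor on, or when 12 months have not
    yet elapsed since the most recent recorded bleed.
    """
    if age_years < 45 or not bleed_dates:
        return False
    last_bleed = max(bleed_dates)
    return today >= add_months(last_bleed, 12)
```

The check can only return True a full year after the final period, which is why the transition itself has to be diagnosed clinically rather than by this rule.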

This produces meaningful delays in care. A UK survey of over 5,000 women conducted by Newson Health found that 79% of women had visited a GP with their symptoms, 7% attended more than ten times before receiving adequate help, and 44% of women who eventually received treatment had waited at least one year, with 12% waiting more than five years [7]. Only 37% of women in that survey were offered HRT at all, and 23% were given antidepressants instead, contradicting NICE guidance that antidepressants should not be the first choice for low mood associated with menopause [7].

The structural problem is straightforward. When the diagnostic test is unreliable and the diagnosis depends on a clinician's interpretation of self-reported symptoms across a seven- to ten-year transition, the system biases toward delay, dismissal, and misattribution.

Phase 2: Symptom characterization. The diary problem.

Once perimenopause is suspected or diagnosed, the next clinical question is how severe the symptoms are and how they are impacting quality of life. This determines whether treatment is indicated and what to prescribe.

The current standard of care for measuring this relies on patient-reported outcome scales. Three are commonly used.

The Menopause Rating Scale (MRS) scores 11 menopausal symptoms (hot flushes, heart discomfort, sleep problems, depressive mood, irritability, anxiety, physical and mental exhaustion, sexual problems, bladder problems, vaginal dryness, and joint and muscular discomfort), each on a 0 to 4 scale [8]. It has been formally validated as an outcome measure for hormone therapy, with one large post-marketing study of over 9,000 women showing a 36% average improvement in total score after six months of HRT [9, 10].

The Greene Climacteric Scale uses 21 items across five domains: anxiety, depression, somatic, vasomotor, and sexual function, each scored on a four-point Likert scale [8].

The Menopause-Specific Quality of Life questionnaire (MENQOL) uses 29 questions across four domains: vasomotor, psychosocial, physical, and sexual [8].
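The MRS arithmetic is simple, which is part of its appeal. A minimal scoring sketch, assuming responses arrive as a dictionary keyed by hypothetical item names; the grouping into somatic, psychological, and urogenital subscales follows the standard MRS structure. The percent-change helper is a per-patient analogue of the average improvement reported in [9, 10], not a reproduction of that study's analysis.

```python
MRS_ITEMS = [
    "hot_flushes", "heart_discomfort", "sleep_problems", "depressive_mood",
    "irritability", "anxiety", "exhaustion", "sexual_problems",
    "bladder_problems", "vaginal_dryness", "joint_muscular_discomfort",
]

# Standard MRS subscale grouping (the item keys are hypothetical labels).
MRS_SUBSCALES = {
    "somatic": ["hot_flushes", "heart_discomfort", "sleep_problems",
                "joint_muscular_discomfort"],
    "psychological": ["depressive_mood", "irritability", "anxiety", "exhaustion"],
    "urogenital": ["sexual_problems", "bladder_problems", "vaginal_dryness"],
}


def score_mrs(responses: dict[str, int]) -> dict[str, int]:
    """Sum 0-4 item scores into subscale scores and a 0-44 total."""
    for item in MRS_ITEMS:
        value = responses.get(item)
        if not isinstance(value, int) or not 0 <= value <= 4:
            raise ValueError(f"{item} must be an integer from 0 to 4, got {value!r}")
    scores = {name: sum(responses[i] for i in items)
              for name, items in MRS_SUBSCALES.items()}
    scores["total"] = sum(responses[i] for i in MRS_ITEMS)
    return scores


def percent_improvement(baseline_total: int, followup_total: int) -> float:
    """Relative reduction in total score; positive values mean fewer symptoms."""
    if baseline_total <= 0:
        raise ValueError("baseline total must be positive")
    return 100.0 * (baseline_total - followup_total) / baseline_total
```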

These scales are well validated, internationally translated, and clinically useful for what they are. But they share a common set of limitations as a data source for clinical decision making.

They are retrospective. A woman filling out an MRS in the clinic is asked to summarize the severity of her symptoms over the previous weeks. Recall bias is well documented in this format. Hot flashes in particular are systematically underreported relative to objective skin conductance monitoring. Nocturnal events, which the woman may sleep through or only partially register, are missed more frequently than daytime ones [11].
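The recall gap can be quantified whenever both records exist. A minimal sketch, assuming objective events and diary entries are both lists of timestamps; a diary entry counts as reporting an objective event if it falls within a tolerance window. The 15-minute window and the midnight-to-6 AM night definition are illustrative assumptions, not the protocol used in [11].

```python
from datetime import datetime, timedelta


def report_sensitivity(objective: list[datetime], reported: list[datetime],
                       tolerance: timedelta = timedelta(minutes=15),
                       night_hours: range = range(0, 6)) -> dict:
    """Share of objectively detected events that have a matching diary report,
    split into night and day. Each diary entry can match at most one event."""
    unused = list(reported)
    counts = {"day": [0, 0], "night": [0, 0]}  # period -> [matched, total]
    for event in sorted(objective):
        period = "night" if event.hour in night_hours else "day"
        counts[period][1] += 1
        # Greedily take the closest unused diary entry inside the tolerance window.
        match = min((r for r in unused if abs(r - event) <= tolerance),
                    key=lambda r: abs(r - event), default=None)
        if match is not None:
            unused.remove(match)
            counts[period][0] += 1
    # None marks a period with no objective events (sensitivity undefined).
    return {p: (m / t if t else None) for p, (m, t) in counts.items()}
```

Greedy closest-match pairing is a simplification; a stricter analysis would solve the event-to-report assignment globally.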

They are infrequent. Scales are typically administered at clinic visits, which for menopause management often occur every three to six months. Between visits, the clinician has no data. This is roughly equivalent to managing diabetes by checking HbA1c twice a year with no continuous glucose monitor in between.

They collapse signal into a single number. A total MRS score of 15 tells you the patient is symptomatic. It does not tell you whether her hot flashes happen mostly at 3 AM and are fragmenting her sleep, whether her heart rate variability (HRV) trended down over the last six weeks, whether her symptoms cluster after specific triggers, or whether one symptom is improving while another is worsening.
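The information loss is easy to demonstrate. The two response profiles below are invented for illustration; both sum to the total of 15 mentioned above, yet one is dominated by hot flushes and sleep problems and the other by mood symptoms. Item names are hypothetical labels for MRS items.

```python
# Two hypothetical MRS profiles (item scores 0 to 4; unlisted items scored 0).
patient_a = {"hot_flushes": 4, "sleep_problems": 4, "exhaustion": 3,
             "irritability": 2, "heart_discomfort": 2}
patient_b = {"depressive_mood": 4, "anxiety": 4, "irritability": 3,
             "joint_muscular_discomfort": 2, "vaginal_dryness": 2}


def total(profile: dict[str, int]) -> int:
    """The single number the clinician typically sees."""
    return sum(profile.values())


def dominant_items(profile: dict[str, int], n: int = 2) -> list[str]:
    """Highest-scoring items, ties broken alphabetically."""
    return sorted(profile, key=lambda item: (-profile[item], item))[:n]


print(total(patient_a), total(patient_b))  # 15 15
print(dominant_items(patient_a), dominant_items(patient_b))
# ['hot_flushes', 'sleep_problems'] ['anxiety', 'depressive_mood']
```

Even the item-level profile is still a retrospective summary: it cannot show 3 AM clustering or a six-week HRV trend.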

They cannot distinguish cause from comorbidity. A high score on the psychological subscale could reflect a primary mood disorder, sleep deprivation secondary to vasomotor symptoms, or autonomic dysregulation driven by hormonal change. These are three conditions with three different treatments. The scale does not adjudicate.

This is the data the clinician uses to decide whether to prescribe HRT, what severity tier the patient falls into, and later, whether the treatment is working. It is the best-validated instrument we currently have. It is also a profoundly low-resolution input for a decision that shapes years of a patient's health trajectory.

Phase 3: The HRT decision.

Deciding whether to prescribe hormone therapy is, in current practice, a clinician judgment built on five inputs.

  1. Symptom severity, captured via the scales above or unstructured history.

  2. Personal and family medical history, especially breast cancer, cardiovascular disease, venous thromboembolism, and stroke.

  3. Cardiovascular risk profile, often quantified via the 10-year ASCVD risk score.

  4. Time since menopause onset. The "timing hypothesis" frames HRT as cardioprotective when initiated within 10 years of menopause or before age 60, and potentially harmful when initiated later [12, 13].

  5. Patient preference and goals.

The Cleveland Clinic's risk stratification framework is representative of current best practice. HRT is considered low risk in women with recent menopause, normal weight and blood pressure, an active lifestyle, a 10-year ASCVD risk under 5%, and low breast cancer risk. It is intermediate risk in women with diabetes, smoking, hypertension, obesity, sedentary lifestyle, autoimmune disease, hyperlipidemia, metabolic syndrome, or 10-year ASCVD risk of 5 to 10%. It is high risk in women with established cardiovascular disease, prior thromboembolism, stroke, breast cancer, or 10-year ASCVD risk of 10% or higher [13].
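Because the framework is a set of thresholds and exclusions, it can be sketched as a small classifier. This is a simplified illustration of the tiers as summarized above, not the Cleveland Clinic's implementation: the condition labels are hypothetical, "low breast cancer risk" is not modeled, and the "recent menopause" requirement is approximated by a separate check of the timing window from the list above (within 10 years of menopause or before age 60).

```python
from dataclasses import dataclass, field

# Hypothetical condition labels; real intake data would need mapping onto these.
HIGH_RISK_CONDITIONS = {"cardiovascular_disease", "thromboembolism", "stroke", "breast_cancer"}
INTERMEDIATE_FACTORS = {"diabetes", "smoking", "hypertension", "obesity", "sedentary",
                        "autoimmune_disease", "hyperlipidemia", "metabolic_syndrome"}


@dataclass
class Patient:
    ascvd_10yr_pct: float                      # 10-year ASCVD risk, in percent
    conditions: set[str] = field(default_factory=set)
    age: int = 50
    years_since_menopause: float = 0.0


def hrt_risk_tier(p: Patient) -> str:
    """Map risk factors and ASCVD risk onto the low / intermediate / high tiers."""
    if p.conditions & HIGH_RISK_CONDITIONS or p.ascvd_10yr_pct >= 10:
        return "high"
    if p.conditions & INTERMEDIATE_FACTORS or p.ascvd_10yr_pct >= 5:
        return "intermediate"
    return "low"


def in_timing_window(p: Patient) -> bool:
    """Timing-hypothesis window: within 10 years of menopause or before age 60."""
    return p.years_since_menopause < 10 or p.age < 60
```

Every input to this sketch is a point-in-time value, which is the granularity problem discussed in the paragraphs that follow.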

This framework is sound. The problem is the granularity of the inputs.

Cardiovascular risk is calculated from a snapshot. The ASCVD score uses age, sex, race, total and HDL cholesterol, systolic blood pressure, hypertension treatment status, smoking, and diabetes [13]. Most of these are measured once per visit. None of them captures the autonomic dysregulation, sleep disruption, or vasomotor burden that are increasingly recognized as cardiovascular risk markers in their own right [12]. A 2025 review in Autonomic Neuroscience on menopause and cardiovascular regulation noted that the menopausal loss of estradiol contributes to changes in autonomic and vascular physiology and is implicated in vasomotor and sleep symptoms [12]. These signals are not captured in the standard risk calculator.

Symptom severity at the time of decision is poorly characterized. As discussed above, scale-based assessment provides a single retrospective summary number. The clinician does not know whether the patient is having 4 hot flashes a day or 14, whether they cluster at night, whether they are getting worse week over week, or how they correlate with sleep, stress, and behavior. Yet symptom frequency and severity are the primary clinical justification for systemic HRT.
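With continuous detection, those questions reduce to simple arithmetic over timestamps. A minimal sketch, assuming detected hot flashes arrive as a list of datetimes; the function name and the midnight-to-6 AM night window are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime


def summarize_hot_flashes(events: list[datetime], night_hours: range = range(0, 6)) -> dict:
    """Daily frequency, nocturnal share, and counts per consecutive 7-day block."""
    if not events:
        return {"mean_per_day": 0.0, "night_fraction": None, "weekly_counts": []}
    days = [e.date() for e in events]
    first, last = min(days), max(days)
    # Assumption: the monitoring span runs from the first to the last event day;
    # real data should use device wear time instead.
    n_days = (last - first).days + 1          # include event-free days in the span
    weekly = Counter((d - first).days // 7 for d in days)
    n_weeks = (n_days + 6) // 7               # the final block may be a partial week
    return {
        "mean_per_day": len(events) / n_days,
        "night_fraction": sum(e.hour in night_hours for e in events) / len(events),
        "weekly_counts": [weekly.get(w, 0) for w in range(n_weeks)],
    }
```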

Individual baseline physiology is unknown. Hormone therapy is initiated at a starting dose, typically 0.0375 mg per day transdermal estradiol for moderate to severe vasomotor symptoms [14], without a personalized model of where this woman's body sits within the population distribution. Estrogen pharmacokinetics vary substantially between individuals, and tissue-level estrogen concentrations are not predictable from the dose [14]. There is no objective baseline against which to measure response.

The decision to start HRT, in other words, is made with a consequential risk model fed by a small number of low-resolution inputs. The science supports HRT as effective and safe for the right patients. The data infrastructure for identifying who those patients are, and for personalizing the starting point, is thin.

Phase 4: HRT monitoring. The wait-and-see model.

Once HRT is started, the standard follow-up cadence is 3 months, then 6 months, then annually [13]. At each visit, the clinician asks how the patient is feeling, may readminister a symptom scale, checks blood pressure, and screens for adverse effects. There is general guidance to monitor cardiovascular risk markers, conduct mammography on schedule, and reassess the risk and benefit balance over time [15].
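To make the cadence concrete, a minimal sketch that generates the follow-up dates and the length of each unobserved interval between clinical touchpoints. It assumes "annually" means each anniversary of HRT initiation; that interpretation and the function names are hypothetical.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year, month = d.year + month_index // 12, month_index % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def follow_up_schedule(start: date, years: int = 2) -> list[date]:
    """Visits at 3 and 6 months, then at each anniversary of the start date (assumed)."""
    offsets = [3, 6] + [12 * y for y in range(1, years + 1)]
    return [add_months(start, m) for m in offsets]


def unobserved_gaps(start: date, visits: list[date]) -> list[int]:
    """Days between consecutive clinical touchpoints, counting from initiation."""
    points = [start] + visits
    return [(b - a).days for a, b in zip(points, points[1:])]


visits = follow_up_schedule(date(2025, 1, 15))
print(unobserved_gaps(date(2025, 1, 15), visits))  # [90, 91, 184, 365]
```

Under this schedule, the longest stretch without any clinical data point is a full year.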

What is conspicuously missing is objective, between-visit data on whether the treatment is working.

Hormone level monitoring is generally not used. Unlike gender-affirming hormone therapy, where the Endocrine Society recommends estradiol monitoring every 3 months [16], menopausal HRT in cisgender women is typically titrated by symptom response rather than serum levels. There are good reasons for this. Pharmacokinetic variability means serum estradiol does not predict tissue effect [14]. But it leaves a gap. The patient and clinician have no objective measurement to anchor on.

The feedback loop is "wait three to six months and see how you feel." This is the dominant clinical heuristic [17]. A patient who starts HRT and continues having significant vasomotor symptoms typically waits weeks or months before her next clinical encounter, at which point she reports back subjectively whether things have improved. If the dose is too low, this delay represents months of unnecessary symptom burden. If the dose is too high or the formulation is wrong, side effects accumulate without timely adjustment.

Adverse event detection is reactive. Cardiovascular adverse events from HRT, including venous thromboembolism, stroke, and myocardial infarction, are individually rare in the appropriate age population but are the source of the regulatory caution around the therapy [12, 18]. They are detected when the patient presents with symptoms or when annual screening picks them up. Continuous physiological data on heart rate trends, blood pressure variability, autonomic tone, and sleep quality could in principle provide an earlier signal of cardiovascular drift, but such data is not currently part of the monitoring infrastructure.

Compliance and discontinuation are poorly tracked. Claims-data analyses show that a significant fraction of women discontinue HRT within the first year, with disparities by race, ethnicity, education, and income [19]. The reasons are mixed: side effects, cost, fear of long-term risks, and perceived lack of benefit. The system does not capture them in structured form.
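In claims data, discontinuation is commonly inferred from gaps between pharmacy refills rather than from anything the patient says, which is why the reasons go unrecorded. A minimal sketch of such a refill-gap rule; the 30-day grace period, the 365-day window, and the input format are assumptions, and this is not necessarily the definition used in [19].

```python
from datetime import date, timedelta
from typing import Optional


def first_year_lapse(fills: list[tuple[date, int]], grace_days: int = 30) -> Optional[date]:
    """Return the date supply ran out if therapy lapsed in year one, else None.

    fills: (fill_date, days_supply) pairs for one patient, assumed complete.
    A lapse is a supply gap longer than grace_days (an assumed threshold).
    """
    if not fills:
        raise ValueError("need at least one fill")
    fills = sorted(fills)
    start = fills[0][0]
    year_end = start + timedelta(days=365)
    grace = timedelta(days=grace_days)
    covered_until = start
    lapse = None
    for fill_date, days_supply in fills:
        if fill_date > covered_until + grace:
            lapse = covered_until          # refill arrived too late
            break
        # Early refills stockpile: new supply starts when the old supply runs out.
        covered_until = max(covered_until, fill_date) + timedelta(days=days_supply)
    else:
        if covered_until + grace < year_end:
            lapse = covered_until          # no further refill before the year ended
    return lapse if lapse is not None and lapse < year_end else None
```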

The result is that HRT, one of the most effective interventions available for menopausal symptoms [9, 10], is deployed and monitored with a feedback loop that operates on a timescale of months when the underlying physiology operates on a timescale of seconds.

What objective data would change

The data infrastructure that perimenopause care needs is not exotic. It is the same kind of continuous physiological monitoring that has transformed diabetes care, sleep medicine, and cardiac arrhythmia detection. Applied to the menopausal transition, the high-value signals fall into four categories.

Vasomotor symptom frequency, severity, and timing. A continuous, objective record of when hot flashes and night sweats occur replaces the patient diary with a measurement clinicians can trust. Such a record is captured with a multimodal sensor stack of electrodermal activity (EDA), skin temperature, photoplethysmography (PPG), and accelerometry, and validated against expert-scored events to the 90%-plus accuracy levels demonstrated in the published hot flash detection literature. This matters at every phase: it confirms the diagnosis when symptoms are ambiguous, it stratifies severity at the HRT decision, and it provides a quantitative endpoint for evaluating treatment response.
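The simplest single-channel form of this detection is a skin conductance rise threshold. A minimal sketch, assuming skin conductance sampled at 1 Hz in microsiemens; the default of a 2 µS rise within 30 seconds reflects a criterion widely used in sternal skin conductance studies, while the sampling rate, refractory period, and function name are illustrative assumptions. This is not the method of [22] or [23]. A rule this simple cannot distinguish a hot flash from other sympathetic conductance rises such as exercise or acute stress, which is the kind of ambiguity additional channels like skin temperature, PPG, and accelerometry can help resolve.

```python
def detect_sc_rises(samples: list[float], fs_hz: float = 1.0, rise_us: float = 2.0,
                    window_s: float = 30.0, refractory_s: float = 300.0) -> list[int]:
    """Return onset indices of skin conductance rises of at least rise_us microsiemens
    within window_s seconds. After each detection, further detections are suppressed
    for refractory_s seconds so one event is not counted repeatedly."""
    window = int(round(window_s * fs_hz))
    refractory = int(round(refractory_s * fs_hz))
    onsets: list[int] = []
    next_allowed = 0
    for i in range(len(samples)):
        if i < next_allowed:
            continue
        lo = max(0, i - window)
        segment = samples[lo:i + 1]        # trailing window ending at sample i
        trough = min(segment)
        if samples[i] - trough >= rise_us:
            # Onset = last sample at the trough level, i.e. where the rise began.
            onset = lo + max(j for j, v in enumerate(segment) if v == trough)
            onsets.append(onset)
            next_allowed = i + refractory
    return onsets
```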

Sleep architecture and continuity. Multimodal wearables now produce sleep stage data validated against polysomnography to clinically useful accuracy [20, 21]. For perimenopause, the most relevant signal is not just total sleep time but fragmentation: how many awakenings occur, when they occur, and whether they cluster around objectively detected vasomotor events. Sleep disruption is one of the most common reasons women seek menopause care, and one of the most common criteria by which HRT is judged successful or not.
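Fragmentation and its link to vasomotor events can be computed from a wearable hypnogram. A minimal sketch, assuming 30-second epochs labeled "W" for wake and anything else for sleep, with detected vasomotor (VMS) events already mapped to epoch indices; the 1-minute minimum bout, the 5-minute (10-epoch) linkage window, and the function names are illustrative assumptions, not validated thresholds.

```python
from typing import Optional


def awakening_starts(hypnogram: list[str], min_epochs: int = 2) -> list[int]:
    """Start indices of wake bouts after sleep onset, excluding the final wake-up.

    hypnogram: one label per 30-second epoch; "W" is wake, any other label is sleep.
    min_epochs: shortest bout counted as an awakening (2 epochs = 1 minute, assumed).
    """
    n = len(hypnogram)
    i = next((k for k, s in enumerate(hypnogram) if s != "W"), n)  # sleep onset
    starts = []
    while i < n:
        if hypnogram[i] != "W":
            i += 1
            continue
        j = i
        while j < n and hypnogram[j] == "W":
            j += 1
        if j - i >= min_epochs and j < n:  # j == n means the final morning wake
            starts.append(i)
        i = j
    return starts


def vms_linked_fraction(wake_starts: list[int], vms_epochs: list[int],
                        window_epochs: int = 10) -> Optional[float]:
    """Share of awakenings starting within window_epochs (before or after) of a VMS event."""
    if not wake_starts:
        return None
    linked = sum(any(abs(w - v) <= window_epochs for v in vms_epochs) for w in wake_starts)
    return linked / len(wake_starts)
```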

Autonomic state via HRV and resting heart rate trends. Vagally mediated HRV declines across the menopausal transition and is implicated in both symptom severity and downstream cardiovascular risk [12]. Resting heart rate rises. Continuous monitoring of these trends provides a longitudinal physiological baseline that does not exist in current care. For HRT specifically, the question is whether autonomic balance shifts back toward parasympathetic dominance after treatment is started, a measurable signal that a single annual visit cannot capture.
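RMSSD (the root mean square of successive differences between RR intervals) is a commonly used time-domain index of vagally mediated HRV. A minimal sketch of computing it and resting heart rate from one night's RR intervals, then comparing nightly values before and after treatment start; it assumes artifact-free RR intervals in milliseconds, and the simple before/after mean difference is an illustrative choice, not a validated response criterion.

```python
import math
from statistics import mean


def rmssd(rr_ms: list[float]) -> float:
    """Root mean square of successive RR-interval differences, in ms."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def resting_hr_bpm(rr_ms: list[float]) -> float:
    """Mean heart rate implied by the RR intervals (60,000 ms per minute)."""
    return 60000.0 / mean(rr_ms)


def pre_post_change(nightly: list[float], start_index: int) -> float:
    """Mean of nights on/after treatment start minus mean of nights before it."""
    before, after = nightly[:start_index], nightly[start_index:]
    if not before or not after:
        raise ValueError("need nights on both sides of treatment start")
    return mean(after) - mean(before)
```

A positive change in nightly RMSSD after initiation would be consistent with the shift toward parasympathetic dominance described above, though any real analysis would also need to account for night-to-night variability.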

Individual symptom-trigger and symptom-treatment correlation. With timestamped physiological events plus contextual data (sleep, activity, stress, food, alcohol, medication adherence), the system can learn which factors precipitate symptoms for a specific user and how those patterns shift with treatment. This is the personalization layer that current scale-based care cannot deliver, and it is what turns generic HRT prescription into individualized therapy management.
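The simplest per-user trigger signal is a rate ratio: how many hot flashes occur on days with a given exposure compared with days without it. A minimal sketch, assuming daily hot flash counts keyed by date and a set of days on which a hypothetical exposure (for example, alcohol) was logged. It deliberately ignores confounding (alcohol and poor sleep falling on the same nights) and lag (an evening exposure showing up in the following night's events), both of which a real personalization layer would have to handle.

```python
from datetime import date
from typing import Optional


def exposure_rate_ratio(daily_counts: dict[date, int], exposed_days: set[date]) -> Optional[float]:
    """Mean daily hot flash count on exposed days divided by the mean on other days.

    Returns None when either group is empty or the unexposed mean is zero.
    A ratio above 1 suggests, but does not establish, that the exposure is a trigger.
    """
    exposed = [c for d, c in daily_counts.items() if d in exposed_days]
    unexposed = [c for d, c in daily_counts.items() if d not in exposed_days]
    if not exposed or not unexposed:
        return None
    unexposed_mean = sum(unexposed) / len(unexposed)
    if unexposed_mean == 0:
        return None
    return (sum(exposed) / len(exposed)) / unexposed_mean
```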

This data does not replace clinical judgment. It feeds it. A clinician who sees that her patient's hot flash frequency dropped from 11 per day to 3 per day in the six weeks after starting HRT, that her HRV recovered toward premenopausal baseline, and that her sleep fragmentation halved, is making a different and better decision than one who hears "I think it's a little better" at the 3-month follow-up.

Where Romi fits

Every layer of the menopause care pathway, from diagnosis to symptom characterization to treatment decision to treatment monitoring, is bottlenecked by the same problem: the absence of objective, continuous, individual physiological data.

The science on what to measure is settled. Hot flash detection from EDA and a multimodal sensor stack is published, validated, and reproducible [11, 22, 23]. Sleep staging from PPG and accelerometry is becoming clinically credible [20, 21]. HRV-based autonomic monitoring is a mature signal. The validated symptom scales, while limited as standalone tools, become much more useful when paired with continuous physiological context.

What does not yet exist is a product that combines this measurement infrastructure with a clinician-facing data layer designed for the actual workflow of menopause care, and a patient-facing companion designed for the perimenopausal population specifically, not retrofitted from a fertility tracker.

That is what Romi is building. The thesis is straightforward. The gap in menopause care is not primarily a knowledge gap. The science of HRT is well established. The bottleneck is a measurement gap. Closing that gap is a data infrastructure problem, not a pharmaceutical one. And the right time to build that infrastructure is now, while the cultural and clinical conversation around menopause is finally moving from neglect to legitimate medical attention.

References

[1] Bonza Health. (2025). Why FSH Is Not a Reliable Indicator of Perimenopause. (Clinical review citing STRAW+10 and 2022 Menopause Society Position Statement.)

[2] Santoro N, Roeca C, Peters BA, Neal-Perry G. (2018). Management of the Perimenopause. Clinical Obstetrics and Gynecology.

[3] National Institute for Health and Care Excellence (NICE). Guideline NG23: Menopause, Diagnosis and Management. (Updated 2024.)

[4] Canadian Menopause Society. Diagnosis and Management Clinical Guidance.

[5] International HTA Database. Value of measuring FSH and LH in menopause diagnosis.

[6] BodySpec Clinical Reference, citing NICE NG23 (2024) and ASRM committee opinion on AMH.

[7] Newson L, et al. (2022). Survey of women's experiences of perimenopause and menopause diagnosis and treatment. Balance Menopause survey of 5,000-plus women.

[8] National Academies Press. Measurement Scales for Menopause: Greene Climacteric Scale, MRS, MENQOL. In: Nonpharmacologic Treatments for Menopause Associated Vasomotor Symptoms.

[9] Heinemann LAJ, et al. (2004). The Menopause Rating Scale (MRS) as outcome measure for hormone treatment? A validation study. Health and Quality of Life Outcomes, 2:67.

[10] Heinemann LAJ, et al. (2004). The Menopause Rating Scale (MRS) scale: A methodological review. Health and Quality of Life Outcomes, 2:45.

[11] Carpenter JS, Monahan PO, Azzouz F. (2004). Accuracy of Subjective Hot Flush Reports Compared With Continuous Sternal Skin Conductance Monitoring. Obstetrics and Gynecology, 104, 1322-1326.

[12] Tobaldini E, et al. (2025). Menopause and its effects on autonomic regulation of blood pressure: Insights and perspectives. Autonomic Neuroscience.

[13] Cleveland Clinic. Menopausal Hormone Therapy and Heart Risk: Updated Guidance. (Risk stratification framework.)

[14] StatPearls. Hormone Replacement Therapy. NCBI Bookshelf, NBK493191. (Updated 2024.)

[15] Vitality Lounge Clinical Reference. HRT Pre Treatment and Monitoring Workup. (Aligned with 2020 Menopausal Hormone Therapy Guidelines.)

[16] UCSF Gender Affirming Health Program. Overview of Feminizing Hormone Therapy. (Citing Endocrine Society guidelines for hormone level monitoring cadence.)

[17] Oova Clinical Reference. (2025). HRT dose evaluation and adjustment timing.

[18] Mehedintu C, et al. (2025). The impact of hormone replacement therapy on cardiovascular health in postmenopausal women: a narrative review. PMC.

[19] Williams RE, et al. (2024). Health Disparities in Vasomotor Symptom Prevalence and Treatment Discontinuation. Journal of Women's Health.

[20] Birrer V, et al. (2024). Evaluating reliability in wearable devices for sleep staging. npj Digital Medicine.

[21] de Zambotti M, et al. (2020). Detecting sleep using heart rate and motion data from multisensor consumer-grade wearables. SLEEP.

[22] de Zambotti M, et al. (2014). Automatic Detection of Hot Flash Occurrence and Timing from Skin Conductance Activity. IEEE Transactions on Biomedical Engineering.

[23] Sancho Casas C, et al. (2021). A novel Hot Flash classification algorithm via multi-sensor features integration. IEEE EMBC.

