Last Updated: March 1, 2026

Data Protection Impact Assessment (DPIA)

1. Introduction

This Data Protection Impact Assessment (DPIA) is conducted in accordance with GDPR Article 35 to evaluate the data protection risks arising from SoDNAscan's processing of genetic, health, and biometric data using artificial intelligence.

This assessment is maintained by the data controller and is available upon request to supervisory authorities and, in summary form, to data subjects upon request at info@sodnascan.com.

This document should be read together with our:

2. Data Controller

Controller: Samuel Virag
Contact Email: info@sodnascan.com
Data Protection Contact: privacy@sodnascan.com

No Data Protection Officer (DPO) has been formally appointed. Under GDPR Article 37, a DPO is required for public authorities or organizations whose core activities require large-scale, regular, and systematic monitoring or large-scale processing of special category data. SoDNAscan will appoint a DPO if processing scale reaches a threshold that triggers this obligation.

3. DPIA Triggers

Three independent criteria under GDPR Article 35 and the European Data Protection Board (EDPB) Guidelines on Data Protection Impact Assessment (WP 248 rev.01) trigger the requirement for this DPIA. Any two are sufficient; SoDNAscan meets all three:

#	Trigger	Applicable GDPR Provision	How SoDNAscan Meets It
1	Processing special category data at scale	Art. 35(3)(b), Art. 9	Genetic data (DNA genotypes) and health data (blood biomarkers, wearable metrics, health history) are special category data under Art. 9. Processing occurs systematically for all users who upload data.
2	Automated decision-making or profiling with significant effect	Art. 35(3)(a), Art. 22	AI-powered analysis generates personalized health assessments, risk profiles, supplement protocols, and monitoring plans based on genetic and health data. While outputs are informational (not clinical decisions), they may influence users' health behaviors.
3	Use of new technologies	EDPB Guidelines, Criterion 8	Large language models (LLMs) processing genetic data for individualized health analysis represents a novel application of AI technology to special category data.

4. Description of Processing

4.1 Purpose

SoDNAscan generates personalized health books by analyzing users' genetic data, blood work results, and wearable health metrics using AI. The service provides wellness insights and educational health information — not medical diagnoses or clinical recommendations.

4.2 Data Subjects

Adult individuals (18+) who voluntarily create an account, upload their genetic data, and consent to AI-powered analysis.

4.3 Categories of Personal Data Processed

Data Category	GDPR Classification	Source
Account data (email, name, password hash)	Ordinary personal data	User-provided at registration
Demographic data (age, sex, height, weight, ethnicity)	Ordinary personal data	User-provided in profile
Genetic data (SNP genotypes — rsid, chromosome, position, alleles)	Special category data — Art. 9 (genetic data)	Uploaded DNA file
Blood work results (biomarker names, values, units, reference ranges, status flags)	Special category data — Art. 9 (health data)	Uploaded PDF or pasted text
Wearable health metrics (heart rate, HRV, SpO2, sleep, activity, body composition)	Special category data — Art. 9 (health data)	Uploaded Apple Health, Oura, Fitbit, or Whoop export
Self-reported health information (health history, family history, goals, supplements, lifestyle)	Special category data — Art. 9 (health data)	User-provided free-text fields
AI-generated health analysis (reports, chapters, fact sheets)	Derived special category data	Generated by AI processing
Payment data (Stripe session ID, payment intent, amount, currency)	Ordinary personal data	Stripe checkout
Consent records (consent type, granted/withdrawn, timestamp, policy version, IP address)	Ordinary personal data	System-generated

4.4 Processing Activities

The processing pipeline consists of four sequential stages:

Stage 1 — Data Ingestion

User uploads a genetic file (max 50 MB)
File is parsed to extract individual SNP genotypes (rsid, chromosome, position, alleles)
Parsed SNPs are matched against a curated reference database of health-relevant genetic variants
Only matched variants (typically several hundred out of ~700,000 total SNPs) proceed to analysis
Unmatched SNPs are not stored
Raw genetic file is stored in encrypted cloud storage for potential re-processing

Stage 2 — Supplementary Data (Optional)

Blood work: User uploads a PDF lab report or pastes text. The AI extracts structured biomarker data. User reviews and explicitly confirms extracted values before they enter analysis.
Wearable data: User uploads a health export file. Data is parsed locally (no AI involved). Per-metric aggregates (averages, min/max) are computed. The raw wearable file is deleted immediately after parsing (data minimisation by design). Only aggregated metrics are retained.

Stage 3 — AI Analysis and Book Generation

Matched genetic variants, confirmed blood biomarkers, aggregated wearable metrics, and sanitized profile fields are assembled into structured prompts
Data is transmitted via encrypted HTTPS to Anthropic's Claude
The AI generates 10 analytical reports covering different biological systems
Reports are validated for internal consistency
Up to 20 book chapters are generated from the reports
Chapters are rendered into a PDF health book

Stage 4 — Delivery and Storage

The PDF is stored in encrypted cloud storage
User can view the book online, download the PDF, and browse individual reports
All generated content is retained until the user deletes their account

4.5 Recipients and Processors

Recipient	Role	Data Received	Location	DPA/SCCs
Anthropic, PBC	Data processor	Matched genetic variants, confirmed blood biomarkers, aggregated wearable metrics, sanitized profile fields	United States (AWS/GCP)	DPA with SCCs in commercial API terms
Supabase, Inc.	Data processor	All stored data (database, files, auth)	EU West — Frankfurt	DPA with SCCs (signed via PandaDoc)
Stripe, Inc.	Data processor	Email, user ID, payment amount, currency	Global (including US)	DPA with SCCs; PCI DSS Level 1
Resend	Data processor	Email address, user name	United States	DPA

Sub-processor chain: SoDNAscan (controller) → Anthropic/Supabase/Stripe (processors) → AWS/GCP (infrastructure sub-processors). Under GDPR Article 28, SoDNAscan remains fully liable for the data protection obligations of all processors and sub-processors.

5. Legal Basis

5.1 Legal Basis for Processing

Data Category	Legal Basis
Genetic data, blood work, wearable data, self-reported health information	Art. 9(2)(a) — Explicit consent of the data subject
Account data, demographic data	Art. 6(1)(b) — Performance of contract
Payment data	Art. 6(1)(b) — Performance of contract
Server logs	Art. 6(1)(f) — Legitimate interest (security)
Consent records	Art. 6(1)(c) — Legal obligation

5.2 Consent Mechanism

Consent for processing special category data is obtained through:

Signup flow: Separate, explicit checkbox for health data processing consent with a direct link to the Data Use Policy. This is distinct from the Terms of Service acceptance and disclaimer acknowledgment.
Consent records: Each consent event (granted or withdrawn) is recorded with timestamp, policy version, and IP address in an immutable audit trail.
Withdrawal: Users can withdraw AI processing consent at any time via account settings. Withdrawal immediately blocks new AI processing but does not affect previously generated content.
Consent gating: Backend endpoints that trigger AI processing enforce active consent — requests are rejected if consent is not currently granted.

5.3 Necessity and Proportionality

Necessity: AI processing of genetic and health data is the core function of the service. Users upload data specifically to receive AI-generated health analysis. The processing cannot be achieved by less intrusive means while delivering the same service.
Proportionality: Only health-relevant matched genetic variants are processed (not the full raw genotype file). Wearable data is aggregated before AI processing (raw granular data is deleted). Free-text fields are sanitized and length-limited before AI transmission. Blood work data requires user confirmation before entering AI analysis.
Data minimisation: Unmatched SNPs are discarded. Raw wearable files are deleted after parsing. Profile text fields are truncated to defined maximum lengths (1,000–2,000 characters). Only data relevant to health analysis is included in AI prompts.

6. Risk Assessment

6.1 Identified Risks

#	Risk	Likelihood	Severity	Overall Risk
R1	Unauthorized access to genetic data through application breach	Low	Very High	High
R2	Unauthorized access to genetic data through processor breach (Anthropic, Supabase)	Low	Very High	High
R3	AI generates inaccurate health information that users act upon	Medium	High	High
R4	Cross-border transfer exposes genetic data to US government access requests	Low	High	Medium
R5	Re-identification of anonymized genetic data	Very Low	Very High	Medium
R6	Genetic data reveals information about non-consenting family members	Medium	Medium	Medium
R7	AI prompt injection via user-provided text fields	Low	Medium	Low
R8	Consent is not sufficiently informed or specific for Art. 9 data	Low	High	Medium
R9	Data retained longer than necessary	Low	Medium	Low
R10	Sub-processor processes data beyond authorized scope	Very Low	High	Low

6.2 Severity Criteria

Very High: Irreversible harm; genetic data cannot be changed like a password. A breach creates permanent exposure potentially affecting biological relatives.
High: Significant harm to health decisions, financial standing, or privacy.
Medium: Moderate inconvenience or limited privacy impact.
Low: Minimal or easily remediated impact.

7. Risk Mitigation Measures

7.1 R1 — Application Breach

Measure	Implementation
Authentication security	JWT with ES256 algorithm, verified against Supabase JWKS. Refresh tokens in httpOnly cookies (invisible to JavaScript). Access tokens in memory only (not localStorage).
CSRF protection	Custom header required on auth endpoints
Row-Level Security	RLS enabled on all database tables. All application queries filter by authenticated user ID.
Security headers	HSTS, X-Frame-Options DENY, X-Content-Type-Options nosniff, strict Referrer-Policy, Content Security Policy
Rate limiting	Global rate limits plus per-endpoint overrides for auth and upload endpoints
File validation	Magic byte verification, extension whitelisting, size limits, format-specific content validation
Input sanitization	Free-text fields stripped of special characters before AI prompt interpolation
No third-party tracking	Self-hosted cookie-free analytics only (no personal data collected). No third-party analytics scripts, no tracking cookies, no advertising SDKs.

7.2 R2 — Processor Breach

Measure	Implementation
Anthropic retention limit	7-day retention window, then automatic deletion. No model training on API data.
Anthropic DPA	Data Processing Addendum with SCCs incorporated in commercial terms
Supabase encryption	Database and storage encrypted at rest (AES-256). EU West Frankfurt deployment. DPA with SCCs.
Stripe isolation	Payment processor receives email and user ID only — no genetic, health, or biometric data. PCI DSS Level 1.
Encryption in transit	All data transmission uses TLS 1.2 or higher

7.3 R3 — Inaccurate AI Output

Measure	Implementation
Disclaimer framework	Medical & Wellness Disclaimer required at signup. "Not medical advice" disclosures in Health Book content and Data Use Policy.
Validation pipeline	Rule-based validator checks SNP coverage, allele consistency, confidence distribution, and supplement inference chain lengths. Semantic validator checks cross-report consistency.
Evidence-tier system	SNP reference database includes evidence tiers and confidence scores. AI system prompt requires citing evidence quality for each finding.
Blood work user verification	Extracted biomarkers must be explicitly confirmed by the user before entering AI analysis
Human oversight disclosure	Data Use Policy Section 10 clearly states that AI outputs are not reviewed by medical professionals

7.4 R4 — Cross-Border Transfer Risk

Measure	Implementation
Standard Contractual Clauses	SCCs in place with Anthropic, Supabase, and Stripe
Transfer Impact Assessment	Separate TIA conducted. Available upon request.
Encryption	TLS 1.2+ in transit. AES-256 at rest.
Retention limitation	7-day retention at Anthropic limits the exposure window
EU data residency	Supabase deployed in EU West Frankfurt — stored data does not leave the EU

7.5 R5 — Re-identification Risk

Measure	Implementation
No data sharing	Genetic data is never sold, shared, or combined across users
No public datasets	Generated health books are private to the user
Access isolation	RLS and per-user query filtering prevent any cross-user data access

7.6 R6 — Family Member Privacy

Measure	Implementation
User notice	Privacy Policy Section 12 explicitly informs users that genetic data reveals information about biological relatives
Data subject scope	Only the uploading individual's data is processed; no family member data is collected or inferred
No familial matching	The service does not perform relative matching, ancestry tracing, or cross-user genetic comparison

7.7 R7 — Prompt Injection

Measure	Implementation
Input sanitization	All free-text profile fields are stripped of special characters before inclusion in AI prompts
Field length limits	Free-text fields truncated to 1,000–2,000 characters
XML containment	Blood work report text is wrapped in containment tags with system instructions to treat contents as raw data

7.8 R8 — Consent Quality

Measure	Implementation
Separate consent	Health data processing consent is a standalone checkbox, distinct from Terms of Service and disclaimer
Linked policy	Consent checkbox links directly to the Data Use Policy
Consent audit trail	Immutable records: consent type, granted/withdrawn, timestamp, policy version, IP address
Withdrawal mechanism	Settings page toggle for immediate consent withdrawal; backend enforces at the endpoint level
Re-consent on changes	Privacy Policy commits to requesting renewed consent if processing changes materially affect genetic or health data

7.9 R9 — Data Retention

Measure	Implementation
Wearable data minimisation	Raw wearable files are deleted immediately after parsing. Only aggregated metrics are retained.
Account deletion cascade	Full cascading deletion of all user data across all tables and storage buckets
Anthropic auto-deletion	7-day retention window with automatic deletion
Defined retention schedule	Retention periods documented in Privacy Policy Section 8 for all data categories

7.10 R10 — Sub-Processor Scope Creep

Measure	Implementation
DPA terms	Each processor is bound by a DPA specifying permitted processing purposes
No-training guarantee	Anthropic's commercial terms prohibit model training on API data
Periodic review	Processor terms and sub-processor lists to be reviewed annually
Transparency	Sub-processor chain is disclosed in Privacy Policy Section 6

8. Residual Risks

After implementing the mitigation measures above, the following residual risks remain:

Risk	Residual Level	Justification
R1 — Application breach	Low	Standard security controls in place; no system is immune to zero-day vulnerabilities
R2 — Processor breach	Low	Mitigated by DPAs, encryption, and retention limits; residual risk inherent in any cloud processing
R3 — Inaccurate AI output	Medium	AI model limitations are inherent. Mitigated by disclaimers, validation, and user responsibility disclosure. Cannot be fully eliminated.
R4 — Cross-border transfer	Low	SCCs and encryption provide adequate safeguards under current CJEU jurisprudence (Schrems II)
R6 — Family implications	Medium	Inherent to genetic data. Adequately disclosed to users. Cannot be technically eliminated.

No residual risk is assessed as high after mitigation. Processing may proceed.

9. Consultation

9.1 Supervisory Authority

GDPR Article 36 requires prior consultation with the supervisory authority if the DPIA indicates that processing would result in a high risk that the controller cannot mitigate. Based on this assessment, residual risks have been mitigated to acceptable levels and prior consultation is not required.

If the risk profile changes materially (e.g., new processing activities, changes to Anthropic's retention terms, or expansion of data categories), this assessment will be re-evaluated and prior consultation will be sought if necessary.

9.2 Data Subjects

Users are informed of the processing through:

Privacy Policy — comprehensive data processing disclosure
Data Use Policy — specific AI processing disclosure
Consumer Health Data Privacy Policy — Washington MHMDA compliance
Explicit consent mechanism at signup and in account settings
This DPIA is available upon request at privacy@sodnascan.com

10. Review Schedule

This DPIA will be reviewed and updated:

Annually as a standing compliance activity
When processing activities change (e.g., new data categories, new AI models, new processors)
When Anthropic's terms change (e.g., retention period modifications, sub-processor additions)
When relevant regulations change (e.g., EU AI Act enforcement milestones, new EDPB guidance on AI and genetic data)
After any personal data breach involving genetic or health data

11. Conclusion

This DPIA confirms that SoDNAscan's processing of genetic and health data using AI:

Has a clearly defined and legitimate purpose — delivering personalized wellness insights based on users' own uploaded data
Is based on explicit consent (GDPR Art. 9(2)(a)) obtained through a separate, informed, and recorded consent mechanism
Is necessary and proportionate — the processing cannot be achieved by less intrusive means while delivering the same service, and data minimisation measures are implemented throughout the pipeline
Has identified and mitigated risks to an acceptable level through technical and organizational measures
Involves documented processor relationships with DPAs and SCCs in place for all cross-border transfers
Will be reviewed regularly and updated when processing activities, processor terms, or regulatory requirements change

Processing may proceed subject to ongoing compliance monitoring and the review schedule above.

12. Contact

For questions about this DPIA or to request the full document:

Email: info@sodnascan.com
Data Protection Contact: privacy@sodnascan.com