Last Updated: March 1, 2026
Data Protection Impact Assessment (DPIA)
1. Introduction
This Data Protection Impact Assessment (DPIA) is conducted in accordance with GDPR Article 35 to evaluate the data protection risks arising from SoDNAscan's processing of genetic, health, and biometric data using artificial intelligence.
This assessment is maintained by the data controller and is available upon request to supervisory authorities and, in summary form, to data subjects upon request at info@sodnascan.com.
This document should be read together with our:
2. Data Controller
- Controller: Samuel Virag
- Contact Email: info@sodnascan.com
- Data Protection Contact: privacy@sodnascan.com
No Data Protection Officer (DPO) has been formally appointed. Under GDPR Article 37, a DPO is required for public authorities or organizations whose core activities require large-scale, regular, and systematic monitoring or large-scale processing of special category data. SoDNAscan will appoint a DPO if processing scale reaches a threshold that triggers this obligation.
3. DPIA Triggers
Three independent criteria under GDPR Article 35 and the European Data Protection Board (EDPB) Guidelines on Data Protection Impact Assessment (WP 248 rev.01) trigger the requirement for this DPIA. Any two are sufficient; SoDNAscan meets all three:
| # | Trigger | Applicable GDPR Provision | How SoDNAscan Meets It |
|---|---|---|---|
| 1 | Processing special category data at scale | Art. 35(3)(b), Art. 9 | Genetic data (DNA genotypes) and health data (blood biomarkers, wearable metrics, health history) are special category data under Art. 9. Processing occurs systematically for all users who upload data. |
| 2 | Automated decision-making or profiling with significant effect | Art. 35(3)(a), Art. 22 | AI-powered analysis generates personalized health assessments, risk profiles, supplement protocols, and monitoring plans based on genetic and health data. While outputs are informational (not clinical decisions), they may influence users' health behaviors. |
| 3 | Use of new technologies | EDPB Guidelines, Criterion 8 | Large language models (LLMs) processing genetic data for individualized health analysis represents a novel application of AI technology to special category data. |
4. Description of Processing
4.1 Purpose
SoDNAscan generates personalized health books by analyzing users' genetic data, blood work results, and wearable health metrics using AI. The service provides wellness insights and educational health information — not medical diagnoses or clinical recommendations.
4.2 Data Subjects
Adult individuals (18+) who voluntarily create an account, upload their genetic data, and consent to AI-powered analysis.
4.3 Categories of Personal Data Processed
| Data Category | GDPR Classification | Source |
|---|---|---|
| Account data (email, name, password hash) | Ordinary personal data | User-provided at registration |
| Demographic data (age, sex, height, weight, ethnicity) | Ordinary personal data | User-provided in profile |
| Genetic data (SNP genotypes — rsid, chromosome, position, alleles) | Special category data — Art. 9 (genetic data) | Uploaded DNA file |
| Blood work results (biomarker names, values, units, reference ranges, status flags) | Special category data — Art. 9 (health data) | Uploaded PDF or pasted text |
| Wearable health metrics (heart rate, HRV, SpO2, sleep, activity, body composition) | Special category data — Art. 9 (health data) | Uploaded Apple Health, Oura, Fitbit, or Whoop export |
| Self-reported health information (health history, family history, goals, supplements, lifestyle) | Special category data — Art. 9 (health data) | User-provided free-text fields |
| AI-generated health analysis (reports, chapters, fact sheets) | Derived special category data | Generated by AI processing |
| Payment data (Stripe session ID, payment intent, amount, currency) | Ordinary personal data | Stripe checkout |
| Consent records (consent type, granted/withdrawn, timestamp, policy version, IP address) | Ordinary personal data | System-generated |
4.4 Processing Activities
The processing pipeline consists of four sequential stages:
Stage 1 — Data Ingestion
- User uploads a genetic file (max 50 MB)
- File is parsed to extract individual SNP genotypes (rsid, chromosome, position, alleles)
- Parsed SNPs are matched against a curated reference database of health-relevant genetic variants
- Only matched variants (typically several hundred out of ~700,000 total SNPs) proceed to analysis
- Unmatched SNPs are not stored
- Raw genetic file is stored in encrypted cloud storage for potential re-processing
Stage 2 — Supplementary Data (Optional)
- Blood work: User uploads a PDF lab report or pastes text. The AI extracts structured biomarker data. User reviews and explicitly confirms extracted values before they enter analysis.
- Wearable data: User uploads a health export file. Data is parsed locally (no AI involved). Per-metric aggregates (averages, min/max) are computed. The raw wearable file is deleted immediately after parsing (data minimisation by design). Only aggregated metrics are retained.
Stage 3 — AI Analysis and Book Generation
- Matched genetic variants, confirmed blood biomarkers, aggregated wearable metrics, and sanitized profile fields are assembled into structured prompts
- Data is transmitted via encrypted HTTPS to Anthropic's Claude
- The AI generates 10 analytical reports covering different biological systems
- Reports are validated for internal consistency
- Up to 20 book chapters are generated from the reports
- Chapters are rendered into a PDF health book
Stage 4 — Delivery and Storage
- The PDF is stored in encrypted cloud storage
- User can view the book online, download the PDF, and browse individual reports
- All generated content is retained until the user deletes their account
4.5 Recipients and Processors
| Recipient | Role | Data Received | Location | DPA/SCCs |
|---|---|---|---|---|
| Anthropic, PBC | Data processor | Matched genetic variants, confirmed blood biomarkers, aggregated wearable metrics, sanitized profile fields | United States (AWS/GCP) | DPA with SCCs in commercial API terms |
| Supabase, Inc. | Data processor | All stored data (database, files, auth) | EU West — Frankfurt | DPA with SCCs (signed via PandaDoc) |
| Stripe, Inc. | Data processor | Email, user ID, payment amount, currency | Global (including US) | DPA with SCCs; PCI DSS Level 1 |
| Resend | Data processor | Email address, user name | United States | DPA |
Sub-processor chain: SoDNAscan (controller) → Anthropic/Supabase/Stripe (processors) → AWS/GCP (infrastructure sub-processors). Under GDPR Article 28, SoDNAscan remains fully liable for the data protection obligations of all processors and sub-processors.
5. Legal Basis
5.1 Legal Basis for Processing
| Data Category | Legal Basis |
|---|---|
| Genetic data, blood work, wearable data, self-reported health information | Art. 9(2)(a) — Explicit consent of the data subject |
| Account data, demographic data | Art. 6(1)(b) — Performance of contract |
| Payment data | Art. 6(1)(b) — Performance of contract |
| Server logs | Art. 6(1)(f) — Legitimate interest (security) |
| Consent records | Art. 6(1)(c) — Legal obligation |
5.2 Consent Mechanism
Consent for processing special category data is obtained through:
- Signup flow: Separate, explicit checkbox for health data processing consent with a direct link to the Data Use Policy. This is distinct from the Terms of Service acceptance and disclaimer acknowledgment.
- Consent records: Each consent event (granted or withdrawn) is recorded with timestamp, policy version, and IP address in an immutable audit trail.
- Withdrawal: Users can withdraw AI processing consent at any time via account settings. Withdrawal immediately blocks new AI processing but does not affect previously generated content.
- Consent gating: Backend endpoints that trigger AI processing enforce active consent — requests are rejected if consent is not currently granted.
5.3 Necessity and Proportionality
- Necessity: AI processing of genetic and health data is the core function of the service. Users upload data specifically to receive AI-generated health analysis. The processing cannot be achieved by less intrusive means while delivering the same service.
- Proportionality: Only health-relevant matched genetic variants are processed (not the full raw genotype file). Wearable data is aggregated before AI processing (raw granular data is deleted). Free-text fields are sanitized and length-limited before AI transmission. Blood work data requires user confirmation before entering AI analysis.
- Data minimisation: Unmatched SNPs are discarded. Raw wearable files are deleted after parsing. Profile text fields are truncated to defined maximum lengths (1,000–2,000 characters). Only data relevant to health analysis is included in AI prompts.
6. Risk Assessment
6.1 Identified Risks
| # | Risk | Likelihood | Severity | Overall Risk |
|---|---|---|---|---|
| R1 | Unauthorized access to genetic data through application breach | Low | Very High | High |
| R2 | Unauthorized access to genetic data through processor breach (Anthropic, Supabase) | Low | Very High | High |
| R3 | AI generates inaccurate health information that users act upon | Medium | High | High |
| R4 | Cross-border transfer exposes genetic data to US government access requests | Low | High | Medium |
| R5 | Re-identification of anonymized genetic data | Very Low | Very High | Medium |
| R6 | Genetic data reveals information about non-consenting family members | Medium | Medium | Medium |
| R7 | AI prompt injection via user-provided text fields | Low | Medium | Low |
| R8 | Consent is not sufficiently informed or specific for Art. 9 data | Low | High | Medium |
| R9 | Data retained longer than necessary | Low | Medium | Low |
| R10 | Sub-processor processes data beyond authorized scope | Very Low | High | Low |
6.2 Severity Criteria
- Very High: Irreversible harm; genetic data cannot be changed like a password. A breach creates permanent exposure potentially affecting biological relatives.
- High: Significant harm to health decisions, financial standing, or privacy.
- Medium: Moderate inconvenience or limited privacy impact.
- Low: Minimal or easily remediated impact.
7. Risk Mitigation Measures
7.1 R1 — Application Breach
| Measure | Implementation |
|---|---|
| Authentication security | JWT with ES256 algorithm, verified against Supabase JWKS. Refresh tokens in httpOnly cookies (invisible to JavaScript). Access tokens in memory only (not localStorage). |
| CSRF protection | Custom header required on auth endpoints |
| Row-Level Security | RLS enabled on all database tables. All application queries filter by authenticated user ID. |
| Security headers | HSTS, X-Frame-Options DENY, X-Content-Type-Options nosniff, strict Referrer-Policy, Content Security Policy |
| Rate limiting | Global rate limits plus per-endpoint overrides for auth and upload endpoints |
| File validation | Magic byte verification, extension whitelisting, size limits, format-specific content validation |
| Input sanitization | Free-text fields stripped of special characters before AI prompt interpolation |
| No third-party tracking | Self-hosted cookie-free analytics only (no personal data collected). No third-party analytics scripts, no tracking cookies, no advertising SDKs. |
7.2 R2 — Processor Breach
| Measure | Implementation |
|---|---|
| Anthropic retention limit | 7-day retention window, then automatic deletion. No model training on API data. |
| Anthropic DPA | Data Processing Addendum with SCCs incorporated in commercial terms |
| Supabase encryption | Database and storage encrypted at rest (AES-256). EU West Frankfurt deployment. DPA with SCCs. |
| Stripe isolation | Payment processor receives email and user ID only — no genetic, health, or biometric data. PCI DSS Level 1. |
| Encryption in transit | All data transmission uses TLS 1.2 or higher |
7.3 R3 — Inaccurate AI Output
| Measure | Implementation |
|---|---|
| Disclaimer framework | Medical & Wellness Disclaimer required at signup. "Not medical advice" disclosures in Health Book content and Data Use Policy. |
| Validation pipeline | Rule-based validator checks SNP coverage, allele consistency, confidence distribution, and supplement inference chain lengths. Semantic validator checks cross-report consistency. |
| Evidence-tier system | SNP reference database includes evidence tiers and confidence scores. AI system prompt requires citing evidence quality for each finding. |
| Blood work user verification | Extracted biomarkers must be explicitly confirmed by the user before entering AI analysis |
| Human oversight disclosure | Data Use Policy Section 10 clearly states that AI outputs are not reviewed by medical professionals |
7.4 R4 — Cross-Border Transfer Risk
| Measure | Implementation |
|---|---|
| Standard Contractual Clauses | SCCs in place with Anthropic, Supabase, and Stripe |
| Transfer Impact Assessment | Separate TIA conducted. Available upon request. |
| Encryption | TLS 1.2+ in transit. AES-256 at rest. |
| Retention limitation | 7-day retention at Anthropic limits the exposure window |
| EU data residency | Supabase deployed in EU West Frankfurt — stored data does not leave the EU |
7.5 R5 — Re-identification Risk
| Measure | Implementation |
|---|---|
| No data sharing | Genetic data is never sold, shared, or combined across users |
| No public datasets | Generated health books are private to the user |
| Access isolation | RLS and per-user query filtering prevent any cross-user data access |
7.6 R6 — Family Member Privacy
| Measure | Implementation |
|---|---|
| User notice | Privacy Policy Section 12 explicitly informs users that genetic data reveals information about biological relatives |
| Data subject scope | Only the uploading individual's data is processed; no family member data is collected or inferred |
| No familial matching | The service does not perform relative matching, ancestry tracing, or cross-user genetic comparison |
7.7 R7 — Prompt Injection
| Measure | Implementation |
|---|---|
| Input sanitization | All free-text profile fields are stripped of special characters before inclusion in AI prompts |
| Field length limits | Free-text fields truncated to 1,000–2,000 characters |
| XML containment | Blood work report text is wrapped in containment tags with system instructions to treat contents as raw data |
7.8 R8 — Consent Quality
| Measure | Implementation |
|---|---|
| Separate consent | Health data processing consent is a standalone checkbox, distinct from Terms of Service and disclaimer |
| Linked policy | Consent checkbox links directly to the Data Use Policy |
| Consent audit trail | Immutable records: consent type, granted/withdrawn, timestamp, policy version, IP address |
| Withdrawal mechanism | Settings page toggle for immediate consent withdrawal; backend enforces at the endpoint level |
| Re-consent on changes | Privacy Policy commits to requesting renewed consent if processing changes materially affect genetic or health data |
7.9 R9 — Data Retention
| Measure | Implementation |
|---|---|
| Wearable data minimisation | Raw wearable files are deleted immediately after parsing. Only aggregated metrics are retained. |
| Account deletion cascade | Full cascading deletion of all user data across all tables and storage buckets |
| Anthropic auto-deletion | 7-day retention window with automatic deletion |
| Defined retention schedule | Retention periods documented in Privacy Policy Section 8 for all data categories |
7.10 R10 — Sub-Processor Scope Creep
| Measure | Implementation |
|---|---|
| DPA terms | Each processor is bound by a DPA specifying permitted processing purposes |
| No-training guarantee | Anthropic's commercial terms prohibit model training on API data |
| Periodic review | Processor terms and sub-processor lists to be reviewed annually |
| Transparency | Sub-processor chain is disclosed in Privacy Policy Section 6 |
8. Residual Risks
After implementing the mitigation measures above, the following residual risks remain:
| Risk | Residual Level | Justification |
|---|---|---|
| R1 — Application breach | Low | Standard security controls in place; no system is immune to zero-day vulnerabilities |
| R2 — Processor breach | Low | Mitigated by DPAs, encryption, and retention limits; residual risk inherent in any cloud processing |
| R3 — Inaccurate AI output | Medium | AI model limitations are inherent. Mitigated by disclaimers, validation, and user responsibility disclosure. Cannot be fully eliminated. |
| R4 — Cross-border transfer | Low | SCCs and encryption provide adequate safeguards under current CJEU jurisprudence (Schrems II) |
| R6 — Family implications | Medium | Inherent to genetic data. Adequately disclosed to users. Cannot be technically eliminated. |
No residual risk is assessed as high after mitigation. Processing may proceed.
9. Consultation
9.1 Supervisory Authority
GDPR Article 36 requires prior consultation with the supervisory authority if the DPIA indicates that processing would result in a high risk that the controller cannot mitigate. Based on this assessment, residual risks have been mitigated to acceptable levels and prior consultation is not required.
If the risk profile changes materially (e.g., new processing activities, changes to Anthropic's retention terms, or expansion of data categories), this assessment will be re-evaluated and prior consultation will be sought if necessary.
9.2 Data Subjects
Users are informed of the processing through:
- Privacy Policy — comprehensive data processing disclosure
- Data Use Policy — specific AI processing disclosure
- Consumer Health Data Privacy Policy — Washington MHMDA compliance
- Explicit consent mechanism at signup and in account settings
- This DPIA is available upon request at privacy@sodnascan.com
10. Review Schedule
This DPIA will be reviewed and updated:
- Annually as a standing compliance activity
- When processing activities change (e.g., new data categories, new AI models, new processors)
- When Anthropic's terms change (e.g., retention period modifications, sub-processor additions)
- When relevant regulations change (e.g., EU AI Act enforcement milestones, new EDPB guidance on AI and genetic data)
- After any personal data breach involving genetic or health data
11. Conclusion
This DPIA confirms that SoDNAscan's processing of genetic and health data using AI:
- Has a clearly defined and legitimate purpose — delivering personalized wellness insights based on users' own uploaded data
- Is based on explicit consent (GDPR Art. 9(2)(a)) obtained through a separate, informed, and recorded consent mechanism
- Is necessary and proportionate — the processing cannot be achieved by less intrusive means while delivering the same service, and data minimisation measures are implemented throughout the pipeline
- Has identified and mitigated risks to an acceptable level through technical and organizational measures
- Involves documented processor relationships with DPAs and SCCs in place for all cross-border transfers
- Will be reviewed regularly and updated when processing activities, processor terms, or regulatory requirements change
Processing may proceed subject to ongoing compliance monitoring and the review schedule above.
12. Contact
For questions about this DPIA or to request the full document:
- Email: info@sodnascan.com
- Data Protection Contact: privacy@sodnascan.com