
Scalable Cloud Platform for Medical Diagnostics

Bash, Terraform, Java, Jenkins

Author

Samuel Hernandez

Architecture diagram for scalable cloud platform

Overview

Developed a production-grade cloud platform that enabled a medical diagnostics company to deliver an ML-based cervical cancer detection service to hospitals and laboratories. The client had a validated research model but lacked the infrastructure required to securely serve predictions, manage medical data, and onboard customers at scale.

The solution transformed the model into a reliable, scalable service by establishing a secure cloud foundation, standardized environments, and reusable deployment blueprints. This allowed healthcare providers to submit DICOM files, receive predictions safely, and continuously expand data labeling workflows to improve model accuracy over time.

As a result, the platform unblocked the client’s ability to onboard hospitals, operate in regulated environments, and begin generating revenue, while supporting earlier detection of cervical cancer and, ultimately, better patient outcomes.



Architecture diagram for medical diagnosis



My Role

I led the design and implementation of the cloud foundations end to end, working closely with the client to turn a research prototype into a production-ready platform. In addition to building the infrastructure, I ran hands-on workshops to explain each layer of the cloud setup, guide architectural decisions, and help the team make informed trade-offs based on regulatory and operational needs. The tech stack included Terraform, Google Cloud Platform, Bash, Java, and Jenkins.


Tech Stack & Architecture Summary

  • Cloud & Platform: Google Cloud Platform (GCP)
  • Infrastructure as Code: Terraform with remote state stored in GCS
  • Compute & Environments: Isolated Windows Server VMs, one per hospital or laboratory
  • Data & ML Operations: GCP DICOM Store, BigQuery for labeling metadata
  • Automation & Provisioning: Terraform modules, startup scripts, Jenkins for in-VM automation
  • Security & Governance: IAM-based isolation, per-tenant service accounts, centralized logging and monitoring
  • Operational Model: Secure-by-default central platform with isolated, repeatable environments for clinical data processing
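
The remote-state setup listed above can be sketched as a minimal Terraform backend block; the bucket and prefix names here are illustrative, not the actual values:

```hcl
terraform {
  # State lives in a GCS bucket so the whole team shares a single
  # source of truth, with state locking handled by the backend.
  backend "gcs" {
    bucket = "example-tf-state" # hypothetical bucket name
    prefix = "platform/prod"    # one prefix per environment
  }
}
```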


Situation & Challenge

When I joined the project, the company had already developed a promising machine learning model for cervical cancer detection and was in active conversations with hospitals and laboratories. However, despite strong interest, the solution was effectively blocked from real-world adoption. The model lived in a research environment, running on a single on-premises server with limited resources, and was unable to support even a small number of external users reliably.

The lack of cloud infrastructure and operational foundations made the system fragile and difficult to scale. Onboarding a new laboratory required manual steps, ad-hoc configuration, and close intervention from the team, only to result in poor performance and an unreliable experience. Early users struggled to consume the service, and some stopped using it altogether due to frequent freezes and limited capacity.

At the same time, the core team consisted primarily of data scientists with deep domain expertise but little experience in cloud platforms, infrastructure automation, or operational best practices. This made it difficult to design a production-ready solution without external platform expertise and guidance. Introducing cloud infrastructure also came with additional responsibility around secure data handling, environment isolation, auditability, and minimizing operational overhead for a small team.

To move the product forward, the company needed more than just infrastructure. It required a scalable, secure-by-default platform that could be understood and operated by non-infrastructure specialists, support isolated environments for hospitals, and provide a clear path from research to production without increasing long-term complexity.


Solution

I designed a cloud-native platform that allowed the company to safely move from a single on-premises server to a scalable, production-ready architecture tailored for hospitals and medical labs. The core idea was a central platform for the client, with fully isolated environments for each hospital or lab, ensuring security, performance, and operational simplicity from day one.

I started by laying down strong cloud foundations in GCP. This included defining IAM users, groups, and service accounts with least-privilege access, setting organization policies, and structuring projects to clearly separate internal development from production. Logging and monitoring were centralized in dedicated projects using log sinks, providing full auditability and visibility across the platform. Networking followed a hub-and-spoke model, isolating internal environments from production while tightly controlling traffic through firewall rules, network tags, and Cloud NAT with IAM-authenticated tunnels.
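
As an illustration of the centralized-logging piece, a project-level log sink routing audit logs to a dedicated logging project might look like the following sketch (project IDs and bucket names are made up):

```hcl
# Route a production project's audit logs to a central logging project.
resource "google_logging_project_sink" "audit_to_central" {
  project     = "prod-project-id" # hypothetical production project
  name        = "audit-to-central"
  destination = "logging.googleapis.com/projects/central-logging/locations/global/buckets/audit"
  filter      = "logName:\"cloudaudit.googleapis.com\""

  # The sink gets its own writer identity, which must then be granted
  # write access on the destination bucket in the logging project.
  unique_writer_identity = true
}
```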

Each hospital or lab was provisioned with its own dedicated Windows Server VM in the production project. These VMs were fully isolated from one another at both the network and identity level, preventing any lateral access between labs and ensuring that no customer could access training datasets or data from other organizations. Labs were only able to upload and label data, while hospitals could submit studies and receive predictions, with all operations logged by default.
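
A per-tenant environment along these lines can be sketched in Terraform as a dedicated service account plus an isolated Windows VM; the tenant ID, zone, machine type, and subnet names are illustrative:

```hcl
# One service account per tenant: identity-level isolation.
resource "google_service_account" "tenant" {
  account_id   = "lab-example" # hypothetical tenant id
  display_name = "Service account for lab-example"
}

resource "google_compute_instance" "tenant_vm" {
  name         = "lab-example-vm"
  machine_type = "n2-standard-4"
  zone         = "europe-west1-b"

  boot_disk {
    initialize_params {
      image = "windows-cloud/windows-2022" # Windows Server image family
    }
  }

  network_interface {
    subnetwork = "prod-tenant-subnet" # hypothetical spoke subnet
    # No access_config block: the VM has no external IP and reaches
    # the internet only through Cloud NAT.
  }

  # Network tags drive the firewall rules that fence tenants
  # off from one another.
  tags = ["tenant", "lab-example"]

  service_account {
    email  = google_service_account.tenant.email
    scopes = ["cloud-platform"]
  }
}
```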

To enable medical imaging workflows, I migrated approximately 40 TB of anonymized DICOM data into Google Cloud Healthcare DICOM stores. Anonymization was enforced by default, and new datasets followed the same guarantees. Labs interacted with the system through a DICOM viewer installed on their dedicated VM, allowing them to upload, visualize, and label images. Labeling metadata was stored in BigQuery, enabling downstream use by the data science team without exposing raw datasets.
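
The imaging side maps naturally onto the Cloud Healthcare API. A sketch of the DICOM store and the BigQuery dataset for labeling metadata (resource IDs and location are placeholders):

```hcl
resource "google_healthcare_dataset" "imaging" {
  name     = "imaging" # hypothetical dataset name
  location = "europe-west1"
}

# DICOM store holding the anonymized studies uploaded by the labs.
resource "google_healthcare_dicom_store" "studies" {
  name    = "anonymized-studies"
  dataset = google_healthcare_dataset.imaging.id
}

# Labeling metadata lands here for the data science team,
# keeping raw pixel data out of reach.
resource "google_bigquery_dataset" "labels" {
  dataset_id = "labeling_metadata"
  location   = "europe-west1"
}
```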

Scalability was achieved through infrastructure automation. I built reusable Terraform modules that acted as blueprints for onboarding new hospitals or labs. Provisioning a new environment required only adding a small configuration block for a new module instance and running a Terraform plan and apply, with state safely stored in GCS. VM provisioning, security configuration, and software installation were fully automated via startup scripts, including the installation of Java, Jenkins, and the DICOM viewer. Once provisioning completed, no further manual setup was required.
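
Onboarding a new tenant then reduces to one module block plus a plan and apply; the module path and input names below illustrate the pattern, not the real interface:

```hcl
# Adding a hospital or lab = one new module instance.
module "hospital_example" {
  source = "./modules/tenant-environment" # hypothetical module path

  tenant_id   = "hospital-example"
  tenant_type = "hospital" # hospitals submit studies; labs label data
  zone        = "europe-west1-b"

  # The module's startup script installs Java, Jenkins, and the
  # DICOM viewer, so no manual setup remains after `terraform apply`.
}
```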

Operational complexity was kept intentionally low to accommodate a team of data scientists with no DevOps background. Jenkins handled in-VM automation tasks, while logging and monitoring were enabled by default with preconfigured dashboards. The result was a repeatable, secure, and low-maintenance platform that the data science team could operate independently, while labs and hospitals interacted only with their isolated environments.

The solution was validated through iterative demos and hands-on use by the client. Once deployed, labs were finally able to use the platform reliably, performance issues disappeared, and onboarding new hospitals became a predictable and fast process, turning a stalled research prototype into a system that could be used in real clinical workflows.


Impact & Results

  • Unblocked the transition from research to production, enabling hospitals and labs to reliably use the ML model in real clinical workflows.
  • Enabled early detection of cervical cancer by making the model accessible to medical institutions, directly supporting better and earlier treatment decisions.
  • Eliminated performance bottlenecks by replacing a single on-premises server with isolated, cloud-based environments per hospital and lab.
  • Reduced onboarding of new hospitals and labs to a repeatable, low-risk process using reusable Terraform blueprints.
  • Allowed new clients to be onboarded with minimal effort, requiring only the provisioning of a new VM rather than manual setup.
  • Established a secure-by-default platform with isolated access, centralized logging, and full auditability for sensitive medical workflows.
  • Minimized operational overhead, allowing data scientists with no prior cloud experience to operate and extend the platform independently.
  • Enabled the client to scale adoption and begin generating revenue from a previously stalled research prototype.