
Google Maps Street View Extraction
Built ML pipelines for extracting street names, numbers, and traffic signs using detection, OCR, and semantic segmentation at global scale.
I am an applied research scientist with 20+ years of experience in computer vision, deep learning, and ML systems. I've worked across academia and industry at Google, Momenta, ENSCO, and UMD, delivering impactful solutions from conception to deployment.
I'm Xavier Gibert-Serra, Ph.D., a consultant specializing in machine learning and computer vision. Previously, I was a Staff R&D Engineer at Momenta Europe working on autonomous driving perception, and a Machine Learning Software Engineer at Google Maps and Google X Robotics. Earlier, at ENSCO I led vision R&D for railway inspection systems. I earned my Ph.D. in Electrical & Computer Engineering at the University of Maryland, advised by Rama Chellappa.
May 2023 – Jul 2025
Developed, trained, and deployed perception module updates for EU and US autonomous driving customers. Focused on 3D object detection, multi-sensor fusion, tracking, prediction, and data mining.
Sep 2015 – Apr 2023
Google Maps: Designed large-scale vision pipelines for extracting structured information from Street View using detection, segmentation, OCR, and bundle adjustment.
X Robotics: Developed real-time pose estimation and tracking algorithms for robotics applications using geometric techniques.
Sep 2011 – Sep 2015
Managed a federally funded project for railway defect detection. Built GPU-accelerated anomaly detection algorithms, distributed processing pipelines, and multi-modal medical image registration.
Apr 2004 – Apr 2013
ENSCO Rail: Led the Image Processing Group. Developed real-time algorithms for optical rail profile analysis and crack detection. Managed R&D and productization of the RailScan family of systems.
Team ENSCO – DARPA Grand Challenge: Built an obstacle detector using stereo cameras, enabling a robotic vehicle to autonomously drive 91 miles in desert terrain and finish sixth.
Sep 2001 – Dec 2003
Developed frameworks for feature extraction from multimedia streams, OCR evaluation, and classification using multiple modalities.
Deployed real-time system for crack detection on moving trains using line-scan cameras, integrated into production inspection fleets.
Developed 3D object detection, fusion, and prediction modules for L2+/L3 autonomous driving stacks in Europe and the US.
Designed stereo-based obstacle detector enabling Team ENSCO's vehicle to autonomously travel 91 miles and finish sixth overall.
Designed system for automatic inspection of railway components using deep learning, with crack detection, tie grading, and detection of missing or broken rail anchors.
Designed a multimodality cardiac display and analysis tool for the University of Maryland School of Medicine.
Interested in collaborating? Email me or connect on LinkedIn.