Problem Formulation (Business problem to Data Science Problem), OKR Validation against statistical measures, Data Wrangling, Data Storytelling & Insight Generation, Problem Solving, Excel VBA, Data Curiosity, Technical Decision Making (How many iterations to go for vs when to stop iterating), Communication & Articulation: Vocal & Written, Business Acumen (Consume new domains quickly to learn through data), Design Thinking, Data Literacy
Job requirements
Job Title: Document AI Engineer
---
Key Responsibilities
· Design, deploy, and maintain automated invoice extraction pipelines using GCP Document AI.
· Develop custom model training workflows for documents with non-standard formats.
· Preprocess and upload document datasets to Cloud Storage.
· Label documents using DocAI Workbench or JSONL for training.
· Train and evaluate custom processors using AutoML or custom schema definitions.
· Integrate extracted data into downstream tools (e.g., BigQuery, ERP systems).
· Write robust, production-grade Python code for end-to-end orchestration.
· Maintain CI/CD deployment pipelines
· Ensure secure document handling and compliance with data policies.
---
Required Skills & Experience
· Strong hands-on experience with Google Cloud Platform, especially:
  o Document AI
  o Cloud Storage
  o IAM
  o Vertex AI (preferred)
  o Cloud Functions / Cloud Run
· Proficient in Python, including Google Cloud SDK libraries
· Familiarity with OCR and schema-based information extraction
· Understanding of security best practices for handling financial documents
---
Preferred Qualifications
· Previous projects involving invoice or document parsing
· Familiarity with BigQuery for analytics/reporting