Data Engineer - ETL/Data Pipeline

Mountain View, California


Lark is the world's largest A.I. healthcare provider, serving more than a million patients who suffer from, or are at risk of, chronic disease with A.I. Nurses. We’re on a mission to improve people’s health and happiness through our digital health coach. We are the only A.I. nurse ever to become fully medically reimbursed as a 100% replacement for a live nurse, because we achieved health outcomes equivalent to those of live healthcare professionals, which allows for infinitely scalable healthcare. Since launch, Lark has continued to receive awards and accolades for both our product and our leadership, including:

  • Apple's Top 10 Apps in the World
  • Business Insider's list of the most innovative companies in the world, alongside Uber and Snapchat
  • A CEO ranked #1 on Inc.'s Top 10 Women in Tech to Watch
  • CDC recognition of our Diabetes Prevention Program



What You'll Do:

Duties & Responsibilities

  • Join a small, agile team that is early in its data-engineering journey, with a tremendous opportunity to make a big impact.
  • Design and build efficient, reliable, and consistent data infrastructure to meet rapidly growing data needs.
  • Design data pipelines and data integrations to collect, clean, and store large datasets (streaming and batch).
  • Maintain the privacy of our users and partners by helping to ensure best practices in security and data handling continue to be used as we grow.
  • Help establish and maintain a high level of operational excellence in data engineering.
  • Collaborate with teams across the company to help develop data products that drive company success.
  • Evaluate, integrate, and build tools to help accelerate Data Engineering, Data Science, Business Intelligence, Reporting, and Analytics as our needs evolve.
  • Drive data literacy across business functions.


What You’ll Need:

Knowledge & Skills

  • A love of data and an appreciation of the make-or-break effect it has on startups.
  • A love of helping people use data effectively.
  • Knowledge of different database technologies, their tradeoffs, and how to make the best use of each.
  • The desire to learn and mentor in a collaborative team environment.
  • Humility with an intrinsic positive drive
  • Passion for developing a world-class engineering culture
  • Appreciation, respect, and enthusiasm for diversity, inclusion, and alternative perspectives
  • A constant desire to create an environment of psychological safety, a willingness to ask questions that might seem “simple” and to break a problem down into its smallest parts, and a desire to help those around you learn and grow
  • Excellent communication skills (spoken and written!) across a range of audiences
  • Ability to thrive in an environment that promotes and enables collaboration

Credentials & Experience:

  • 3+ years of experience in data engineering or equivalent knowledge and ability.
  • 5+ years of experience in software engineering overall, or equivalent knowledge and ability.
  • BS or MS in Computer Science, Mathematics, Computer Engineering or related field, or equivalent experience, knowledge, and ability.
  • Experience designing and maintaining at least one type of database (object, columnar, in-memory, relational)
  • Experience with relational, object, tabular, key-value, triple-store, tuple-store, and related database types.
  • Experience in data warehouse modernization, building data-marts, star/snowflake schema designs, infrastructure components, ETL/ELT pipelines, and BI/reporting/analytic tools
  • Experience building production-grade data backup/restore strategies and disaster recovery solutions
  • Extensive hands-on experience with batch and stream data processing (e.g., DMS, Flink, Spark, Kinesis, Kafka)
  • Advanced SQL skills and strong proficiency in at least two of the following programming languages: Python, Scala, or Java.
  • Familiarity with pandas, SciPy, scikit-learn, seaborn, SparkML
  • Demonstrated expertise and fluency in object-oriented and/or functional programming, including a solid grasp of common design patterns and idioms
  • Demonstrated proficiency and expertise with data structures, algorithms, distributed computing, storage systems, and assorted consistency models.
  • Experience bringing machine learning to production at scale
  • Excellent cross-functional collaboration and communication skills

Bonus points for familiarity with the following key technologies:

  • Spark (or Storm/Flink/MapReduce/Impala/Hive)
  • SparkML (or scikit-learn/TensorFlow/etc.)
  • GraphX (or TitanDB/Neo4j/range++/graph engine/OrientDB/etc.)
  • Delta Lake
  • Airflow (or Luigi/Oozie/Azkaban/Pinball/Chronos/etc.)
  • AWS (including EMR, DMS, Athena, RDS, Aurora, Lambda, Redshift, etc.)
  • Snowflake
  • Hadoop ecosystem (MapReduce/YARN/HDFS/Pig/Hive/etc.)
  • At least one of Periscope (Sisense)/Tableau/Domo/Looker/Superset/etc.


Our team works with cutting-edge tools and technology related to Artificial Intelligence and Machine Learning. We use NLP to process millions of meals, and accelerometer data from users' phones to compute activity and sleep amounts. Our chat A.I. is the most sophisticated digital health engagement tool in the world. Join us and make it even better!

Lark is an Equal Opportunity Employer. Lark does not discriminate on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status, or any other basis covered by applicable law. All employment is decided on the basis of qualifications, merit, and business need.
