Annarabic


Software Engineering
Fall 2024


I worked as a software engineer intern at Annarabic, a Moroccan AI startup developing speech-recognition systems for spoken Arabic dialects and low-resource African languages. Annarabic trains its models from scratch using native-speaker data, addressing critical gaps left by mainstream ASR systems trained primarily on Modern Standard Arabic.

Software Engineering & Data Infrastructure

My core responsibility was building scalable infrastructure to support low-resource NLP model training, with a focus on Swahili and dialectal Arabic.

  • I designed and optimized a Python web scraper using Selenium and MongoDB to collect and structure large-scale YouTube metadata and transcripts. Compared with the initial version, I reduced processing latency by ~4.5× and scaled the system to handle 10,000+ hours of video data.
  • I also conducted bottleneck analysis across the scraping, parsing, and storage layers and scoped targeted optimizations to reduce system overhead and improve reliability.
  • To guide future work, I wrote a product specification document outlining architectural improvements, including the use of cloud services for distributed training and multiprocessing.
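The per-layer bottleneck analysis described above can be illustrated with a simple stage timer that accumulates wall-clock time for each layer and ranks them. This is a minimal sketch, not Annarabic's actual code: `StageTimer`, `run_pipeline`, and the `fake_*` functions are hypothetical, the sleeps stand in for Selenium page loads, parsing, and MongoDB writes, and a plain Python list replaces the database.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per pipeline stage (e.g. scrape, parse, store)."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        # Time the body of the `with` block and add it to this stage's total.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return (stage, total_seconds, share_of_total) rows, slowest stage first."""
        grand_total = sum(self.totals.values()) or 1.0
        rows = [(name, t, t / grand_total) for name, t in self.totals.items()]
        return sorted(rows, key=lambda r: r[1], reverse=True)


# Hypothetical stand-ins for the real scraping, parsing, and storage layers.
def fake_scrape(video_id):
    time.sleep(0.05)  # simulates slow browser/network work
    return f"<html>{video_id}</html>"


def fake_parse(html):
    time.sleep(0.005)  # simulates HTML/transcript parsing
    return {"id": html[6:-7], "length": len(html)}


def fake_store(record, db):
    time.sleep(0.001)  # simulates a database write
    db.append(record)


def run_pipeline(video_ids):
    timer = StageTimer()
    db = []  # in-memory list standing in for a MongoDB collection
    for vid in video_ids:
        with timer.stage("scrape"):
            html = fake_scrape(vid)
        with timer.stage("parse"):
            record = fake_parse(html)
        with timer.stage("store"):
            fake_store(record, db)
    return timer, db


if __name__ == "__main__":
    timer, db = run_pipeline([f"vid{i}" for i in range(10)])
    for name, total, share in timer.report():
        print(f"{name:>6}: {total:.3f}s ({share:.0%})")
```

In a report like this, the stage with the largest share is the first candidate for optimization; here the simulated scrape step dominates by construction.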

Throughout this process, I had to balance speed, cost, and quality within the constraints of a small startup, including the need to avoid paid APIs.

Research, Product, & Partnerships

In addition to engineering, I worked closely with Annarabic’s founders on product development and external partnerships.

  • Showcased Annarabic’s “Transcribing WhatsApp Arabic Voice Messages” system at the Columbia Data Science Institute Undergraduate Research Fair 2024, demonstrating real-world applications of speech recognition in disaster response.
  • Supported partnership discussions with Columbia’s DESDR research group to expand Annarabic’s operations into three additional countries (pending USAID support).

Takeaways

This internship gave me hands-on experience in a very early-stage startup environment, from building cost-efficient data pipelines to translating technical capabilities into real-world impact. Working on mission-driven AI at a small company gave me ownership over both technical decisions and their downstream societal effects, particularly in advancing language equity and accessibility in underrepresented regions.


Thank you for checking out my page!

© Dylan Tran 2026.