We are looking for an experienced backend engineer to own, maintain and extend our media monitoring & social analytics service. This service ingests data from many relevant news outlets and social media sources, processes the content with natural language processing (NLP) models, and stores the enriched data for news feeds and deeper analysis.
The media analytics service is built in Python using spaCy on top of PostgreSQL & RabbitMQ. It is deployed as microservices on a Kubernetes cluster, so the ingest, digest and API services can be scaled independently and fail independently.
Our media data is both real-time and historical and spans a wide range of media sources (e.g., news articles, Tweets, Telegram messages), often covering the entire history of a source (going back almost a decade). This presents interesting challenges in ingestion, processing, storage & querying, as well as valuable opportunities to extract good signals from historical and current media.
The relevant NLP/ML architecture is already built out, tested, documented and working. You don't need prior ML experience, but you should be willing to learn the relevant tools and libraries to keep the models accurate and to accommodate new data & new features.
What you'll do
- Own and manage the architecture, implementation and deployment of the media analytics services, as well as their use within Messari and by outside consumers (via the web app or API).
- Maintain and extend the existing ingestion pipeline for web scraping & social media connections, adding new ingest sources as needed (forums, websites, outlets, etc.).
- Maintain and improve the existing NLP models and digest pipeline, and evaluate and improve their performance on existing and new data.
- Extend the media API to add and update interfaces, exposing existing capabilities to the rest of our platform and external API users.
- Iterate on the data architecture to add substantial new features or improve query performance.
- If you're up for it, design and build new NLP models for better performance.
Who you are
- 2+ years of experience in production backend development using modern technologies (e.g., SQL, Git) and languages (e.g., Python, Java, Golang).
- Ready to own and drive a project end-to-end: gathering input from internal and external stakeholders, then defining, refining, planning, executing and maintaining a vision.
- Experienced with or interested in machine learning & data science, particularly in building data pipelines, improving language models and building internal data tools.
- Willing, capable and happy to dig deep into the unglamorous but essential details of operating a complex service (e.g., tuning SQL query performance, stabilizing data pipelines).
Projects you could work on
- Media ingestion: Extend our media ingestion pipelines to cover any crypto news/signal source you care about.
- Media analytics/digest: Take over and improve our text processing and machine learning infrastructure to get the most out of difficult input text for the media service.
- Internal media tools: Build and integrate online and historic media analytics tools for our internal research team (which may eventually make their way into external products).
What's it like to work with our engineering team?
- A welcoming and open environment with people who love to collaborate on ideas and tackle complex problems
- Work with a small team of engineers with large impact across product, research, and business development
- Participate in forums like Family Meals and Messari Lab to share ideas or technologies you’ve been exploring and tinkering with