In recent years, Myanmar has digitized at an incredible rate. A few years ago, a mobile phone was a luxury few could afford. Today, a large part of the population in Myanmar uses mobile phones on a daily basis. This rapid change has opened up countless opportunities for technological innovation.
At the same time, the country has seen an unprecedented uptake of Facebook, with many people considering the social network synonymous with the internet. This uptake has been accompanied by a spread of hate speech and misinformation on the platform.
To detect and effectively moderate instances of hate speech, social media companies rely, among other efforts, on Natural Language Processing (NLP) algorithms that bring suspicious content to the attention of human moderators.
While developing such algorithms is a difficult task in any language, tackling it in the Myanmar languages comes with its own unique challenges. Firstly, most NLP research is done in English, making it difficult to apply the state of the art to the local context. Secondly, the lack of structured language and hate speech datasets is an impediment to training such algorithms.
KoeKoeTech is working hard to address these issues and push the envelope in Myanmar NLP. We are growing our team for a project with the following three main objectives:
- Work closely with local and international NGOs, legal experts and technology leaders to develop best practices for hate speech labelling based on international law
- Build a web platform and related software tools to collect a dataset of hate speech suitable for NLP algorithm training
- Push the envelope of Natural Language Processing in the Myanmar languages
We are looking for an experienced data engineer to lead the development of the software tools that are crucial to the success of this project.
In addition to this project, which will be your main focus, you will work with our in-house data engineering team to implement our data pipeline for internal and external reporting.
Scope of Position
- Take the lead in the design and implementation of databases for storing large datasets
- Administer the databases to ensure data confidentiality, integrity and availability
- Tune performance of the databases to ensure data can be accessed efficiently
- Develop data scraping tools to collect data from various web sources
- Develop data collection tools and in-house libraries leveraging the available APIs of various social media platforms
- Ensure data confidentiality, integrity and availability of the collected data
- Manage and maintain the software libraries for data scraping and data collection
- Actively participate in the project and take ownership of your own area of work
Requirements
- BSc or MSc in Data Engineering, Software Engineering or similar area
- 3-4 years of experience working on data engineering projects
- Proven experience in designing databases and integrating them with related data pipeline tools, such as SQL and graph databases, data warehouses and ETL tools
- Proven experience in Python, including scraping libraries such as Beautiful Soup
- Experience with cloud platforms, preferably Microsoft Azure or Google Cloud Platform
- Ability and experience in designing and maintaining libraries using the above tools
- Proven track record of working on challenging projects and taking responsibility for your area of work
- Experience in machine learning is a plus
- Self-starter who pushes the project along, takes responsibility for his or her actions and is willing to go the extra mile