Main Responsibilities:
- Manage and optimize databases to ensure data integrity, security, and accessibility.
- Design and optimize scalable ETL pipelines.
- Develop and integrate AI solutions using frameworks such as LangChain.
- Ensure data quality in coordination with Data Scientists.
Requirements:
- Experience in managing and optimizing relational databases.
- Strong proficiency in Python and deep knowledge of SQL; familiarity with R is a plus.
- Knowledge of API frameworks.
- Experience with version-control tools such as Git.
- Competence with cloud technologies, particularly Google Cloud Platform (GCP).
- Familiarity with AI frameworks such as LangChain and the surrounding tooling.
- Interest in the open-source world and familiarity with its technologies and tools.
Nice to Have:
- Experience building ML/AI models, particularly large language models (LLMs).
- Knowledge of the principles of managing and analyzing spatial (geolocated) data.
- Experience with other cloud platforms such as AWS and Azure.
- Knowledge of Kubernetes and containerization with Docker.
- Experience or interest in working with vector data for AI projects.
- Experience with Apache Superset or other open-source or commercial Business Intelligence (BI) tools.
- Knowledge of Apache Kafka, Apache NiFi, and Apache Spark for Big Data applications.