Distributed Processing
We have delivered data projects for clients across multiple continents.
PySpark
Context:
A client needed to process large volumes of CRM contact data for duplicate account detection and database cleansing, as a feature of their SaaS product.
Challenge:
The original Python code ran on a single instance, which was far too slow to meet the project’s time requirements.
Solution:
We refactored the processing jobs to PySpark, leveraging the power of distributed processing. The deployment was carried out on Google Dataproc, orchestrated with Apache Airflow.
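For illustration, here is a minimal sketch of the kind of PySpark deduplication logic such a refactor involves; the column names, duplicate key, and storage paths are placeholders, not the client's actual schema.

```python
# A minimal sketch of deduplicating CRM contacts with PySpark.
# Column names (email, updated_at) and paths are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("crm-dedup").getOrCreate()

# Read the raw CRM export (placeholder path).
contacts = spark.read.parquet("gs://example-bucket/crm/contacts/")

# Normalize the field used as the duplicate key so that case and
# whitespace differences do not hide matches.
normalized = contacts.withColumn("email_norm", F.lower(F.trim(F.col("email"))))

# Within each group sharing a normalized email, keep the most recently
# updated record and flag the rest as duplicates.
w = Window.partitionBy("email_norm").orderBy(F.col("updated_at").desc())
deduped = (
    normalized
    .withColumn("rank", F.row_number().over(w))
    .withColumn("is_duplicate", F.col("rank") > 1)
)

deduped.write.mode("overwrite").parquet("gs://example-bucket/crm/contacts_deduped/")
```

Because the normalization and window ranking are expressed as Spark transformations, the same logic scales horizontally across a cluster instead of being bound to a single machine.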
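The orchestration side can be sketched with a simple Airflow DAG that submits the PySpark job to Dataproc; the project, cluster, and bucket names below are hypothetical.

```python
# Hypothetical Airflow DAG submitting the dedup job to Dataproc.
# Project, cluster, region, and bucket names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocSubmitJobOperator,
)

PYSPARK_JOB = {
    "reference": {"project_id": "example-project"},
    "placement": {"cluster_name": "crm-dedup-cluster"},
    "pyspark_job": {"main_python_file_uri": "gs://example-bucket/jobs/dedup.py"},
}

with DAG(
    dag_id="crm_dedup",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    DataprocSubmitJobOperator(
        task_id="run_dedup",
        job=PYSPARK_JOB,
        region="us-central1",
        project_id="example-project",
    )
```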
Results:
Processing became 3x faster than the legacy code.
Conclusion:
The client now has a robust and scalable system, capable of growing alongside their expanding user base.
Our Expertise
We work with clients across various industries, including startups in FinTech and LogTech, as well as organizations in both the public and private sectors. Our focus is on long-term projects and providing continuous specialized support.