Distributed Processing

We have implemented data projects for clients across multiple continents.

PySpark

Context:

A client needed to process large volumes of CRM contact data for duplicate account detection and database cleansing, as a feature of their SaaS product.

Challenge:

The original Python code ran on a single instance, which was far too slow to meet the project’s time requirements.

Solution:

We refactored the pipeline to PySpark, leveraging the power of distributed processing. The deployment was carried out on Google Cloud Dataproc, orchestrated with Apache Airflow.
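The core of duplicate detection is grouping contacts by a normalized "blocking key" so that near-identical records land in the same bucket. The sketch below shows that logic in plain Python; the field names (name, email) and normalization rules are illustrative assumptions, and in the distributed version the same grouping would be expressed as a PySpark groupBy over partitioned data.

```python
def normalize(contact):
    """Build a blocking key from a contact record.
    Fields and normalization rules here are assumptions for illustration."""
    email = contact.get("email", "").strip().lower()
    name = " ".join(contact.get("name", "").lower().split())
    return (email, name)

def find_duplicates(contacts):
    """Group contacts sharing a blocking key; groups of 2+ are candidate duplicates."""
    groups = {}
    for c in contacts:
        groups.setdefault(normalize(c), []).append(c)
    return [grp for grp in groups.values() if len(grp) > 1]

# Example: two records that differ only in casing/whitespace collapse into one group.
contacts = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "ada  lovelace", "email": " ADA@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]
duplicate_groups = find_duplicates(contacts)
```

Because each blocking key can be hashed to a partition, this grouping step parallelizes naturally, which is what made the PySpark refactor effective.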

Results:

Processing became 3x faster compared to the legacy code.

Conclusion:

The client now has a robust and scalable system, capable of growing alongside their expanding user base.


Our Expertise

We work with clients across various industries, including startups such as FinTechs and LogTechs, as well as organizations in both the public and private sectors. Our focus is on long-term projects and continuous specialized support.
