Distributed Processing
We have delivered data projects for clients across multiple continents.
PySpark
Context:
A client needed to process large volumes of CRM contact data for duplicate account detection and database cleansing, as a feature of their SaaS product.
Challenge:
The original Python code ran on a single instance, which was far too slow to meet the project’s time requirements.
Solution:
We refactored the processing jobs to PySpark, leveraging the power of distributed processing. The deployment was carried out on Google Dataproc, orchestrated with Apache Airflow.
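For illustration, here is a minimal sketch of the kind of PySpark deduplication logic such a refactor involves; the column names, duplicate key, and storage paths are placeholders, not the client's actual schema.

```python
# A minimal sketch of deduplicating CRM contacts with PySpark.
# Column names (email, updated_at) and paths are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("crm-dedup").getOrCreate()

# Read the raw CRM export (placeholder path).
contacts = spark.read.parquet("gs://example-bucket/crm/contacts/")

# Normalize the field used as the duplicate key so that case and
# whitespace differences do not hide matches.
normalized = contacts.withColumn("email_norm", F.lower(F.trim(F.col("email"))))

# Within each group sharing a normalized email, keep the most recently
# updated record and flag the rest as duplicates.
w = Window.partitionBy("email_norm").orderBy(F.col("updated_at").desc())
deduped = (
    normalized
    .withColumn("rank", F.row_number().over(w))
    .withColumn("is_duplicate", F.col("rank") > 1)
)

deduped.write.mode("overwrite").parquet("gs://example-bucket/crm/contacts_deduped/")
```

Because the normalization and window ranking are expressed as Spark transformations, the same logic scales horizontally across a cluster instead of being bound to a single machine.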
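The orchestration side can be sketched with a simple Airflow DAG that submits the PySpark job to Dataproc; the project, cluster, and bucket names below are hypothetical.

```python
# Hypothetical Airflow DAG submitting the dedup job to Dataproc.
# Project, cluster, region, and bucket names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocSubmitJobOperator,
)

PYSPARK_JOB = {
    "reference": {"project_id": "example-project"},
    "placement": {"cluster_name": "crm-dedup-cluster"},
    "pyspark_job": {"main_python_file_uri": "gs://example-bucket/jobs/dedup.py"},
}

with DAG(
    dag_id="crm_dedup",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    DataprocSubmitJobOperator(
        task_id="run_dedup",
        job=PYSPARK_JOB,
        region="us-central1",
        project_id="example-project",
    )
```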
Results:
Processing became 3x faster than the legacy code.
Conclusion:
The client now has a robust and scalable system, capable of growing alongside their expanding user base.
Our Expertise
We work with clients across various industries, including startups in FinTech and LogTech, as well as organizations in both the public and private sectors. Our focus is on long-term projects and providing continuous specialized support.