Spark in cluster and local mode

The first webinar, on the Parquet file format and the DuckDB analytics tool, took place on February 4 and attracted 305 participants.

For those who were unable to attend, we’ve attached the link to the replay and the supporting documents.

The next webinar will cover Spark in local and cluster mode and will take place on April 30 from 11:00 to 12:30.
We’ll send out the registration link in our next newsletter.

As an example of Spark in practice, CASD has set up a Spark/HDFS cluster as part of a MIDAS project for DARES. It distributes computations as close as possible to the data, which is spread over 15 servers that together provide:
– 150 vCPUs,
– 2.8 TB of RAM,
– 30 TB of raw disk space.
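
To give a rough idea of what "local" versus "cluster" means in practice, here is a minimal Python sketch of how a Spark session can be pointed at one or the other through its master URL. It is an illustration only, not CASD's actual configuration: the helper names `master_url` and `build_session`, the host `spark-master.example`, and the choice of Spark's standalone cluster manager (rather than YARN or another manager) are all assumptions. The second function requires PySpark to be installed.

```python
from typing import Optional


def master_url(mode: str,
               cores: Optional[int] = None,
               host: str = "spark-master.example",  # hypothetical host name
               port: int = 7077) -> str:
    """Return a Spark master URL for the requested mode."""
    if mode == "local":
        # local[N] runs Spark in a single process with N worker threads;
        # local[*] uses as many threads as the machine has logical cores.
        return f"local[{cores}]" if cores else "local[*]"
    if mode == "cluster":
        # URL form used by Spark's standalone cluster manager (assumed here).
        return f"spark://{host}:{port}"
    raise ValueError(f"unknown mode: {mode!r}")


def build_session(mode: str, app_name: str = "demo"):
    """Create a SparkSession for the given mode (requires PySpark)."""
    # Imported lazily so that master_url() can be used without PySpark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(master_url(mode))
        .appName(app_name)
        .getOrCreate()
    )
```

The same application code can then run on a laptop with `build_session("local")` and on a cluster with `build_session("cluster")`; only the master URL changes.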