Fill your skill gaps in AI and Data Science

Tag: python

Maximizing Efficiency and Cost Control in PySpark

Post author By Jared Rand
Post date June 4, 2023
No Comments on Maximizing Efficiency and Cost Control in PySpark

Photo by Lukas on Pexels.com

PySpark, a Python library integrated with Apache Spark, has revolutionized big data analytics with its speed, scalability, and efficiency. It offers a wide array of data transformation, analysis, and machine learning capabilities, making it a go-to tool for handling large datasets and real-time data streaming. However, as a data scientist, it is crucial to balance […]

Tags big data, data science, parallel, parallel processing, PySpark, python, spark

DAGs Data Engineering

How to Improve a DAG

This post is part of a collaboration between Alisa Aylward of Alisa in Techland and Jared Rand of Skillenai. View Jared’s post on Alisa in Techland here. Discover the worst data pipeline ever and how to improve it’s DAG. Learn to remove cycles and handle dependencies efficiently. What is a DAG? DAG stands for directed […]

Tags airflow, apache airflow, cycles, DAG, data engineering, data pipeline, directed acyclic graph, graphs, pipeline, python

Coding Data Science Python

Singleton Fails with Multiprocessing in Python

Post author By Jared Rand
Post date December 5, 2020
No Comments on Singleton Fails with Multiprocessing in Python

Photo by Andreas Wagner on Unsplash

A singleton is a class designed to only permit a single instance. They have a bad reputation, but do have (limited) valid uses. Singletons present lots of headaches, and may throw errors when used with multiprocessing in Python. This article will explain why, and what you can do to work around it. A Singleton in […]