

Mastering Spark on K8s 🔥 and Why I Dumped 💔 Kubeflow Spark Operator (Formerly Google's Spark Operator)!

Heyoooo Spark ⚡ developers! Several months ago, my product manager asked me one question: "Is it possible to run Spark applications without K8s 🐳 cluster-level access?" At the time, I only knew the Kubeflow 🔧 Spark Operator well and was using it to deploy all my Spark applications. For those who know it, the Kubeflow Spark Operator requires K8s cluster-level access because it installs CRDs and a ClusterRole. So I told him "no" for these reasons, and on his side, he did his best to convince the prospect with that constraint in mind. At the enterprise level, companies usually have a multi-tenant K8s cluster segregated by company/department, project, and environment (dev, uat, pre-prod, or prod) using Namespaces. This way, they make the most of the computing resources they allocate. Plus, if a project does not meet expectations or the contract ends, hop hop, kubectl delete namespace <compordept>-<project>-<env>, and it's as if the project had never existed. I am now writing to tell my product manager, "Yes, it's possible to run Spark applications without K8s cluster-level access"! Here is how! 🚀
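To make the constraint concrete, here is a minimal sketch of what a namespace-scoped Spark application can look like; the API server URL, namespace, ServiceAccount, and image below are placeholder assumptions for illustration, not values from the article.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: point the Spark driver at a single Kubernetes namespace,
// authenticating with a namespace-scoped ServiceAccount rather than any
// cluster-level resources. All concrete values are assumptions.
val spark = SparkSession.builder()
  .appName("namespace-scoped-spark")
  .master("k8s://https://my-cluster-api:6443")                                // assumed API server URL
  .config("spark.kubernetes.namespace", "acme-sales-dev")                     // assumed <compordept>-<project>-<env> namespace
  .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark") // assumed namespace-scoped ServiceAccount
  .config("spark.kubernetes.container.image", "apache/spark:3.5.1")           // assumed container image
  .getOrCreate()
```

The same settings can just as well be passed to spark-submit with --conf; the point is simply that everything Spark needs here lives inside one Namespace.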

Introducing TARP Stack ⛺ – Tapir, React and PostgreSQL

I landed my first job as a Data Engineer using Scala. It's been over 3 years now, approaching 4. The more experience you gain, the more you want to spread your wings 🪽 and tackle projects even bigger and more complex than data pipelines, like developing full-stack web data applications. But I really do not want to spread myself too thin across all the programming languages, libraries, and frameworks out there 😣. These are just tools. What matters is how efficiently you can use them for the product or feature you envision 🦄🌈. Sooo! For me, it's currently the TARP tech stack!
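To give a flavour of the "T" in TARP, here is a minimal Tapir endpoint sketch; the route and parameter names are invented for illustration and are not taken from the article.

```scala
import sttp.tapir._

// Minimal sketch: describe GET /hello?name=... as a Tapir endpoint.
// The endpoint is just a value; it can later be interpreted as a server
// route or as OpenAPI documentation for the React frontend to consume.
val hello: PublicEndpoint[String, Unit, String, Any] =
  endpoint.get
    .in("hello")
    .in(query[String]("name"))
    .out(stringBody)
```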