Kubernetes pod readiness probe failed

I have a Kubernetes cluster running locally with minikube. I am trying to train a model with Kubeflow’s TFJob operator, which implements TensorFlow’s distributed training strategies. The YAML for the TFJob looks like this: apiVersion: kind: TFJob metadata: name: tf-job-cifar namespace: kubeflow spec: tfReplicaSpecs: Worker: replicas: 2 restartPolicy: Never template: metadata: annotations: […]

microk8s not running after installation

I want to install Kubeflow using microk8s on a Kubernetes cluster, but I ran into a problem with microk8s. I already installed microk8s using this link. When I checked the microk8s status, it reported that it was not running: microk8s is not running. Use microk8s inspect for a deeper inspection. When I try to inspect […]

Kubeflow: Can’t import from modules in docker image

I’ve set up a Kubeflow pipeline with an image and a function that imports a module from within the image: import kfp from kfp.components import func_to_container_op, InputPath, OutputPath def process_data(experiment_name): import os print("Listing dir: ", os.system("ls")) import driver # decorator with an arg process_data = func_to_container_op(process_data, base_image=image) @kfp.dsl.pipeline(name="", description="") def pipeline(experiment_name): process_data(experiment_name) if __name__ == […]

Use run parameter as arg. kubeflow

I am trying to use a Kubeflow run parameter as an argument for my pipeline step. However, every time I compile the YAML file, the parameter gets changed from an Integer to a LocalPath. @dsl.pipeline(name='First Pipeline', description='generates a random set of numbers then performs operations on them returning a json object') def first_pipeline(generate_n_arg: int = 10): […]

Does Kubernetes support setting up a cluster with on-premise GPU machines?

We’re buying GPU machines and plan to use them for running ML training. The planned system architecture consists of {producer, queue, N workers}: a producer (a kind of master) that enqueues training requests to the queue, and N workers that monitor the queue, pull any pending requests, and run the training. My questions […]
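
For illustration only, the producer, queue and N workers layout described in this question can be sketched in a single process with Python's standard library. This is a minimal sketch, not a cluster setup: `train`, `worker` and `run` are hypothetical names, threads stand in for the GPU worker machines, and an in-memory `queue.Queue` stands in for whatever message queue the real system would use.

```python
import queue
import threading


def train(request):
    # Hypothetical placeholder for a real training run on a GPU worker.
    return f"trained:{request}"


def worker(jobs, results):
    # Each worker blocks on the queue and pulls requests until it sees
    # the None sentinel, which tells it to shut down.
    while True:
        request = jobs.get()
        if request is None:
            break
        results.put(train(request))


def run(requests, n_workers=3):
    jobs = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    # The producer enqueues every training request ...
    for request in requests:
        jobs.put(request)
    # ... followed by one shutdown sentinel per worker.
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
    # Completion order depends on scheduling, so sort for a stable result.
    return sorted(results.get() for _ in range(len(requests)))


if __name__ == "__main__":
    print(run(["job-a", "job-b", "job-c"], n_workers=2))
```

In a real deployment each worker would typically be its own process or pod and the queue an external service; the sketch only shows the pull-based control flow.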

Getting inference from deployed model only working from within docker container

I have deployed a model in a pod/Docker container for detecting flowers using Kubeflow. From the host server, it is not possible to get inferences from the model: curl -v -H "Host:" http://localhost:8080/v1/models/test1:predict -d @./tf_flowers_input.json * About to connect() to localhost port 8080 (#0) * Trying ::1… * Connected to localhost (::1) port […]

Multi-User – Multi-Models deployment in Kubernetes?

I have multiple (n) deep learning models built for multiple (n) users. Each user has their own model. How do I handle this scenario in production, where requests come in from multiple users? I need to serve each user with their own model. There should not be any latency from loading the models. Should the models always […]

Kubeflow Kale specify container image in pipeline step

Kale allows the user to specify only the step and its dependencies from the UI. However, I would also like to specify the Docker image to use for the step. I can’t figure out how to specify a custom Docker image for a pipeline step from the Kale UI. Any suggestions on how to implement this? […]

Setting up container registry for kubeflow

I am using Ubuntu 20.04 while following a book. On page 17, it says the following (only relevant parts), which I don’t understand: You will want to store container images called a container registry. The container registry will be accessed by your Kubeflow cluster. I am going to use Docker Hub as the container registry. […]

Installing Kubernetes in Docker container

I want to try out Kubeflow and see if it fits my projects. I want to deploy it locally as a development server, but I have Windows on my computer and Kubeflow only works on Linux. I’m not allowed to dual boot this computer. I could […]

