VPC Inference

In this section, we’ll upload your model to Poolside using the Splash CLI. This process involves listing existing models, storing your model in an S3 bucket, and then importing it into Poolside.

List Existing Models

First, let’s verify that no models are currently loaded by running the list command:

splash models list

Prepare Your Model

Before importing, ensure your model is stored in an S3 bucket. We recommend using a dedicated S3 bucket for storing models provided by Poolside.
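If the model files are on a local disk, one way to place them in the bucket is the AWS CLI's `aws s3 sync`. The sketch below is only an illustration: the helper name `checkpoint_uri`, the bucket `my-model-bucket`, the prefix `models/malibu`, and the local directory `./model` are all hypothetical. The helper just normalizes a bucket and prefix into the `s3://.../` form used later by `--checkpoint`.

```shell
# Hypothetical helper: builds an s3:// URI with exactly one trailing slash.
checkpoint_uri() {
    bucket="$1"
    prefix="$2"
    # Drop one leading and one trailing slash so the result is predictable.
    prefix="${prefix#/}"
    prefix="${prefix%/}"
    printf 's3://%s/%s/\n' "$bucket" "$prefix"
}

# Example usage (assumed names; requires the AWS CLI and valid credentials):
#   aws s3 sync ./model "$(checkpoint_uri my-model-bucket models/malibu)"
```

The same URI can then be passed unchanged to the `--checkpoint` option during import.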

Import the Models

Use the following command to import your chat model (malibu):

splash models create <MODEL_NAME> \
    --replicas <NUMBER_OF_REPLICAS> \
    --checkpoint s3://<YOUR_MODEL_BUCKET>/path/to/your/model/ \
    --capabilities <CAPABILITY_1>,<CAPABILITY_2> \
    --default \
    --public \
    --gpus 4

Remember to ensure that your EKS pod role has sufficient permissions to read from your target S3 buckets.

Replace the placeholders with your specific values:

<MODEL_NAME>: the unique name of your model (e.g., malibu_0703)
<NUMBER_OF_REPLICAS>: the number of replicas to create (e.g., 2)
<YOUR_MODEL_BUCKET>: the S3 bucket where your model is stored

The following command can be used to import your completion model (point):

splash models create <MODEL_NAME> \
    --replicas <NUMBER_OF_REPLICAS> \
    --checkpoint s3://<YOUR_MODEL_BUCKET>/path/to/your/model/ \
    --default \
    --public \
    --gpus 1

You can view the full list of model creation options by running:

splash models create --help

Authentication

You will be prompted to authenticate through your identity provider. Open the provided URL in a web browser and log in with valid credentials. Ensure you have the necessary permissions to access the S3 bucket and import models into Poolside. If you encounter any issues, check your AWS credentials and Poolside permissions.

Verification

After the import process is complete, run the list command again to confirm that your model has been successfully added:

splash models list
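When the import is scripted, the check above can be repeated until the model appears. The following is a minimal sketch, assuming that the output of `splash models list` contains the model name as plain text (the actual output format is not documented here) and that you are already authenticated. The function name `wait_for_model` and the `SPLASH` override variable are hypothetical.

```shell
# Polls `splash models list` until the given name appears in its output.
# Returns 0 when found, 1 if it never shows up within the allowed attempts.
# SPLASH may be set to another command for testing; it defaults to splash.
wait_for_model() {
    name="$1"
    attempts="${2:-30}"   # number of polls
    delay="${3:-10}"      # seconds to wait after each unsuccessful poll
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -F treats the name as a fixed string, not a regular expression.
        if "${SPLASH:-splash}" models list | grep -qF -- "$name"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example usage (assumed model name):
#   wait_for_model malibu_0703 && echo "model is listed"
```

A plain substring match is used because it makes no assumptions about columns or status fields; it only confirms that the name is listed, not that the model is ready to serve.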

Testing the API

After the model loads successfully, you can test the API to confirm that everything works. First, get the bearer token for your logged-in user: it is stored in the Splash config file and has a ps prefix. To open the config file, run:

splash config edit

Note down the bearer token and then use either cURL or an API testing tool like Postman to hit the core-api. The cURL command can be structured as follows:

curl --location 'https://<api.domain.com>/v0/prompt' \
     --header 'Content-Type: application/json' \
     --header 'Authorization: Bearer ps-rest_of_your_token' \
     --data '{
       "prompt": "briefly explain cURL",
       "intent": "chat",
       "context": {
         "elements": []
       }
     }'

Note: Replace ps-rest_of_your_token with your own token in the Authorization header. Detailed documentation on the Poolside API is available at http://<api.domain.com>/docs
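For repeated testing, the request above can be wrapped in a small shell function. This is a sketch under stated assumptions: the function names `is_ps_token` and `poolside_prompt` are hypothetical, the `ps-` prefix check is inferred from the example token, and the prompt text is inserted into the JSON body without escaping.

```shell
# Returns 0 if the token has the "ps-" prefix followed by at least one character.
is_ps_token() {
    case "$1" in
        ps-?*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Sends a chat prompt to /v0/prompt on the given API host.
# The prompt is NOT JSON-escaped: avoid double quotes and backslashes in it,
# or build the body with a JSON-aware tool instead.
poolside_prompt() {
    api_host="$1"
    token="$2"
    prompt="$3"
    if ! is_ps_token "$token"; then
        echo "token does not start with ps-" >&2
        return 2
    fi
    # --fail makes curl exit non-zero on HTTP errors (4xx/5xx).
    curl --fail --silent --show-error --location "https://${api_host}/v0/prompt" \
        --header 'Content-Type: application/json' \
        --header "Authorization: Bearer ${token}" \
        --data "{\"prompt\": \"${prompt}\", \"intent\": \"chat\", \"context\": {\"elements\": []}}"
}

# Example usage (placeholder host and token):
#   poolside_prompt api.domain.com ps-rest_of_your_token "briefly explain cURL"
```

Checking the prefix before sending catches the common mistake of pasting the wrong value from the config file, without making a network call.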

Troubleshooting

If you encounter errors with model creation, you can view the logs by first locating your model pod name:

kubectl get pods -n poolside-models

You can then print the logs for that pod, replacing <POD_NAME> with a name from the previous command's output:

kubectl logs -n poolside-models <POD_NAME>
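Model pods can produce long logs, so it may help to limit the output to the most recent lines with `kubectl logs --tail`. The wrapper below is a small sketch; the function name `model_logs`, the default of 200 lines, and the `KUBECTL` override variable are hypothetical, while the `poolside-models` namespace comes from the commands above.

```shell
# Prints the last N log lines (default 200) for a pod in poolside-models.
# KUBECTL may be set to another command for testing; it defaults to kubectl.
model_logs() {
    pod="$1"
    lines="${2:-200}"
    "${KUBECTL:-kubectl}" logs -n poolside-models "$pod" --tail="$lines"
}

# Example usage (assumed pod name):
#   model_logs my-model-pod 500
# If the container has restarted, kubectl's --previous flag shows the logs
# of the prior container instance:
#   kubectl logs -n poolside-models my-model-pod --previous
```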
