Feature request - Add all v1/ routes #47

Open · visitsb opened this issue Jun 17, 2024 · 3 comments

visitsb commented Jun 17, 2024

@npuichigo I am trying to use Triton Inference Server with the TensorRT-LLM backend and open-webui as the frontend, but not all routes are provided, e.g. /v1/models.

Is there any plan to support all of the OpenAI v1 routes?

It would be really great if full OpenAI API support were available, since KServe support is still in the works.
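In case it helps triage, here is a quick probe for checking which v1 routes a server actually exposes. The base URL is just a placeholder for wherever the proxy listens, and a 404 is treated as "route missing":

```python
# Probe an OpenAI-compatible server for the v1 routes open-webui relies on.
# The base URL is a placeholder; a 404 means the route is absent, while any
# other status means it is at least wired up.
import requests

BASE = "http://localhost:3000"  # placeholder proxy address

ROUTES = [
    ("GET", "/v1/models"),
    ("POST", "/v1/chat/completions"),
    ("POST", "/v1/embeddings"),
]

for method, path in ROUTES:
    resp = requests.request(
        method,
        BASE + path,
        json={} if method == "POST" else None,
        timeout=5,
    )
    status = "missing" if resp.status_code == 404 else f"present (HTTP {resp.status_code})"
    print(f"{method:4} {path:24} -> {status}")
```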

npuichigo (Owner) commented

@visitsb It's fine to add /v1/models, but the full list of OpenAI API routes is long, e.g. /v1/audio/* and /v1/embeddings. What's the minimal subset needed?

visitsb (Author) commented Jun 17, 2024

@npuichigo Thanks for the quick reply!

Are you able to add the routes below? Looking at open-webui's implementation, at minimum:

/v1/models
/v1/chat/completions
/v1/embeddings
/v1/audio/speech
/v1/audio/transcriptions

I wish there were an easier way to get full compatibility, but perhaps sometime in the future.
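For reference, the response shape clients expect from /v1/models follows the OpenAI spec. A minimal stub for illustration (FastAPI and the model id are placeholders, not this project's actual stack):

```python
# Minimal /v1/models stub in the OpenAI response shape.
# FastAPI and the model id "ensemble" are illustrative placeholders;
# a real implementation would list whatever models Triton serves.
from fastapi import FastAPI

app = FastAPI()

@app.get("/v1/models")
def list_models():
    return {
        "object": "list",
        "data": [
            {
                "id": "ensemble",       # placeholder model name
                "object": "model",
                "created": 1718582400,  # arbitrary Unix timestamp
                "owned_by": "triton",   # placeholder owner
            }
        ],
    }
```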

npuichigo (Owner) commented

The exposed API depends on the actual model hosted in the Triton backend. Since there's no embedding model available in trtllm, /v1/embeddings is not possible. For an embedding model, maybe you can refer to https://github.com/huggingface/text-embeddings-inference.

The same reasoning applies to /v1/audio/*, since no ASR or TTS models are available in trtllm yet.
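In the meantime, one workaround is to point the client at two OpenAI-compatible servers: chat on this proxy, embeddings on text-embeddings-inference (which, as I understand it, also exposes an OpenAI-compatible /v1/embeddings route). A sketch with the official openai Python SDK, where all base URLs and model names are placeholders:

```python
# Split traffic across two OpenAI-compatible servers: chat completions go
# to the trtllm proxy, embeddings to text-embeddings-inference.
# All base URLs and model names below are placeholders.
from openai import OpenAI

chat = OpenAI(base_url="http://localhost:3000/v1", api_key="unused")
embed = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

reply = chat.chat.completions.create(
    model="ensemble",  # placeholder: whatever model Triton serves
    messages=[{"role": "user", "content": "Hello!"}],
)
print(reply.choices[0].message.content)

vectors = embed.embeddings.create(
    model="BAAI/bge-base-en-v1.5",  # placeholder embedding model
    input=["some text to embed"],
)
print(len(vectors.data[0].embedding))
```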
