Feature request - Add all v1/ routes #47

Open · visitsb opened this issue Jun 17, 2024 · 3 comments

visitsb commented Jun 17, 2024

@npuichigo I am trying to use Triton Inference Server with the TensorRT-LLM backend and open-webui as the frontend, but not all routes are provided, e.g. /v1/models.

Is there any plan to support all of the OpenAI v1 routes?

It would be really great if full OpenAI API support were available, since KServe support is still in the works.
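In case it helps triage, here is a quick probe for checking which v1 routes a server actually exposes. The base URL is just a placeholder for wherever the proxy listens, and a 404 is treated as "route missing":

```python
# Probe an OpenAI-compatible server for the v1 routes open-webui relies on.
# The base URL is a placeholder; a 404 means the route is absent, while any
# other status means it is at least wired up.
import requests

BASE = "http://localhost:3000"  # placeholder proxy address

ROUTES = [
    ("GET", "/v1/models"),
    ("POST", "/v1/chat/completions"),
    ("POST", "/v1/embeddings"),
]

for method, path in ROUTES:
    resp = requests.request(
        method,
        BASE + path,
        json={} if method == "POST" else None,
        timeout=5,
    )
    status = "missing" if resp.status_code == 404 else f"present (HTTP {resp.status_code})"
    print(f"{method:4} {path:24} -> {status}")
```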

npuichigo (Owner) commented

@visitsb It's fine to add /v1/models, but the full list of OpenAI API routes is long, e.g. /v1/audio/* and /v1/embeddings. What's the minimal subset needed?

visitsb (Author) commented Jun 17, 2024

@npuichigo Thanks for the quick reply!

Are you able to add the routes below? Looking at open-webui's implementation, at minimum:

/v1/models
/v1/chat/completions
/v1/embeddings
/v1/audio/speech
/v1/audio/transcriptions

I wish there were an easier way to get full compatibility, but perhaps sometime in the future.
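For reference, the response shape clients expect from /v1/models follows the OpenAI spec. A minimal stub for illustration (FastAPI and the model id are placeholders, not this project's actual stack):

```python
# Minimal /v1/models stub in the OpenAI response shape.
# FastAPI and the model id "ensemble" are illustrative placeholders;
# a real implementation would list whatever models Triton serves.
from fastapi import FastAPI

app = FastAPI()

@app.get("/v1/models")
def list_models():
    return {
        "object": "list",
        "data": [
            {
                "id": "ensemble",       # placeholder model name
                "object": "model",
                "created": 1718582400,  # arbitrary Unix timestamp
                "owned_by": "triton",   # placeholder owner
            }
        ],
    }
```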

npuichigo (Owner) commented

The exposed API depends on the actual model hosted in the Triton backend. Since there's no embedding model available in trtllm, /v1/embeddings is not possible. For an embedding model, maybe you can refer to https://github.com/huggingface/text-embeddings-inference.

The same reasoning applies to /v1/audio/*, since no ASR or TTS models are available in trtllm yet.
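In the meantime, one workaround is to point the client at two OpenAI-compatible servers: chat on this proxy, embeddings on text-embeddings-inference (which, as I understand it, also exposes an OpenAI-compatible /v1/embeddings route). A sketch with the official openai Python SDK, where all base URLs and model names are placeholders:

```python
# Split traffic across two OpenAI-compatible servers: chat completions go
# to the trtllm proxy, embeddings to text-embeddings-inference.
# All base URLs and model names below are placeholders.
from openai import OpenAI

chat = OpenAI(base_url="http://localhost:3000/v1", api_key="unused")
embed = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

reply = chat.chat.completions.create(
    model="ensemble",  # placeholder: whatever model Triton serves
    messages=[{"role": "user", "content": "Hello!"}],
)
print(reply.choices[0].message.content)

vectors = embed.embeddings.create(
    model="BAAI/bge-base-en-v1.5",  # placeholder embedding model
    input=["some text to embed"],
)
print(len(vectors.data[0].embedding))
```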
