
Zesta DocAssist-LLM


RAG-enhanced LLaMa-like transformer neural network based LLM for assistance with programming language and framework documentation, with a Streamlit front-end.

Based on the paper: "DocAssist: Large Language Models’ Brilliance in the Art of Web Development Dialogue"

Abstract

Accessing comprehensive and relevant documentation is crucial for developers. This project proposes a novel approach leveraging a RAG-enhanced LLaMa-like Transformer Neural Network (RAG-LLTN) to assist developers in accessing and understanding programming language and framework documentation.

The RAG-LLTN model integrates Retrieval-Augmented Generation (RAG) capabilities with the architecture of LLaMa-like Transformer Neural Networks (LLTN), enhancing its ability to retrieve and generate contextually relevant and accurate documentation assistance. RAG allows the model to retrieve relevant passages from a knowledge base, while LLTN enables efficient processing and generation of natural language text. The integration with a Streamlit front-end offers a practical solution for developers.

Architecture

The core of DocAssist is based on a LLaMa-like transformer architecture enhanced with Retrieval-Augmented Generation (RAG).

(Graphviz diagram: LLaMa-like transformer architecture with RAG enhancements.)

Key Enhancements (LLaMa-like Architecture)

  • Pre-normalization Using RMSNorm: Employs RMSNorm for normalizing the input of each transformer sub-layer, optimizing computational cost compared to standard Layer Normalization while maintaining similar performance.

  • SwiGLU Activation Function: Uses the SwiGLU activation function (inspired by PaLM and extending Swish) in the feed-forward network, offering flexibility by combining smoothness and piecewise linearity.

  • Rotary Embeddings (RoPE): Encodes absolute positional information using a rotation matrix, naturally incorporating explicit relative position dependency in self-attention. Offers scalability and decaying inter-token dependency with distance (see the sketch after this list).
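The following is a minimal, self-contained PyTorch sketch of these three components. Module names, dimensions, and the non-interleaved RoPE variant are illustrative choices for exposition, not the exact implementation used in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Pre-normalization without mean-centering: x / RMS(x) * g."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLUFeedForward(nn.Module):
    """Feed-forward network with the SwiGLU gate: W2(Swish(x W1) * x W3)."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary embedding: rotate each channel pair by a position-dependent
    angle so that attention dot-products depend on relative position."""
    _, seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```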

Retrieval-Augmented Generation (RAG)

RAG integrates retrieval-based methods with generative models. It first retrieves relevant passages from a knowledge base (e.g., documentation) based on the input query and then uses a generative model to synthesize this information into a coherent response.
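As a rough sketch, this retrieve-then-generate loop can be expressed as below; the embedding function, cosine-similarity search, and prompt format are illustrative assumptions rather than this project's actual API.

```python
import numpy as np

def embed(texts):
    """Placeholder embedding function; a real system would use a sentence
    encoder to map each text to a dense vector."""
    rng = np.random.default_rng(abs(hash(tuple(texts))) % (2**32))
    return rng.normal(size=(len(texts), 384))

def retrieve(query, docs, doc_vecs, k=3):
    """Return the k documentation passages most similar to the query."""
    q = embed([query])[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(-sims)[:k]
    return [docs[i] for i in top]

def answer(query, docs, doc_vecs, generate):
    """RAG step: retrieve relevant passages, then condition generation on them."""
    context = "\n\n".join(retrieve(query, docs, doc_vecs))
    prompt = f"Documentation:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)  # `generate` stands in for the LLaMa-like model's text generator
```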

(Graphviz diagram: RAG retrieval-and-generation pipeline.)

Frontend

A user-friendly interface is provided using Streamlit (Python).

Figure 1. Chat Window (Streamlit UI)
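A minimal Streamlit chat loop of the kind pictured in Figure 1 could look like the following; the `answer` function here is a hypothetical stand-in for the RAG-LLTN backend, not the repository's actual interface.

```python
import streamlit as st

def answer(question: str) -> str:
    """Stand-in for the RAG-LLTN backend; a real deployment would retrieve
    documentation passages and generate a response with the model."""
    return f"(model response to: {question})"

st.title("DocAssist-LLM")

# Keep the conversation across reruns of the script.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the chat history.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# Read a new question and answer it with the RAG-enhanced model.
if question := st.chat_input("Ask about a language or framework..."):
    st.session_state.messages.append({"role": "user", "content": question})
    with st.chat_message("user"):
        st.markdown(question)
    reply = answer(question)
    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.markdown(reply)
```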

References

  1. Touvron H., Lavril T., Izacard G., et al. LLaMA: Open and Efficient Foundation Language Models.

  2. Vaswani A., Shazeer N., Parmar N., et al. Attention Is All You Need.

  3. Shazeer N. GLU Variants Improve Transformer.

  4. Zhang B., Sennrich R. Root Mean Square Layer Normalization.
