SchemaForge AI 🚀

Simplify AI structured data processing, eliminate repetitive prompt engineering

SchemaForge AI is a service based on FastAPI and Pydantic designed to solve the problem of developers repeatedly writing code and prompts to have AI convert text into structured data. By providing a unified API interface, developers only need to define the desired data structure without writing specialized prompts and processing logic each time. Supports multiple AI providers including OpenAI, Anthropic, Google, and more.

中文文档

✨ Core Features

Multi-model Support - Seamlessly integrate with the latest models from OpenAI, Anthropic, Google, Mistral, Cohere, and Groq
Dynamic Schema Definition - Create custom data structures using Pydantic models
Unified Interface - Eliminate the need to write specialized prompts and processing code for each structuring task
RESTful API - Easy-to-use API interface with model selection and parameter configuration
Built-in Security - API key authentication and comprehensive error handling
Model Comparison - Compare different AI models' performance on the same structuring tasks
Docker Support - Easy deployment to any environment

🌟 Why Choose SchemaForge AI?

In real-world development, we often need to use AI to convert text into structured data, which typically requires manually writing prompts and processing code for each use case. This repetitive work is both time-consuming and tedious. SchemaForge AI provides a unified solution where you only need to define the target data structure, and the system automatically handles prompt generation and data validation, allowing developers to focus on business logic rather than repetitive coding.

Practical Use Cases:

📄 Document Parsing - Extract key data from contracts, resumes, or forms
🌐 API Response Conversion - Standardize third-party API responses to your application's required format
📊 Data Normalization - Unify data formats from different sources
💬 AI Response Structuring - Ensure AI responses conform to predefined data models
📝 Content Analysis - Extract structured data from articles or social media

🚀 Quick Start

Installation

Clone the repository

git clone https://github.com/X-Zero-L/schemaforge-ai.git
cd schemaforge-ai

Install dependencies with UV

# Install UV if you haven't already
curl -fsSL https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

Configure environment variables

cp .env.example .env
# Edit the .env file to add your API keys

Running the Service

uvicorn app.main:app --reload

The service will run at http://localhost:8000

Using Docker

We use UV for dependency management in our Docker setup for faster and more reliable builds.

docker-compose up -d

💻 Usage Examples

The examples directory contains comprehensive examples showing how to use SchemaForge AI:

1. Predefined Model Usage

from pydantic import BaseModel, Field
import json
import httpx

# Define your data model
class Person(BaseModel):
    name: str = Field(..., description="Person's full name")
    age: int = Field(..., description="Age in years")
    height: float = Field(..., description="Height in centimeters")
    occupation: str = Field(None, description="Current occupation")

# Send text for structuring
async def structure_data(content, model, api_key):
    schema_json = model.model_json_schema()
    
    response = await httpx.AsyncClient().post(
        "http://localhost:8000/api/v1/structure",
        json={
            "content": content,
            "schema_description": json.dumps(schema_json),
            "model_name": "openai:gpt-4o"
        },
        headers={"Authorization": "Bearer your_api_key"}
    )
    
    return response.json()

# Example result:
# {
#   "success": true,
#   "data": {
#     "name": "John Smith",
#     "age": 32,
#     "height": 182.5,
#     "occupation": "software engineer"
#   },
#   "model_used": "openai:gpt-4o"
# }

2. Model Generation

async def generate_model(sample_data, model_name, description, api_key):
    response = await httpx.AsyncClient().post(
        "http://localhost:8000/api/v1/generate-model",
        json={
            "sample_data": sample_data,
            "model_name": model_name,
            "description": description,
            "llm_model_name": "openai:gpt-4o"
        },
        headers={"Authorization": "Bearer your_api_key"}
    )
    
    return response.json()

# Response includes:
# - model_code: Generated Pydantic model code (for Python)
# - json_schema: JSON Schema representation (for any programming language)
# - fields: Structured field definitions

The API returns both Python Pydantic code and a JSON Schema representation, allowing you to:

Use the Pydantic model directly in Python applications
Use the JSON Schema to generate models in any other language
Build validations in JavaScript, Java, C#, Go, or any other language that supports JSON Schema

Check out the examples directory for more detailed examples including:

Structuring different types of content (person info, books, news articles)
Comparing different AI models on the same task
Generating models from JSON, text, and CSV data
Working with nested data structures
Adding validation rules

🔍 API Documentation

Visit http://localhost:8000/docs to view the complete API documentation.

Main Endpoints

Endpoint	Description
`/api/v1/structure`	Structure text data using a provided schema
`/api/v1/generate-model`	Generate a Pydantic model from sample data

🧠 Supported AI Models

SchemaForge AI is designed with flexibility in mind. You can use models from any supported provider by specifying them in the format provider:model_name:

OpenAI: Any model from their lineup including gpt-3.5-turbo, gpt-4, gpt-4o, and future models as they become available
Anthropic: Any Claude model including the Claude 3 family (Opus, Sonnet, Haiku) and future releases
Google: Gemini models including gemini-1.5-pro, gemini-1.5-flash, and newer versions
Mistral: Any Mistral AI models including mistral-large, mistral-small, and their latest versions
Cohere: Command models and any new Cohere releases
Groq: LLaMA and other models available through Groq's fast inference platform

The service doesn't restrict you to specific model versions - as providers release new models, you can immediately use them by specifying them in your requests without waiting for updates to this service.

Specify any model using the format: provider:model_name (e.g., openai:gpt-4o or anthropic:claude-3-sonnet-20240229)

SDK for Easy Integration

For easier integration with your applications, we provide an official Python SDK:

📦 SchemaForge SDK

Seamless integration for Python applications

The SDK provides a clean, Pythonic interface to SchemaForge AI:

from pydantic import BaseModel
from schemaforge import SchemaForge

# Initialize client
client = SchemaForge(api_key="your_secure_api_key_here")

# Define a Pydantic model
class Person(BaseModel):
    name: str
    age: int
    occupation: str
    email: str

# Structure text using the model
person = client.structure(
    content="John is a 30-year-old software engineer with email john@example.com",
    model_class=Person
)
print(person.model_dump())
# {'name': 'John', 'age': 30, 'occupation': 'software engineer', 'email': 'john@example.com'}

Visit the SDK repository for installation instructions, documentation, and examples.

🛠️ Advanced Configuration

See the configuration documentation for more information about customization options.

🔮 Future Plans

We're continuously working to improve SchemaForge AI. Here are some of the features we plan to implement:

Additional AI Providers - Expand support to include more LLM providers as they become available
Enhanced Input Processing - Support for more complex input formats including tables, PDFs, and images
Performance Optimization - Improvements to processing speed and resource utilization
Advanced Validation Rules - More sophisticated validation capabilities for generated models
Web Interface - A browser-based management console for easier configuration and testing
Output Format Extensions - Support for generating models in additional programming languages beyond Python/Pydantic
Batch Processing API - Efficiently process multiple structuring requests in a single operation

If you have suggestions for additional features, please share them in our Discussion Forum!

🌍 Multi-language Support

While our examples are primarily in Python, the SchemaForge AI API can be integrated with any programming language capable of making HTTP requests. We welcome community contributions of integration examples in other languages!

If you've implemented SchemaForge AI in your favorite language, please consider sharing your code samples. We'd love to include examples for:

JavaScript/TypeScript (Node.js, browser)
Java
Go
C#/.NET
PHP
Ruby
Rust
And more!

This helps make SchemaForge AI more accessible to developers from different backgrounds and ecosystems. Submit your examples through a pull request or share them in the discussions.

🙏 Acknowledgements

SchemaForge AI builds upon the work of several amazing open-source projects:

PydanticAI - The powerful agent framework that makes it less painful to build production-grade applications with Generative AI
Logfire - Comprehensive logging and monitoring solution that helps with debugging, performance monitoring, and behavior tracking
FastAPI - High-performance web framework for building APIs
Pydantic - Data validation and settings management using Python type annotations
And all the AI model providers whose APIs make this service possible

We're grateful to the maintainers and contributors of these projects for their excellent work.

🤝 Contributing

Contributions welcome! If you find any issues or have suggestions for improvements, please submit an issue or PR.

📣 Community

Discussion Forum

📄 License

MIT

If this project has been helpful to you, please give it a ⭐️ star!

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.github/workflows		.github/workflows
app		app
docs		docs
examples		examples
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
Dockerfile		Dockerfile
README.md		README.md
README_zh.md		README_zh.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
start.sh		start.sh
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SchemaForge AI 🚀

✨ Core Features

🌟 Why Choose SchemaForge AI?

🚀 Quick Start

Installation

Running the Service

Using Docker

💻 Usage Examples

1. Predefined Model Usage

2. Model Generation

🔍 API Documentation

Main Endpoints

🧠 Supported AI Models

SDK for Easy Integration

📦 SchemaForge SDK

🛠️ Advanced Configuration

🔮 Future Plans

🌍 Multi-language Support

🙏 Acknowledgements

🤝 Contributing

📣 Community

📄 License

About

Releases

Packages

Languages

X-Zero-L/schemaforge-ai

Folders and files

Latest commit

History

Repository files navigation

SchemaForge AI 🚀

✨ Core Features

🌟 Why Choose SchemaForge AI?

🚀 Quick Start

Installation

Running the Service

Using Docker

💻 Usage Examples

1. Predefined Model Usage

2. Model Generation

🔍 API Documentation

Main Endpoints

🧠 Supported AI Models

SDK for Easy Integration

📦 SchemaForge SDK

🛠️ Advanced Configuration

🔮 Future Plans

🌍 Multi-language Support

🙏 Acknowledgements

🤝 Contributing

📣 Community

📄 License

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages