A small GPT model built on character-level tokenization. It can train on any plain-text dataset and learns the patterns in that text well. Adjust it as needed.
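Below is a minimal sketch of how character-level tokenization typically works, assuming the `<eos>` marker is collapsed into a single reserved character before the vocabulary is built; the real tokenizer lives in train.py and may differ in detail.

```python
sample_text = "Hello!<eos>Hey!<eos>"          # stand-in for the contents of input.txt
EOS = "\x00"                                  # reserved character acting as the end-of-statement token
sample_text = sample_text.replace("<eos>", EOS)

chars = sorted(set(sample_text))              # every unique character becomes one token
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
itos = {i: ch for ch, i in stoi.items()}      # id -> char

def encode(s):
    """Map a string to a list of integer token ids."""
    return [stoi[c] for c in s]

def decode(ids):
    """Map token ids back to a string."""
    return "".join(itos[i] for i in ids)

print(encode("Hello!" + EOS))                 # list of integer token ids
print(decode(encode("Hello!" + EOS)))         # round-trips back to the original text
```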
To Train/Fine-Tune:
- Add an input.txt file with any text you want it to learn from. (Any <eos> in the text will be converted into an end-of-statement token.)
- Run train.py to train it. (Inside it you can change batch_size, max_iters, and learning_rate to match your needs.)
- NOTE: If a model.pth file already exists, it will be loaded at the start of training and overwritten at the end. (A rough sketch of this checkpoint handling follows below.)
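The checkpoint behaviour described above could look roughly like this; the stand-in model, the hyperparameter values, and the training-loop placeholder are assumptions for illustration, not the real train.py.

```python
import os
import torch
import torch.nn as nn

# Hyperparameters you can tune in train.py (values here are placeholders).
batch_size = 32
max_iters = 30_000
learning_rate = 3e-4

# Stand-in model so the sketch runs on its own; the real GPT is defined in train.py.
model = nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

# Resume from an existing checkpoint, if there is one...
if os.path.exists("model.pth"):
    model.load_state_dict(torch.load("model.pth"))

for step in range(max_iters):
    pass  # ...sample a batch of batch_size sequences and take an optimizer step here...

# ...then overwrite the checkpoint when training finishes.
torch.save(model.state_dict(), "model.pth")
```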
To prompt: Just run prompt.py.
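prompt.py encodes your prompt, samples new characters from the model one at a time, and decodes them back into text. Here is a rough sketch of that kind of sampling loop, assuming the model returns logits of shape (batch, time, vocab_size) and that encode/decode come from the tokenizer sketch above; names and parameters are illustrative.

```python
import torch

@torch.no_grad()
def sample(model, idx, max_new_tokens, block_size=256):
    """Autoregressively sample one character at a time (illustrative, not the real prompt.py)."""
    model.eval()
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]               # crop to the model's context window
        logits = model(idx_cond)                      # some GPT implementations return (logits, loss) instead
        probs = torch.softmax(logits[:, -1, :], dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)        # append the new token and keep going
    return idx

# Hypothetical usage, reusing encode/decode from the tokenizer sketch:
# ids = torch.tensor([encode("Hello!")], dtype=torch.long)
# print(decode(sample(model, ids, max_new_tokens=100)[0].tolist()))
```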
NOTE: The following is sample output from a model.pth that was trained on a custom dataset of cleaned Simple Wikipedia text and a custom Q&A/basic-conversation dataset for 30,000 max_iters each, then fine-tuned on a few geography trivia questions. Training was done on Google Colab, with the fine-tuning done on my MacBook.
Prompt: Hello!
Output: Hey!
Prompt: What's your name?
Output: TinyTalker.
Prompt: What can you do?
Output: I can generate text!
Prompt: Nice!
Output: Cool!
Prompt: What is the capital of Japan?
Output: Tokyo is the capital of Japan.
Prompt: What are some other cities there?
Output: Sure! Here is a list of Japan: Tokyo(Japan's Capital), Osaka, Kyoto, Sapporo, Sendai, Kobe, Fukuoka, Nagoya, and Nara are some Japanese cities.
Prompt: Can you give me even more Japanese Cities?
Output: Shi Hiroshima and Nagasaki famous for being the only 2 cities to be Nuked during World War 2.
Prompt: Wow... thats grim.
Output: What cities were nuked during World War 2/WWWWWW2?
Prompt: You just told me.
Output: TinyTalker.
Prompt: Ah okay, this is roughly the limit of what you can do.
Output: Nagasaki as the gateway to the Amazon.
Prompt: yep.
Output: Manaus is situated in which country?
Prompt: Brazil, anyways Bye! Have a nice day!
Output: Bye!
This model is inspired by Andrej Karpathy's "Let's build GPT from scratch" video as well as his nanoGPT, with modifications for this project.