Kabyle Phonotactic Analyzer

Kabyle Phonotactic Analyzer: computes CV patterns from a text corpus.

Installation

Clone the repository, then install the package with pip from inside the project directory:

git clone https://github.com/BoFFire/Kabyle-Phonotactic-Analyzer.git
cd Kabyle-Phonotactic-Analyzer
pip install .

Command-line usage:

kab-phonolyzer <file_path> [options]
kab-phonolyzer kab.txt -n 20 --batch-size 1000
kab-phonolyzer kab.txt --output results.csv --n 20

For a large corpus, increase the batch size:

kab-phonolyzer large_corpus.txt --batch-size 5000 --output analysis.csv
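
The README does not describe how `--batch-size` works internally. As a rough illustration of the general idea of reading a large file in fixed-size groups of lines, here is a minimal sketch. The helper name `iter_batches` is hypothetical and is not part of `kab_phonolyzer`.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `batch_size` lines.

    Hypothetical helper for illustration only; the real tool may batch differently.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(lines)
    while True:
        # islice pulls the next `batch_size` items without loading the whole input
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# An in-memory list stands in for an open text file here
sample_lines = ["seg\n", "dinna\n", "werǧin\n", "seg\n", "dinna\n"]
batch_sizes = [len(batch) for batch in iter_batches(sample_lines, 2)]
print(batch_sizes)  # [2, 2, 1]
```

Because an open file object is itself an iterable of lines, the same helper could be given `open("large_corpus.txt", encoding="utf-8")` so that only one batch is held in memory at a time.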

Example:

$ kab-phonolyzer kab.txt --n 3
Analysis of file:

CVC: 68339 (e.g., 'seg')
CVCCVC: 66497 (e.g., 'werǧin')
CVCCV: 37473 (e.g., 'dinna')

Programmatic usage:

You can also call the library directly from Python:

from kab_phonolyzer.phono import process_corpus, analyze_corpus

Process a corpus:

from kab_phonolyzer.phono import process_corpus

# Process a file and get raw data
fdist, examples = process_corpus("kab.txt", batch_size=1000)

# Access frequency distribution and examples
print(fdist.most_common(5))  # Top 5 patterns
print(examples["CVC"])  # Example words for pattern "CVC"
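
To make the shape of these two return values more concrete, the sketch below builds a comparable pair by hand: a frequency counter keyed by pattern and a mapping from each pattern to example words. It assumes `fdist` behaves like `collections.Counter` (it exposes `most_common`) and that `examples` maps a pattern to a list of words; the function `tally_patterns` is hypothetical and not part of the package.

```python
from collections import Counter, defaultdict


def tally_patterns(pairs):
    """Count patterns and collect distinct example words per pattern.

    `pairs` is an iterable of (word, pattern) tuples. Hypothetical helper,
    shown only to illustrate the assumed shape of (fdist, examples).
    """
    fdist = Counter()
    examples = defaultdict(list)
    for word, pattern in pairs:
        fdist[pattern] += 1
        if word not in examples[pattern]:
            examples[pattern].append(word)
    return fdist, dict(examples)


# Word/pattern pairs; the first three patterns come from the example output above
pairs = [
    ("seg", "CVC"),
    ("dinna", "CVCCV"),
    ("seg", "CVC"),
    ("werǧin", "CVCCVC"),
    ("sin", "CVC"),
]
sketch_fdist, sketch_examples = tally_patterns(pairs)
print(sketch_fdist.most_common(1))  # [('CVC', 3)]
print(sketch_examples["CVC"])       # ['seg', 'sin']
```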

Analyze and save results:

from kab_phonolyzer.phono import analyze_corpus

# Analyze and save to CSV
analyze_corpus("kab.txt", top_n=10, batch_size=500, output="results.csv")

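Custom CSV export:

import csv
import random

from kab_phonolyzer.phono import process_corpus

# Process a corpus and get the frequency distribution and example words
fdist, examples = process_corpus("kab.txt", batch_size=1000)

# Write the 20 most frequent patterns to a CSV file, with one random example each
# (UTF-8 keeps Kabyle letters such as 'ǧ' intact)
with open("custom_results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Pattern", "Frequency", "Example"])
    for pattern, freq in fdist.most_common(20):
        # Fall back to a placeholder if a pattern has no stored examples
        ex = random.choice(examples.get(pattern) or ["No example"])
        writer.writerow([pattern, freq, ex])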

Convert a single word to a CV pattern:

from kab_phonolyzer.phono import word_to_cv

word = "aselkim"
cv_pattern = word_to_cv(word)
print(f"'{word}' → {cv_pattern}")  # Expected output: 'aselkim' → VCVCCVC
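
For readers curious about what such a conversion involves, here is a minimal sketch of a letter-by-letter CV mapping. It is not the package's implementation: the vowel set below is an assumption, the name `sketch_word_to_cv` is hypothetical, and the sketch ignores digraphs, gemination, and any other orthographic rules the real `word_to_cv` may apply.

```python
# Assumption: these letters count as vowels; every other letter is a consonant
ASSUMED_VOWELS = frozenset("aeiou")


def sketch_word_to_cv(word: str) -> str:
    """Map each letter of `word` to 'V' or 'C' (illustrative sketch only)."""
    pattern = []
    for char in word.lower():
        if not char.isalpha():
            continue  # skip hyphens, digits, apostrophes, combining marks
        pattern.append("V" if char in ASSUMED_VOWELS else "C")
    return "".join(pattern)


for sample in ("seg", "werǧin", "dinna", "aselkim"):
    print(sample, sketch_word_to_cv(sample))
```

Under these assumptions the sketch reproduces the patterns shown in the command-line example (`seg` as CVC, `werǧin` as CVCCVC, `dinna` as CVCCV).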
