Post-initial commit

main
Eric Ihli 4 years ago
parent 5ce14fefe2
commit b6a89a774f

@ -1,3 +1,218 @@
======
TODO
======
- Allow a depth with thesaurus lookups.
- Allow restriction to rhymes with certain number of syllables.
- Word graph with weights to form most likely sentences.
=============
Terminology
=============
Use case:
---------
I want to find phrase B that rhymes with phrase A where phrase B has a
specifiable sentiment.
Something like:
"please turn on your magic beam"
"queeze churn horrific bloodstream"
I want these settings to be optional:
- phrase B to conform to certain grammatical structure.
- config of which words I prefer rhymes in phrase be
- config for rhyming rimes, onsets, and/or nuclei
- preferred number of syllables
Some of those settings make more sense with individual words than with phrases.
Here's a tricky consideration. Let's break those phrases down into syllables.
"please turn on your magic beam"
"queeze churn horrific bloodstream"
("P" "L" "IY" "Z") ("T" "ER" "N") ("AO" "N") ("Y" "AO" "R") ("M" "AE" "JH" "IH" "K") ("B" "IY" "M")
("K" "W" "IY" "Z") ("CH" "ER" "N") ("H" "AO" "R" "IH" "F" "EU" "K") ("B" "L" "AH" "D" "S" "T" "R" "IY" "M")
We are imagining turning
("AO" "N") ("Y" "AO" "R") ("M" "AE" "JH")
into
("H" "AO" "R" "IH" "F" "IH" "K")
There's this difficulty in deciding how to group the syllables into words.
("P" "L" "IY" "Z" "T" "ER" "N" "AO" "N" "Y" "AO" "R" "M" "AE" "JH" "IH" "K" "B" "IY" "M")
("K" "W" "IY" "Z" "CH" "ER" "N" "H" "AO" "R" "IH" "F" "IH" "K" "B" "L" "AH" "D" "S" "T" "R" "IY" "M")
If you take just the raw syllables and ignore the words, you get
("P" "L" "IY" "Z") ("T" "ER" "N") ("AO" "N") ("Y" "AO" "R") ("M" "AE") ("JH" "IH" "K") ("B" "IY" "M")
and
("K" "W" "IY" "Z") ("CH" "ER" "N") ("H" "AO" "R") ("IH" "F") ("IH" "K") ("B" "L" "AH" "D") ("S" "T" "R" "IY" "M")
If you leave the syllables grouped into words, then MAGIC rhymes with
TRAGIC, PELAGIC, etc... If you ignore the groupings of syllables into
words, then you're stuck trying to rhyme with the single syllable
"word" ("M" "AE"), which doesn't have any rhymes.
Implementation
--------------
What would it look like to solve the problem for a single grouping of syllables into words?
In the case of
("P" "L" "IY" "Z") ("T" "ER" "N") ("AO" "N") ("Y" "AO" "R") ("M" "AE" "JH" "IH" "K") ("B" "IY" "M")
We wouldn't get
("K" "W" "IY" "Z") ("CH" "ER" "N") ("H" "AO" "R" "IH" "F" "EU" "K") ("B" "L" "AH" "D" "S" "T" "R" "IY" "M")
We might get the following, since the syllable groupings align.
"queeze churn don more tragic fiends"
We don't want to always restrict it to matching syllable groups,
especially not for single words. If we give it the word "nation" we
almost surely want words like "approbation" and "creation"; speaking
from the use case of trying to find a rhyme to the last word of a
phrase.
Back to the entire phrase - the idea is we solve the problem for a
single grouping of syllables and then we use the "partitions" function
to get every possible combination of grouping of syllables and apply
the solution to each of those.
Performance
+++++++++++
Let's say we want to see all possible rhyming phrases of
("P" "L" "IY" "Z") ("T" "ER" "N") ("AO" "N") ("Y" "AO" "R") ("M" "AE" "JH" "IH" "K") ("B" "IY" "M")
Let's assume each syllable grouping has an average of 10 rhyming words.
That's 10^6 possible phrases. We need a way to limit our search for
both computational reasons and for UX reasons.
There's going to be a lot of redundancy there.
We don't need each of:
"please churn don poor tragic fiend"
"squeeze churn don poor tragic fiend"
"freeze churn don poor tragic fiend"
"sneeze churn don poor tragic fiend"
\...
So maybe we cycle through each word list.
please churn don poor tragic fiend
squeeze burn con door plagic mean
freeze turn fawn store tragic stream
sneeze yearn don more plagic steam
peas churn con four tragic deem
queeze burn fawn yore plagic reem
A seperate process can search through these phrases and rank them by grammatical structure, sentiment, etc...
We could also pre-filter the possible words by sentiment.
Or, we could assign grammatical restrictions to each word and
pre-filter the words by grammar. Then we'd get something like
(adjective noun verb adjective noun) and it would really reduce the
search space.
But would we do that by hand? That might work for an individual
grouping of syllables, but how would we restrict to that for each
possible combination of grouping of syllables?
One possible solution would be to have a list of all valid grammar structures for a certain number of words.
"please churn don poor tragic fiends"
(adj noun adv verb adj noun)
(adj adj noun verb adj noun)
(noun verb noun conj noun verb)
\...
Output
++++++
::
(["TEASE" "STERN" "CON" "SCORE" "MANIC" "STEAM"]
["SQUEEZE" "BURN" "WAN" "OR" "BEATNIK" "TEAM"]
["WHEEZE" "CHURN" "ON" "WHORE" "FABRIC" "SCREAM"]
["SNEEZE" "TURN" "CON" "GORE" "FRANTIC" "SCHEME"]
["FREEZE" "EARN" "WAN" "CORE" "EPIC" "STREAM"]
["EASE" "STERN" "ON" "FLOOR" "CRYPTIC" "SEAM"]
["SEIZE" "BURN" "CON" "BORE" "TOPIC" "THEME"]
["TEASE" "CHURN" "WAN" "SNORE" "TOXIC" "DREAM"]
["SQUEEZE" "TURN" "ON" "STORE" "TONIC" "STEAM"]
["WHEEZE" "EARN" "CON" "SORE" "MYSTIC" "TEAM"]
["SNEEZE" "STERN" "WAN" "ROAR" "STATIC" "SCREAM"]
["FREEZE" "BURN" "ON" "FOR" "CLASSIC" "SCHEME"]
["EASE" "CHURN" "CON" "CORPS" "SEPTIC" "STREAM"]
["SEIZE" "TURN" "WAN" "BOAR" "CRITIC" "SEAM"]
["TEASE" "EARN" "ON" "POUR" "CHRONIC" "THEME"]
["SQUEEZE" "STERN" "CON" "SCORE" "LIPSTICK" "DREAM"]
["WHEEZE" "BURN" "WAN" "OR" "PANIC" "STEAM"]
["SNEEZE" "CHURN" "ON" "WHORE" "SEISMIC" "TEAM"]
["FREEZE" "TURN" "CON" "GORE" "FROLIC" "SCREAM"]
["EASE" "EARN" "WAN" "CORE" "GOTHIC" "SCHEME"]
["SEIZE" "STERN" "ON" "FLOOR" "TRAGIC" "STREAM"]
["TEASE" "BURN" "CON" "BORE" "CATHOLIC" "SEAM"]
["SQUEEZE" "CHURN" "WAN" "SNORE" "CYNIC" "THEME"]
["WHEEZE" "TURN" "ON" "STORE" "COMIC" "DREAM"]
["SNEEZE" "EARN" "CON" "SORE" "PSYCHIC" "STEAM"]
["FREEZE" "STERN" "WAN" "ROAR" "RELIC" "TEAM"]
["EASE" "BURN" "ON" "FOR" "COSMIC" "SCREAM"]
["SEIZE" "CHURN" "CON" "CORPS" "DRASTIC" "SCHEME"])
::
TEASE STERN CON SCORE MANIC STEAM
BREEZE BURN WAN OR FABRIC GLEAM
SQUEEZE CHURN ON WHORE FRANTIC TEAM
WHEEZE TURN GORE EPIC SCREAM
SNEEZE CORE CRYPTIC SCHEME
FREEZE FLOOR PUBLIC STREAM
EASE BORE TOPIC BEAM
SEIZE SNORE TOXIC SEAM
STORE TONIC THEME
SORE MYSTIC DREAM
ROAR STATIC
WAR SIDEKICK
FOR SEPTIC
CORPS BROOMSTICK
DRAWER CHRONIC
POUR LIPSTICK
PANIC
SEISMIC
LOGIC
FROLIC
TRAGIC
ATTIC
CYNIC
RELIC
COSMIC
DRASTIC
Features
--------
Given an output like the above, a user might see a word or phrase they really
like.
"FRANTIC SCREAM" for example.
The rest of the sentence doesn't need to rhyme or necesarily contain words that
are synonyms.
Can we provide suggestions for the rest of the sentence? We know the number of
syllables we want and the sentiment we want.
Could we use something like a Markov chain to work backwards? Given some corpus,
what words most likely preceed "frantic scream" that also align with our
syllabic requirements?
==============
Articulation
==============
@ -81,3 +296,25 @@ Examples: "n" in "nose", "m" in "may", "ŋ" in "funk".
"ŋ" is known as the letter "eng" and the technical name of the consonant is the "voiced velar nasal"
"voiced" in the above sentence refers to whether or not your vocal chords are active. Your voice chord doesn't vibrate with voiceless consonants, like "sh" "th" "p" "f". In contrast, notice the vibration in phonemes like "m" "r" "z".
=========
Example
=========
Mister Sandman, bring me a dream
Make him the cutest that I've ever seen
Give him two lips like roses in clover
Then tell him that his lonesome nights are over
Mister Sandman, bring me a dream
Blood guts and gore, a nightmare machine
\...
Please turn on your magic beam
Mister Sandman bring me a dream
Fire burn attrocious bloodstream
Mister Sandman, bring me a dream

File diff suppressed because one or more lines are too long

@ -143,28 +143,6 @@
(defn nucleus [syllables]
(map #(list (last (first (u/take-through u/vowel %)))) syllables))
(defn single? [coll] (= 1 (count coll)))
(defn partitions
"There is a partitions in clojure.combinatorics that might be more
efficient. This was fun to write. Want to understand more ways to
write this algorithm. How to make it lazy? How to jump immediately
to a specific rank?"
([coll]
(partitions coll '()))
([coll acc]
(cond
(empty? coll) acc
(single? coll) `((~coll))
:else
(let [x (first coll)]
(reduce (fn [val el]
(cons
(cons (cons x (first el)) (rest el))
(cons (cons (list x) el) val)))
'()
(partitions (rest coll) acc))))))
(defn rhyming-word
"Simple lookup in data.
Data is a tree of syllables to words.
@ -192,13 +170,13 @@
(let [syllables (s/syllabify phones)
rhymes (remove #(some nil? %)
(map (partial rhyming-words words-by-rime)
(partitions (rimes syllables))))
(u/partitions (rimes syllables))))
onsets (remove #(some nil? %)
(map (partial rhyming-words words-by-onset-nucleus)
(partitions (onset+nucleus syllables))))
(u/partitions (onset+nucleus syllables))))
nuclei (remove #(some nil? %)
(map (partial rhyming-words words-by-nucleus)
(partitions (nucleus (reverse syllables)))))
(u/partitions (nucleus (reverse syllables)))))
popular-rhymes
(let [popular (into #{} (map string/upper-case popular))]
(remove #(some empty? %)
@ -216,11 +194,13 @@
(prhyme ["R" "OY" "AH" "L"])
(let [phones ["D" "R" "IY" "M" "S" "AE" "N" "D" "HH" "OW" "P" "S"]]
(prhyme phones))
(let [phones ["D" "R" "IY" "M" "S" "AE" "N" "D" "HH" "OW" "P" "S"]]
(s/syllabify phones))
(let [phones ["AE" "N" "D" "HH" "OW" "P" "S"]]
(prhyme phones)
(get-in words-by-nucleus (nucleus (s/syllabify phones)))
(prhyme phones)
(partitions (nucleus (s/syllabify phones)))
(u/partitions (nucleus (s/syllabify phones)))
(prhyme phones))
(let [phones ["T" "AY" "M" "T" "UW" "TH" "IH" "NG" "K"]]
(rimes (s/syllabify phones))
@ -236,10 +216,10 @@
(prhyme phones))
(prhyme ["B" "Y" "UW" "T" "IH" "F" "AH" "L" "G" "ER" "L"])
(let [r (rimes (s/syllabify ["R" "OY" "AH" "L" "W" "IH" "TH" "CH" "IY" "Z"]))]
(remove #(some nil? %) (map rhyming-words (partitions r))))
(remove #(some nil? %) (map rhyming-words (u/partitions r))))
(let [r (rimes (s/syllabify ["B" "Y" "UW" "T" "IH" "F" "AH" "L" "G" "ER" "L"]))]
(remove #(some nil? %) (map (partial rhyming-words words-by-rime) (partitions r))))
(remove #(some nil? %) (map (partial rhyming-words words-by-rime) (u/partitions r))))
(get
(->> words

@ -0,0 +1,343 @@
(ns com.owoga.prhyme.frp
(:require [clojure.java.io :as io]
[clojure.pprint :as pprint]
[clojure.string :as string]
[clojure.set :as set]
[com.owoga.prhyme.core :as p]
[com.owoga.prhyme.util :as u]
[com.owoga.prhyme.syllabify :as s]))
(def dictionary
(line-seq (io/reader (io/resource "cmudict_SPHINX_40"))))
(def thesaurus
(->> (line-seq (io/reader (io/resource "mthesaur.txt")))
(map #(string/split % #","))
(map #(vector (first %) (rest %)))
(into {})))
(defrecord Word [word syllables syllable-count rimes onsets nuclei])
(defrecord RhymeTarget [word syllables syllable-count rimes onsets nuclei partitions?])
(defrecord RhymeSubTarget [wordphrase syllables syllable-count rimes onsets nuclei
rimes? onsets? nuclei? synonyms?])
(defrecord Synonym [syllables target words])
(defn make-word [word]
(let [syllables (s/syllabify (rest word))
rimes (p/rimes syllables)
onsets (p/onset+nucleus syllables)
nuclei (p/nucleus syllables)]
(->Word
(first word)
syllables
(count syllables)
rimes
onsets
nuclei)))
(def words (->> dictionary
(map u/prepare-word)
(map make-word)))
(def popular-dict
(set (line-seq (io/reader (io/resource "popular.txt")))))
(def popular (filter #(get popular-dict (string/lower-case (:word %))) words))
(defn merge-phrase-words
"Given multiple `Word`, like the words for 'well off', create a single `Word`
that is syllabified as ('well' 'off') rather than as the combined ('weh'
'loff'). Useful for finding single-word rhymes of multiple-word targets.
An example: 'war on crime' -> 'turpentine'."
[phrase phrase-words]
(loop [merged (first phrase-words)
phrase-words (rest phrase-words)]
(cond
(and (empty? phrase-words) (empty? merged)) nil
(empty? phrase-words) (assoc merged :word phrase)
:else (recur (-> merged
(assoc :syllables (concat (:syllables merged)
(:syllables (first phrase-words))))
(assoc :syllable-count (+ (:syllable-count merged)
(:syllable-count (first phrase-words))))
(assoc :rimes (concat (:rimes merged)
(:rimes (first phrase-words))))
(assoc :onsets (concat (:onsets merged)
(:onsets (first phrase-words))))
(assoc :nuclei (concat (:nuclei merged)
(:nuclei (first phrase-words)))))
(rest phrase-words)))))
(defn phrase->word
"Given a word like 'well-off' or a phrase like 'war on poverty', return a Word
that has the correct syllables, rimes, onsets, and nucleus. This way we can
rhyme against phrases that aren't in the dictionary, as long as the words that
make up the phrase are in the dictionary. Returns nil if the word is not in
the dictionary."
[words phrase]
(->> (string/split phrase #"[ -]")
(map (fn [phrase-word]
(first (filter (fn [word]
(= phrase-word (string/lower-case (:word word))))
words))))
(merge-phrase-words phrase)))
(defn partition-word [word]
(->> word
(:syllables)
(u/partitions)))
(comment
(u/partitions (:syllables (phrase->word words "war on poverty")))
)
(defn rimes [words target]
(into #{}
(filter (fn [{:keys [rimes]}]
(= (last rimes) (last (:rimes target))))
words)))
(defn onsets [words target]
(into #{}
(filter (fn [{:keys [onsets]}]
(= (first onsets) (first (:onsets target))))
words)))
(defn nuclei [words target]
(into #{}
(filter (fn [{:keys [nuclei]}]
(= (last nuclei) (last (:nuclei target))))
words)))
(defn consecutive-matching
"Returns the consecutive matching rhymes of type.
Given words:
(D EY Z IY) and (K R EY Z IY)
the following would be returned for each type:
rimes: 2, (((IY) (IY)) ((EY) (EY))) - rimes are matched in reverse order
onsets: 0
nuclei: 2, (((EY) (EY)) ((IY) (IY))) - nuclei and onsets are matched in order
"
[a b type]
(let [a (if (= type :rimes) (reverse (type a)) (type a))
b (if (= type :rimes) (reverse (type b)) (type b))]
(take-while (fn [[x y]] (= x y)) (map list a b))))
(defn sort-rhymes
"Sorts by the number of consecutive matching rimes, onsets, and nuclei of each
word."
[rhymes word]
(sort (fn [a b]
(> (apply
+
(map #(count (consecutive-matching a word %))
[:rimes :onsets :nuclei]))
(apply
+
(map #(count (consecutive-matching b word %))
[:rimes :onsets :nuclei]))))
rhymes))
(defn prhyme
"Finds rhymes in dictionary `words` of `word` with options
to match on rimes, onsets, and/or nuclei."
[words word]
(let [r (if (:rimes? word) (rimes words word) #{})
o (if (:onsets? word) (onsets words word) #{})
n (if (:nuclei? word) (nuclei words word) #{})
all (set/union r o n)]
all))
(comment
(->> (make-word ["foobar" "F" "UW" "B" "AA" "R"])
(#(assoc % :rimes? true))
(prhyme words)
(filter #(= (:syllable-count %) 2))
(sort-by #(consecutive-matching
%
(make-word ["foobar" "F" "UW" "B" "AA" "R"])
:rimes)))
(as-> (make-word ["magic beam" "M" "AE" "J" "IH" "K" "B" "IY" "M"]) word
(into word {:rimes? true})
(prhyme popular word)
(mapcat #(matching-synonyms thesaurus % word)
["death" "evil" "satan" "devil" "sin" "bad" "hell"
"guts" "gore" "blood" "demon" "fear" "nightmare"
"distress" "corpse" "necrotic" "zombie"
"coma" "monster"]))
(as-> (make-word ["please turn" "P" "L" "IH" "Z" "T" "ER" "N"]) word
(into word {:rimes? true})
(prhyme popular word)
(mapcat #(matching-synonyms thesaurus % word)
["death" "evil" "satan" "devil" "sin" "bad" "hell"
"guts" "gore" "blood" "demon" "fear" "nightmare"
"distress" "corpse" "necrotic" "zombie"
"coma" "monster"]))
)
(defn matching-syllable-count [n words]
(filter #(= n (:syllable-count %)) words))
(defn matching-synonyms [thesaurus target words]
(let [synonyms (get thesaurus target)]
(filter (fn [word] (some #(re-matches (re-pattern (str "(?i)" %)) (:word word)) synonyms))
words)))
(defn make-rhyme-subtarget [wordphrase syllables]
(map->RhymeSubTarget (into
(make-word (concat [wordphrase] (flatten syllables)))
{:wordphrase wordphrase
:syllables syllables
:syllable-count (count syllables)
:rimes? true})))
(defn phrymo [dictionary phrase]
(phrase->word dictionary phrase))
(comment
(->> (phrymo popular "clover")
(partition-word)
(first)
(first))
(->> (phrymo popular "war on poverty")
(partition-word)
(take 3)
(map (fn [rhyme-target]
(map (fn [subtarget]
(make-rhyme-subtarget "war on poverty" subtarget))
rhyme-target)))
#_(map (fn [rhyme-target]
(map (fn [rhyme-sub-target]
(prhyme popular rhyme-sub-target))
rhyme-target))))
(->> (map->RhymeSubTarget (into (phrase->word words "war")
{:rimes? true
:onsets? true
:nuclei? true}))
(prhyme popular)
(matching-syllable-count 1)
(into #{})
(set/intersection
(into #{} (concat (matching-synonyms thesaurus "rich" words)))))
)
(defn alignment [target word]
(cond
(= (last (:rimes target))
(last (:rimes word)))
(- (:syllable-count target)
(count (:rimes word)))
(= (first (:onsets target))
(first (:onsets word)))
0
:else
(- (:syllable-count target)
(count (:rimes word)))))
(defn pad [char n s]
(apply str (conj (vec (repeat (- n (count s)) char)) s)))
(defn matching-position
[index syllable-count word]
(and (= syllable-count (:syllable-count word))
(= index (:alignment word))))
(defn find-synonyms
([thesaurus dict word]
(find-synonyms thesaurus dict 1 #{word} #{}))
([thesaurus dict word degree]
(find-synonyms thesaurus dict degree #{word} #{}))
([thesaurus dict degree words synonyms]
(cond
(= degree 0) synonyms
(nil? (first words))
(recur thesaurus
dict
(dec degree)
(into #{} (map #(string/lower-case (:word %)) synonyms))
synonyms)
:else
(recur thesaurus
dict
degree
(rest words)
(set/union
synonyms
(let [synonyms (->> (get thesaurus (first words))
(map string/lower-case)
(into #{}))]
(->> dict
(filter #(synonyms (string/lower-case (:word %))))
(into #{}))))))))
(comment
(->> (get thesaurus "war")
(map string/lower-case))
(->> (find-synonyms thesaurus words "evil" 2)
(map :word))
)
(defn pprint-phrase [phrase-words]
(let [phrase-words (map #(if (empty? %) '("_") %) phrase-words)
max-len (apply max (map count phrase-words))
words-cycles (map cycle phrase-words)]
(->> (map (partial take max-len) words-cycles)
(apply map vector))))
(defn pprint-table [phrase-words]
(let [phrase-words (map #(if (empty? %) '("") %) phrase-words)
max-word-lens (->> phrase-words
(map #(map count %))
(map #(apply max %)))
max-rhyme-count (count (apply max-key count phrase-words))
fmt-str (->> max-word-lens
(map #(+ 3 %))
(map #(format "%%-%ds" %))
(apply str))
phrase-words (->> phrase-words
(map #(concat % (repeat "")))
(map #(take max-rhyme-count %))
(apply map vector)
(map #(apply format fmt-str %))
(#(string/join "\n" %)))]
phrase-words))
(comment
(let [targets (map (partial phrase->word words)
["please" "turn" "on" "your" "magic" "beam"])
synonyms (into #{} (->> (mapcat #(find-synonyms thesaurus words % 2)
["evil" "war" "death" "corpse"])
(map :word)))]
(->> targets
(map #(into % {:rimes? true}))
(map (fn [target]
(->> (prhyme popular target)
(map #(assoc % :target target)))))
(map (fn [rhyming-words]
(filter #(= (:syllable-count %) (:syllable-count (:target %)))
rhyming-words)))
(map (fn [rhyming-words]
(filter #(synonyms (:word %)) rhyming-words)))
(map (fn [rhyming-words]
(map :word rhyming-words)))
(pprint-table)
(spit "rhymes.txt")))
(s/syllabify ["IH" "N" "V" "AA" "L" "V" "Z"])
(s/syllabify ["D" "EH" "B" "Y" "AH"])
(s/syllabify ["R" "AW" "N" "D" "M" "IY" "HH" "AA" "R" "T"])
;; => (("R" "AW" "N" "D") ("M" "IY") ("HH" "AA" "R" "T"))
(s/syllabify ["P" "AE" "D" "M" "AY"])
(set/union (rimes words (make-word ["boat" "B" "OW" "T"]))
(onsets words (make-word ["ballboy" "D" "AH" "L" "B" "OY"]))))

@ -24,7 +24,13 @@
;; As an alternative to handling the isolated "s"-at-the-end-of-internal-coda case,
;; it works well-enough for me to treat all fricatives as lowest priority.
(def sonority-hierarchy
["vowel" "semivowel" "liquid" "nasal" "aspirate" "affricate" "fricative" "stop"])
["vowel" "liquid" "affricate" "fricative" "nasal" "stop" "semivowel" "aspirate"])
;; Ok. Sonority hierarchy doesn't work.
;; Think abount nasals and stops.
;; N D IH NG doesn't work.
;; D N IH NG doesn't work.
;; Nasal/Stop doesn't work in any order.
(def lax-vowels #{"EH" "IH" "AE" "AH" "UH"})
@ -48,10 +54,12 @@
onset []]
(cond
(empty? phones) [onset []]
(empty? (filter #(vowel? %) phones)) [(into onset phones) '()]
(empty? onset) (recur (rest phones) [(first phones)])
(not (>sonorous (first phones) (last onset))) [onset phones]
:else (recur (rest phones) (conj onset (first phones)))))))
(comment
(slurp-onset (reverse ["B" "W"])))
(defn fix-lax
"https://www.reddit.com/r/phonetics/comments/i7hp5f/what_is_the_alaska_rule_in_reference_to/
@ -84,14 +92,15 @@
(let [[rime phones] (slurp-rime phones)
[onset phones] (slurp-onset phones)]
(cond
(= \Y (last (first onset)))
;; ROYAL -> R OY - AH L
;; DEBUTANT -> D EH B - Y AH - T AH N T
(and (< 1 (count (first onset)))
(= \Y (last (first onset))))
(recur phones (into segments [rime onset]))
:else
(recur phones (conj segments (concat rime onset)))))))))
(= \Y (ffirst '("YO")))
(first (ffirst (slurp-onset ["OY" "G" "AH"])))
(comment
(syllabify ["AH" "L" "AE" "S" "K" "AH"])
(syllabify ["H" "ER" "AH" "L" "D"])

@ -4,47 +4,13 @@
[clojure.set :as set]
[clojure.zip :as z]))
;; Pulled from cmudict-0.7b.phones.
;; {"AY" "vowel
;; "B" "
(def phonemap
{"AA" "vowel"
"AE" "vowel"
"AH" "vowel"
"AO" "vowel"
"AW" "vowel"
"AY" "vowel"
"B" "stop"
"CH" "affricate"
"D" "stop"
"DH" "fricative"
"EH" "vowel"
"ER" "vowel"
"EY" "vowel"
"F" "fricative"
"G" "stop"
"HH" "aspirate"
"IH" "vowel"
"IY" "vowel"
"JH" "affricate"
"K" "stop"
"L" "liquid"
"M" "nasal"
"N" "nasal"
"NG" "nasal"
"OW" "vowel"
"OY" "vowel"
"P" "stop"
"R" "liquid"
"S" "fricative"
"SH" "fricative"
"T" "stop"
"TH" "fricative"
"UH" "vowel"
"UW" "vowel"
"V" "fricative"
"W" "semivowel"
"Y" "semivowel"
"Z" "fricative"
"ZH" "fricative"})
(->> (io/reader (io/resource "cmudict-0.7b.phones"))
(line-seq)
(map #(string/split % #"\t"))
(into {})))
(def long-vowel #{"EY" "IY" "AY" "OW" "UW"})
@ -66,9 +32,10 @@
[line]
(string/split line #"[\t ]"))
(defn take-through [pred coll]
(defn take-through
"(take-through even? [1 2 3 4 7 7 5 2 8 10])
returns '((1 2 3 4) (7 7 5 2) (8) (10))"
[pred coll]
(loop [coll coll
acc '()]
(cond
@ -92,48 +59,29 @@
(pred (first coll)) (recur (rest coll) (inc cur-count) max-count)
:else (recur (rest coll) 0 (max cur-count max-count)))))
(defn pp-word [word]
(let [spelling (first word)
phones (rest word)
phonetypes (map phonemap phones)
formatted-phones (map #(format "%-10s" %) phones)
formatted-phonetypes (map #(format "%-10s" %) phonetypes)]
(format "%s\n%s\n%s"
spelling
(string/join " " formatted-phones)
(string/join " " formatted-phonetypes))))
(defn count-pred [pred coll]
(count (filter pred coll)))
(def count-vowels (partial count-pred vowel))
(defn node->zipper [node]
(z/zipper (fn branch? [node]
(defn single? [coll] (= 1 (count coll)))
(defn partitions
"There is a partitions in clojure.combinatorics that might be more
efficient. This was fun to write. Want to understand more ways to
write this algorithm. How to make it lazy? How to jump immediately
to a specific rank?"
([coll]
(partitions coll '()))
([coll acc]
(cond
(map? node)
(->> (keys (into {} node))
(remove #{:word})
((complement empty?)))
(empty? coll) acc
(single? coll) `((~coll))
:else
(do
(let [b (->> (keys (into {} (second node)))
(remove #{:word})
((complement empty?)))]
b))))
(fn children [node]
(let [node (if (map? node) node (second node))
ch (seq (select-keys node (remove #{:word} (keys node))))]
ch))
(fn make-node [node ch]
(into {} ch))
node))
(defn leafs [leaf? zipper]
(->> zipper
(iterate z/next)
(take-while (complement z/end?))
(map z/node)
(filter leaf?)))
(def word-leafs (partial leafs (fn [node] (:word (second node)))))
(let [x (first coll)]
(reduce (fn [val el]
(cons
(cons (cons x (first el)) (rest el))
(cons (cons (list x) el) val)))
'()
(partitions (rest coll) acc))))))

Loading…
Cancel
Save