By filtering songs on metrics such as popularity or number of awards, we can use this software package to determine the most common grammatical phrase structures within each filtered category.
Since much of the data a record label might want to categorize songs by is likely proprietary, filtering the songs by the chosen metric is the responsibility of the user.
Once the songs are filtered and categorized, they can be passed to this software, which returns a list of the most popular grammar structures.
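The counting step described above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: it assumes each lyric line has already been reduced to its phrase structure by some parser, and the ~line-structures~ data and ~most-common-structures~ name are hypothetical.

#+begin_src clojure
;; Hypothetical data: each lyric line already reduced to its phrase structure.
(def line-structures
  [[:noun-phrase]
   [:noun-phrase :verb-phrase]
   [:noun-phrase]
   [:prepositional-phrase :verb-phrase :adjective]
   [:noun-phrase]
   [:noun-phrase :verb-phrase]])

(defn most-common-structures
  "Returns [structure count] entries, most frequent first."
  [structures]
  (->> structures
       frequencies
       (sort-by val >)))

(most-common-structures line-structures)
#+end_src

Because the filtering happens before this step, the same function can be run once per category and the resulting rankings compared.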
In the example below, you'll see that a simple noun-phrase is the most popular structure with 6 occurrences, tied with a sentence composed of a prepositional-phrase, verb-phrase, and adjective.
#+begin_src clojure :results value :session main :exports both
;; ...
#+end_src
To help songwriters think of new lyrics, we provide an API that returns a list of words that commonly follow or precede a given phrase.
Models can be trained on different genres or categories of songs, which helps keep the recommended lyric completions apt for the target style.
In the example below, we provide a seed suffix of "bother me" and ask the software to predict the most likely words that precede that phrase. The resulting most popular phrases include "don't bother me", "doesn't bother me", "to bother me", and "won't bother me".
The software can be seeded with a simple "end-of-sentence" or "beginning-of-sentence" token and can be asked to work backwards to build a phrase that meets certain criteria.
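As a rough sketch of the idea of predicting preceding words, the snippet below counts which word appears immediately before a given suffix in a handful of lines. The ~lyrics~ data and the ~preceding-word-counts~ function are hypothetical and stand in for a trained model; the package's actual API may look quite different.

#+begin_src clojure
(require '[clojure.string :as str])

;; Hypothetical training lines; a real model would be trained on a lyrics corpus.
(def lyrics
  ["don't bother me"
   "it doesn't bother me"
   "you don't bother me"
   "they won't bother me"])

(defn preceding-word-counts
  "Counts the words that appear immediately before `suffix` (a vector of words),
  returning [word count] entries with the most frequent first."
  [lines suffix]
  (let [n (inc (count suffix))]
    (->> lines
         (map #(str/split % #"\s+"))
         (mapcat #(partition n 1 %))          ; every window of n words
         (filter #(= suffix (rest %)))        ; keep windows ending in the suffix
         (map first)                          ; the word just before the suffix
         frequencies
         (sort-by val >))))

(preceding-word-counts lyrics ["bother" "me"])
#+end_src

Applying the same function repeatedly, each time prepending the chosen word to the suffix, is one way to build a phrase backwards from an end-of-sentence seed.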
By the conventional definition, two words form a "perfect" rhyme when their phonemes match from the primary stress onward.
For example, "technology" and "ecology" both stress the second syllable. Their first syllables differ, but from the stressed syllable on, their phones match exactly.
A rhyme that might be useful to a songwriter but that doesn't fit the definition of a "perfect" rhyme would be "technology" and "economy". Those two words just barely break the rules for a perfect rhyme: their vowel phones match from the primary stress to the end, but some of the consonant phones in between differ.
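The definition above can be checked mechanically. The sketch below assumes pronunciations written in a CMU-dictionary-style notation, where vowels carry a stress digit and ~1~ marks primary stress; the hard-coded ~phones~ map and the function names are illustrative assumptions, not part of this software's API.

#+begin_src clojure
(require '[clojure.string :as str])

;; Assumed pronunciations in CMU-dictionary-style notation (1 = primary stress).
(def phones
  {"technology" ["T" "EH0" "K" "N" "AA1" "L" "AH0" "JH" "IY0"]
   "ecology"    ["IH0" "K" "AA1" "L" "AH0" "JH" "IY0"]
   "economy"    ["IH0" "K" "AA1" "N" "AH0" "M" "IY0"]})

(defn rhyming-part
  "Phones from the primary-stressed vowel to the end of the word."
  [word-phones]
  (drop-while #(not (str/ends-with? % "1")) word-phones))

(defn vowel?
  "In this notation only vowels end in a stress digit."
  [phone]
  (boolean (re-find #"\d$" phone)))

(defn perfect-rhyme?
  "True when every phone from the primary stress onward matches."
  [a b]
  (= (rhyming-part (phones a)) (rhyming-part (phones b))))

(defn vowel-rhyme?
  "True when only the vowels from the primary stress onward match."
  [a b]
  (= (filter vowel? (rhyming-part (phones a)))
     (filter vowel? (rhyming-part (phones b)))))

(perfect-rhyme? "technology" "ecology")
(perfect-rhyme? "technology" "economy")
(vowel-rhyme? "technology" "economy")
#+end_src

With these pronunciations, "technology" and "ecology" pass the perfect-rhyme check, while "technology" and "economy" pass only the weaker vowel check.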
Singers and songwriters have some flexibility and artistic freedom, so imperfect rhymes can be a useful fallback.
Therefore, this software sorts rhymes so that those closer to perfect come first in the ordering.
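One simple way to order rhymes by closeness is sketched here. The scoring rule (count the phones that agree when the two rhyming parts are aligned from the end of the word) is an assumption chosen for illustration, not necessarily the ranking this software uses, and the pronunciations are again assumed CMU-dictionary-style entries.

#+begin_src clojure
(require '[clojure.string :as str])

;; Assumed pronunciations (1 = primary stress); illustrative only.
(def phones
  {"technology" ["T" "EH0" "K" "N" "AA1" "L" "AH0" "JH" "IY0"]
   "ecology"    ["IH0" "K" "AA1" "L" "AH0" "JH" "IY0"]
   "economy"    ["IH0" "K" "AA1" "N" "AH0" "M" "IY0"]
   "hypocrisy"  ["HH" "IH0" "P" "AA1" "K" "R" "AH0" "S" "IY0"]
   "banana"     ["B" "AH0" "N" "AE1" "N" "AH0"]})

(defn rhyming-part
  "Phones from the primary-stressed vowel to the end of the word."
  [word-phones]
  (drop-while #(not (str/ends-with? % "1")) word-phones))

(defn closeness
  "Hypothetical score: number of phones that agree when the two rhyming
  parts are aligned from the end of the word."
  [a b]
  (->> (map =
            (reverse (rhyming-part (phones a)))
            (reverse (rhyming-part (phones b))))
       (filter true?)
       count))

(defn rank-rhymes
  "Sorts candidates so that the closest rhymes to `target` come first."
  [target candidates]
  (sort-by #(closeness target %) > candidates))

(rank-rhymes "technology" ["hypocrisy" "banana" "economy" "ecology"])
#+end_src

Under this scoring, the perfect rhyme "ecology" sorts first, the near rhymes "economy" and "hypocrisy" follow, and the non-rhyme "banana" sorts last.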
In the example below, you'll see that the first 20 or so rhymes are perfect, but then "hypocrisy" is listed as rhyming with "technology". This is for the reason just mentioned: it's close to a perfect rhyme, so it's of interest to singers and songwriters.
#+begin_src clojure :results value table :colnames yes :session main :exports both
;; ...
#+end_src
The Trie data structure supports a ~lookup~ function that returns the child trie at a certain lookup key and a ~children~ function that returns all of the immediate children of a particular Trie.
The results above show a sample of 10 elements in a 1-to-3-gram trie.
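To make the ~lookup~ and ~children~ operations concrete, here is a minimal sketch of an n-gram trie built from nested maps. It is an assumption-laden stand-in: the real Trie in this software is a dedicated data structure, and ~trie-insert~, the ~:children~ and ~:count~ keys, and ~example-trie~ are hypothetical names.

#+begin_src clojure
(defn trie-insert
  "Adds one n-gram to the trie, incrementing the count stored at its node."
  [trie ngram]
  (update-in trie
             (concat (interleave (repeat :children) ngram) [:count])
             (fnil inc 0)))

(defn lookup
  "Returns the child trie reached by following the keys in `ks`."
  [trie ks]
  (get-in trie (interleave (repeat :children) ks)))

(defn children
  "Returns a map of the immediate children of `trie`, keyed by word."
  [trie]
  (:children trie))

;; Hypothetical n-grams of mixed length.
(def example-trie
  (reduce trie-insert {} [["a"] ["a" "hole"] ["a" "hole"] ["this" "hole"]]))

(:count (lookup example-trie ["a" "hole"]))
(keys (children (lookup example-trie ["a"])))
#+end_src

Storing the count at each node means a single ~lookup~ gives both how often an n-gram occurred and, via ~children~, every word seen after it.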
The code sample below demonstrates training a Hidden Markov Model on a set of lyrics where each line gets reversed. This model is useful for predicting words backwards, so that you can start with the rhyming end of a word or phrase and generate backwards to the start of the lyric.
It also performs compaction and serialization. Song lyrics are typically provided as text files. Reading and parsing those files from disk is expensive, so we run the training process only once and save the resulting Markov model in a more memory-efficient format.
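The train-once, save, and reload workflow can be sketched as below. This is a simplified stand-in under stated assumptions: it counts plain n-grams over reversed lines rather than training the full model, and it serializes to an EDN string with ~pr-str~ instead of the compact format the software uses. The names ~reversed-ngrams~ and ~train~ are hypothetical.

#+begin_src clojure
(require '[clojure.string :as str]
         '[clojure.edn :as edn])

(defn reversed-ngrams
  "Splits a line into words, reverses them, and returns every n-gram of length n."
  [n line]
  (->> (str/split (str/trim line) #"\s+")
       reverse
       (partition n 1)
       (map vec)))

(defn train
  "Counts reversed n-grams across all lines."
  [n lines]
  (frequencies (mapcat #(reversed-ngrams n %) lines)))

;; Train once on hypothetical lines.
(def model (train 2 ["you don't bother me" "they don't bother me"]))

;; Serialize to a string (which could be written to disk), then restore.
(def saved (pr-str model))
(def restored (edn/read-string saved))

(restored ["me" "bother"])
#+end_src

Because the lines are reversed before counting, the key ~["me" "bother"]~ records that "bother" precedes "me", which is the direction needed to generate a lyric backwards from its rhyming end.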
** Functionalities To Evaluate The Accuracy Of The Data Product
Since creative brainstorming is the goal, "accuracy" is subjective.
We can, however, measure how "expected" a phrase is given the training data and use that measurement to compare language generation algorithms. This measurement is called "perplexity".
#+begin_src clojure :session main :exports both :results output
"%s is the perplexity of \"%s\" \"hole\" \"</s>\" \"</s>\""
(->> seed
(map database)
(markov/perplexity 4 markov-tight-trie))
word))))
["a" "this" "that"])
nil)
#+end_src
#+RESULTS:
: "a" has preceeded "hole" "</s>" "</s>" a total of 250 times
: "this" has preceeded "hole" "</s>" "</s>" a total of 173 times
: "that" has preceeded "hole" "</s>" "</s>" a total of 45 times
: -12.184088569934774 is the perplexity of "a" "hole" "</s>" "</s>"
: -12.552930899563904 is the perplexity of "this" "hole" "</s>" "</s>"
: -13.905719644461469 is the perplexity of "that" "hole" "</s>" "</s>"
The results above make intuitive sense. The most common word to precede "hole" at the end of a sentence is the word "a". There are 250 sentences ending in "... a hole.", compared to 173 ending in "... this hole." and 45 ending in "... that hole.".
Therefore, "... a hole." has the lowest "perplexity" of the three: its score of -12.18 is the closest to zero.
This standardized measure of accuracy can be used to compare different language generation algorithms.
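For reference, the textbook definition of perplexity for a sequence of $N$ tokens is $2^{-\frac{1}{N}\sum_i \log_2 p_i}$, where $p_i$ is the model's probability of token $i$ given its context. The scores printed above are negative, so ~markov/perplexity~ evidently reports its result on a different (possibly logarithmic) scale. The sketch below implements only the textbook formula and is not the library's implementation; the ~log2~ and ~perplexity~ functions here are hypothetical helpers.

#+begin_src clojure
(defn log2
  "Base-2 logarithm."
  [x]
  (/ (Math/log (double x)) (Math/log 2.0)))

(defn perplexity
  "Textbook perplexity of a sequence, given the conditional probability
  the model assigned to each token."
  [probabilities]
  (let [n        (count probabilities)
        avg-log2 (/ (reduce + (map log2 probabilities)) n)]
    (Math/pow 2.0 (- avg-log2))))

;; A model that assigns probability 1/4 to every token has perplexity of about 4.
(perplexity [0.25 0.25 0.25 0.25])

;; A more probable continuation yields a lower perplexity than a less probable one.
(< (perplexity [0.5]) (perplexity [0.1]))
#+end_src

Under this definition a lower value means the model found the phrase more expected, which matches the ordering of the three phrases above.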
** Security Features
Artists and songwriters place a lot of value on the secrecy of their content. Therefore, all communication with the web-based interface occurs over a secure connection using HTTPS.
Security certificates are issued by Let's Encrypt, and an Nginx web server handles the TLS termination.
With this precaution in place, attackers who intercept network traffic cannot read the content that songwriters send to or receive from the servers.