@ -93,7 +93,7 @@ This software will accomplish its primary objective if it makes its way into the
Several secondary objectives are also desirable and reasonably expected. The architecture of the software lends itself to existing as several independently useful modules.
For example, the [[https://en.wikipedia.org/wiki/Hidden_Markov_model][Markov Model]] can be conveniently backed by a [[https://en.wikipedia.org/wiki/Trie][Trie data structure]]. This Trie data structure can be released as its own software package and used any application that benefits from prefix matching.
For example, the [[https://en.wikipedia.org/wiki/Hidden_Markov_model][Markov Model]] (Markov Model 2021) can be conveniently backed by a [[https://en.wikipedia.org/wiki/Trie][Trie data structure]] (Trie 2021). This Trie data structure can be released as its own software package and used in any application that benefits from prefix matching.
Another example is the package that turns phrases into phones (symbols of pronunciation). That package can find use for a number of natural language processing and natural language generation tasks, aside from the task required by this particular project.
@ -130,9 +130,9 @@ The only stakeholders in the project will be the record labels or songwriters. I
** Ethical And Legal Considerations
Web scraping, the method used to obtain the initial dataset from http://darklyrics.com, is protected given the ruling in [[https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn]].
Web scraping, the method used to obtain the initial dataset from http://darklyrics.com, is protected given the ruling in [[https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn]] (HiQ Labs v. LinkedIn 2021).
The use of publicly available data in generative works is less clear. But Microsoft's lawyers deemed it sound given their recent release of Github CoPilot ([[https://www.theverge.com/2021/7/7/22561180/github-copilot-legal-copyright-fair-use-public-code]]).
The use of publicly available data in generative works is less clear. But Microsoft's lawyers deemed it sound given their recent release of GitHub Copilot (Gershgorn, 2021).
** Expertise
@ -187,7 +187,7 @@ Each new model can be uploaded to the web server and users can select which mode
RhymeStorm™ development will proceed with an iterative Agile methodology. It will be composed of several independent modules that can be developed in parallel and iteratively.
The Trie data structure that will be used as a backing to the Hidden Markov Model can be worked on in isolation from any other aspect of the project. The first iteration can use a simple hash-map as a backing store. The second iteration can improve memory efficiency by using a ByteBuffer as a [[https://aclanthology.org/W09-1505.pdf][Tightly Packed Trie]]. Future iterations can continue to improve performance metrics.
The Trie data structure that will be used as a backing to the Hidden Markov Model can be worked on in isolation from any other aspect of the project. The first iteration can use a simple hash-map as a backing store. The second iteration can improve memory efficiency by using a ByteBuffer as a [[https://aclanthology.org/W09-1505.pdf][Tightly Packed Trie]] (Germann et al., 2009). Future iterations can continue to improve performance metrics.
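As a rough illustration of what a first, hash-map-backed iteration might look like, the sketch below stores n-grams in nested Clojure maps and looks up the words that followed a given prefix. The names ~trie-insert~, ~trie-children~, and the ~::count~ key are assumptions made for this example only and are not the project's actual API.

#+begin_src clojure
;; Hypothetical sketch: a trie of word sequences backed by nested hash-maps.
;; trie-insert, trie-children, and ::count are illustrative names only.

(defn trie-insert
  "Adds one n-gram (a sequence of words) to the trie and increments the
  count stored on the node where that n-gram ends."
  [trie ngram]
  (update-in trie (conj (vec ngram) ::count) (fnil inc 0)))

(defn trie-children
  "Returns a map of next-word -> count for each word seen directly after
  prefix. The count is the number of inserted n-grams that ended on that
  word (0 if the word only appears in the middle of longer n-grams)."
  [trie prefix]
  (let [node (get-in trie (vec prefix))]
    (into {}
          (for [[word child] node
                :when (not= word ::count)]
            [word (::count child 0)]))))

;; Build a small trie from three trigrams.
(def example-trie
  (reduce trie-insert {}
          [["into" "the" "void"]
           ["into" "the" "void"]
           ["into" "the" "night"]]))

;; Which words followed "into the", and how often?
(trie-children example-trie ["into" "the"])
;; => {"void" 2, "night" 1}
#+end_src

Dividing each count by the sum of the counts under a prefix would give a simple estimate of transition likelihood. A more compact backing store, such as the Tightly Packed Trie mentioned above, could later replace the nested maps behind the same two functions.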
The web server can be implemented initially without security measures like HTTPS and performance measures like load balancing. Future iterations can add these features as they become necessary.
@ -345,6 +345,8 @@ The dataset currently in use was generated from the publicly available lyrics at
Further datasets will need to be provided by the end-user.
The trained dataset is available as a resource in this repository at ~web/resources/models/~.
** Decision Support Functionality
*** Choosing Words For A Lyric Based On Markov Likelihood
@ -651,7 +653,7 @@ The code sample below demonstrates training a Hidden Markov Model on a set of ly
It also performs compaction and serialization. Song lyrics are typically provided as text files. Reading files on a hard drive is an expensive process, but we can perform that expensive training process only once and save the resulting Markov Model in a more memory-efficient format.
#+begin_src clojure :session main :results output pp