It’s probably not necessary for you to replicate my development environment in order to evaluate this project. You can access the deployed application at <a href="https://darklimericks.com/wgu">https://darklimericks.com/wgu</a> and the libraries and supporting code that I wrote for this project at <a href="https://github.com/eihli/clj-tightly-packed-trie">https://github.com/eihli/clj-tightly-packed-trie</a>, <a href="https://github.com/eihli/syllabify">https://github.com/eihli/syllabify</a>, and <a href="https://github.com/eihli/prhyme">https://github.com/eihli/prhyme</a>. The web server and web application code is not hosted publicly, but you will find it uploaded with my submission as a <code>.tar</code> archive.
<h3 id="org464e0d5"><span class="section-number-3">3.1</span> Problem Summary</h3>
<h3 id="orgfbabd96"><span class="section-number-3">3.1</span> Problem Summary</h3>
<div class="outline-text-3" id="text-3-1">
<div class="outline-text-3" id="text-3-1">
<p>
<p>
Songwriters, artists, and record labels can save time and discover better lyrics with the help of a machine learning tool that supports their creative endeavours.
@ -448,8 +448,8 @@ Songwriters have several old-fashioned tools at their disposal including diction
How many sensible phrases can you think of that rhyme with “war on poverty”? What if I restrict you to phrases that are exactly 14 syllables long? That’s a common restriction when a songwriter is trying to match the meter of a previous line. What if I add another restriction: primary stress must fall at certain spots in that 14-syllable phrase?
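To make these restrictions concrete, here is a minimal Python sketch (not this project’s own code) of how a candidate phrase could be checked against a syllable count and a set of required primary-stress positions. The tiny lexicon, the function names <code>stresses</code> and <code>fits_meter</code>, and the CMU-dictionary-style phoneme notation are all assumptions made for illustration.

```python
# Toy lexicon in CMU-dictionary style: each vowel phoneme ends in a stress
# digit (1 = primary, 2 = secondary, 0 = unstressed). This subset is a
# hypothetical stand-in for a real pronouncing dictionary.
LEXICON = {
    "war": ["W", "AO1", "R"],
    "on": ["AA1", "N"],
    "poverty": ["P", "AA1", "V", "ER0", "T", "IY0"],
    "more": ["M", "AO1", "R"],
    "sovereignty": ["S", "AA1", "V", "R", "AH0", "N", "T", "IY0"],
}


def stresses(phrase, lexicon=LEXICON):
    """Return one stress digit per syllable, in order, for the phrase."""
    digits = []
    for word in phrase.lower().split():
        for phone in lexicon[word]:
            # Only vowel phonemes carry a trailing digit, so each digit
            # found corresponds to exactly one syllable.
            if phone[-1].isdigit():
                digits.append(int(phone[-1]))
    return digits


def fits_meter(phrase, syllables, primary_at=()):
    """True if the phrase has exactly `syllables` syllables and carries
    primary stress at every zero-based syllable index in `primary_at`."""
    s = stresses(phrase)
    return len(s) == syllables and all(s[i] == 1 for i in primary_at)
```

A songwriter’s request such as “five syllables, with primary stress on the first and third” would then be <code>fits_meter(phrase, 5, primary_at=(0, 2))</code>, applied as a filter over every candidate phrase.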
@ -465,8 +465,8 @@ And this is a process that is perfect for machine learning. Machine learning can
RhymeStorm™ is a tool to help songwriters brainstorm. It provides lyrics automatically generated based on training data from existing songs while adhering to restrictions based on rhyme scheme, meter, genre, and more.
@ -494,8 +494,8 @@ This auto-complete functionality will be similar to the auto-complete that is co
The initial model will be trained on the lyrics from <a href="http://darklyrics.com">http://darklyrics.com</a>. This is a publicly available dataset with minimal meta-data. Record labels will have more valuable datasets that include meta-data along with lyrics, such as the date the song was popular, the number of radio plays of the song, the profit of the song/artist, etc.
@ -507,8 +507,8 @@ The software can be augmented with additional algorithms to account for the type
<h3 id="org3c3a5f3"><span class="section-number-3">3.6</span> Development Methodology - Agile</h3>
<h3 id="orge82a74d"><span class="section-number-3">3.6</span> Development Methodology - Agile</h3>
<div class="outline-text-3" id="text-3-6">
<div class="outline-text-3" id="text-3-6">
<p>
<p>
This project will be developed with an iterative Agile methodology. Since a large part of data science and machine learning is exploration, this project will benefit from ongoing exploration in tandem with development.
@ -545,8 +545,8 @@ The prices quoted below are for an initial minimum-viable-product that will serv
Funding requirements are minimal. The initial dataset is public and freely available. On a typical consumer laptop, Hidden Markov Models can be trained on fairly large datasets in a short time, and the training doesn’t require expensive hardware like the GPUs used to train Deep Neural Networks.
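To illustrate why this kind of training is cheap, here is a minimal Python sketch of a simplified first-order Markov chain over words. It is a simplification, not a full Hidden Markov Model and not this project’s own code, and the names <code>train_bigrams</code> and <code>next_word_probabilities</code> are hypothetical. The point is that training reduces to a single counting pass over the lyrics.

```python
from collections import Counter, defaultdict


def train_bigrams(lines):
    """Count how often each word follows each other word.

    One pass over the data with dictionary updates: no gradient descent,
    no GPU. "<s>" and "</s>" mark the start and end of a line.
    """
    counts = defaultdict(Counter)
    for line in lines:
        words = ["<s>"] + line.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn the raw follow-counts for `word` into probabilities.

    Returns an empty dict for a word that was never seen.
    """
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}
```

For example, after training on the three lines “the war is over”, “the war goes on”, and “the night is over”, the word “the” is followed by “war” twice and “night” once, so <code>next_word_probabilities</code> assigns them probabilities of 2/3 and 1/3.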
@ -630,17 +630,17 @@ These are my estimates for the time and cost of different aspects of initial dev
The only stakeholders in the project will be the record labels or songwriters. I describe the impact on them in section <a href="#orgf37cff4">3.2</a> above.
The only stakeholders in the project will be the record labels or songwriters. I describe the impact on them in section <a href="#org67529ee">3.2</a> above.
<h3 id="org09fcfb8"><span class="section-number-3">3.9</span> Ethical And Legal Considerations</h3>
<h3 id="org83ba9b1"><span class="section-number-3">3.9</span> Ethical And Legal Considerations</h3>
<div class="outline-text-3" id="text-3-9">
<div class="outline-text-3" id="text-3-9">
<p>
<p>
Web scraping, the method used to obtain the initial dataset from <a href="http://darklyrics.com">http://darklyrics.com</a>, is protected given the ruling in <a href="https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn">https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn</a>.
@ -652,8 +652,8 @@ The use of publicly available data in generative works is less clear. But Micros
I have 10 years of experience as a programmer and have worked extensively on frontend technologies like HTML/JavaScript, backend technologies like Django, and libraries/packages/frameworks.
@ -674,8 +674,8 @@ Write an executive summary directed to IT professionals that addresses each of t
<h3 id="org3e39ab1"><span class="section-number-3">4.1</span> Decision Support Opportunity</h3>
<h3 id="orgba8718f"><span class="section-number-3">4.1</span> Decision Support Opportunity</h3>
<div class="outline-text-3" id="text-4-1">
<div class="outline-text-3" id="text-4-1">
<p>
<p>
Songwriters expend a lot of time and effort finding the perfect rhyming word or phrase. RhymeStorm™ will amplify users’ creative abilities by searching its machine learning model for sensible and proven-successful words and phrases that meet the rhyme scheme and meter requirements requested by the user.
@ -687,8 +687,8 @@ When a songwriter needs to find likely phrases that rhyme with “war on pov
<h3 id="org70f3f81"><span class="section-number-3">4.2</span> Customer Needs And Product Description</h3>
<h3 id="org16507af"><span class="section-number-3">4.2</span> Customer Needs And Product Description</h3>
<div class="outline-text-3" id="text-4-2">
<div class="outline-text-3" id="text-4-2">
<p>
<p>
Songwriters spend money on dictionaries, compilations of slang, thesauruses, and phrase dictionaries. They spend their time daydreaming, brainstorming, contemplating, and mixing and matching the knowledge they acquire through these traditional means.
@ -708,8 +708,8 @@ Computers can process and sort this information and sort the results by quality
<h3 id="org1e4bde0"><span class="section-number-3">4.4</span> Available Data And Future Data Lifecycle</h3>
<h3 id="orgf38bff9"><span class="section-number-3">4.4</span> Available Data And Future Data Lifecycle</h3>
<div class="outline-text-3" id="text-4-4">
<div class="outline-text-3" id="text-4-4">
<p>
<p>
The initial dataset will be gathered by downloading lyrics from <a href="http://darklyrics.com">http://darklyrics.com</a> and future models can be generated by downloading lyrics from other websites. Alternatively, data can be provided by record labels and combined with meta-data that the record label may have, such as how many radio plays each song gets and how much profit they make from each song.
@ -750,8 +750,8 @@ Each new model can be uploaded to the web server and users can select which mode
RhymeStorm™ development will proceed with an iterative Agile methodology. It will be composed of several modules that can be worked on independently, in parallel, and iteratively.
@ -775,8 +775,8 @@ Much of data science is exploratory and taking an iterative Agile approach can t
<h3 id="org0641d83"><span class="section-number-3">4.9</span> Programming Environments And Costs</h3>
<h3 id="org9bd94bd"><span class="section-number-3">4.9</span> Programming Environments And Costs</h3>
<div class="outline-text-3" id="text-4-9">
<div class="outline-text-3" id="text-4-9">
<p>
<p>
the programming environments and any related costs, as well as the human resources that are necessary to execute each phase in the development of the data product
@ -877,8 +877,8 @@ All code was written and all models were trained on a Lenovo T15G with an Intel
<li><a id="orgdb5100a"></a>Most Common Grammatical Structures In A Set Of Lyrics<br />
<li><a id="orgd8d18dc"></a>Most Common Grammatical Structures In A Set Of Lyrics<br />
<div class="outline-text-5" id="text-5-1-1-1">
<div class="outline-text-5" id="text-5-1-1-1">
<p>
<p>
By filtering songs by metrics such as popularity or number of awards, we can use this software package to determine the most common grammatical phrase structure for each filtered category.
@ -1042,12 +1042,12 @@ In the example below, you’ll see that a simple noun-phrase is the most pop
Artists/Songwriters place a lot of value on the secrecy of their content. Therefore, all communication with the web-based interface occurs over a secure connection using HTTPS.
@ -2039,15 +2039,15 @@ With this precaution in place, attackers will not be able to snoop the content t
<h3 id="org9f1976e"><span class="section-number-3">5.11</span> Tools To Monitor And Maintain The Product</h3>
<h3 id="orgbeec8d8"><span class="section-number-3">5.11</span> Tools To Monitor And Maintain The Product</h3>
<div class="outline-text-3" id="text-5-11">
<div class="outline-text-3" id="text-5-11">
<p>
<p>
By placing the application server behind an HAProxy load balancer, we can take advantage of the built-in HAProxy stats page to monitor the amount of traffic and the health of the application servers.
</p>
</p>
<div id="org2f0e245" class="figure">
<div id="orge2868d1" class="figure">
<p><img src="images/stats.png" alt="stats.png" />
<p><img src="images/stats.png" alt="stats.png" />
</p>
</p>
</div>
</div>
@ -2066,8 +2066,8 @@ The server also includes the <code>certbot</code> script for updating and mainta
<h3 id="org3ef6d25"><span class="section-number-3">5.12</span> A User-Friendly, Functional Dashboard That Includes At Least Three Visualization Types</h3>
<h3 id="org237f855"><span class="section-number-3">5.12</span> A User-Friendly, Functional Dashboard That Includes At Least Three Visualization Types</h3>
<div class="outline-text-3" id="text-5-12">
<div class="outline-text-3" id="text-5-12">
<p>
<p>
You can access an example of the user interface at <a href="https://darklimericks.com/wgu">https://darklimericks.com/wgu</a>.
@ -2086,7 +2086,7 @@ The first visualization is a scatter plot of rhyming words with the “quali
It’s difficult to objectively test the model’s accuracy since “brainstorm new lyrics” is such a subjective goal. A valid test of that goal will require many human subjects to subjectively evaluate their performance while using the tool compared to their performance without the tool.
@ -2437,8 +2437,8 @@ If we allow ourselves the assumption that the close a generated phrase is to a v
<h4 id="orgc57d287"><span class="section-number-4">6.6.1</span> Percentage Of Generated Lines That Are Valid English Sentences</h4>
<h4 id="org3dd10eb"><span class="section-number-4">6.6.1</span> Percentage Of Generated Lines That Are Valid English Sentences</h4>
<div class="outline-text-4" id="text-6-6-1">
<div class="outline-text-4" id="text-6-6-1">
<p>
<p>
We can use <a href="https://opennlp.apache.org/">Apache OpenNLP</a> to parse sentences into a grammar structure conforming to the parts of speech specified by the <a href="https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html">University of Pennsylvania’s Treebank Project</a>.
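As a rough illustration of the metric in this section’s title, here is a minimal Python sketch (not this project’s own code) that computes the percentage of lines whose parse is rooted in a clause. It assumes each line has already been parsed into a Treebank-style bracketed string, and it assumes that a parse beginning with <code>(TOP (S </code> counts as a valid sentence. Both the helper names and that validity rule are assumptions made for this example.

```python
def is_sentence_parse(parse):
    """Assumed validity rule: the parse tree is rooted in a simple
    declarative clause (S). A real check may accept more clause types."""
    return parse.startswith("(TOP (S ")


def percent_valid(parses):
    """Percentage (0.0 to 100.0) of parses that satisfy the validity rule."""
    parses = list(parses)
    if not parses:
        # Avoid dividing by zero when nothing was generated.
        return 0.0
    valid = sum(1 for p in parses if is_sentence_parse(p))
    return 100.0 * valid / len(parses)


# Hand-written example parses using Penn Treebank part-of-speech tags.
EXAMPLE_PARSES = [
    "(TOP (S (NP (PRP I)) (VP (VBP run))))",
    "(TOP (NP (DT the) (NN war)))",
    "(TOP (S (NP (NNS wars)) (VP (VBP end))))",
    "(TOP (FRAG (IN on) (NN fire)))",
]
```

With these four hand-written parses, two are rooted in <code>S</code>, so <code>percent_valid(EXAMPLE_PARSES)</code> evaluates to 50.0.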
@ -2516,8 +2516,8 @@ Where <code>nlp/valid-sentence?</code> is defined as follows.
My language of choice for this project encourages a programming technique or paradigm known as REPL-driven development. REPL stands for Read-Eval-Print Loop. This is a way to write and test code in real time without a compilation step. Individual code chunks can be evaluated inside an editor, resulting in rapid feedback.
@ -2561,12 +2561,12 @@ Here is an example of the test suite for the code related to syllabification: <a