<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<!-- 2021-07-13 Tue 20:39 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>RhymeStorm - WGU CSCI Capstone Project</title>
<meta name="author" content="Eric Ihli" />
<meta name="generator" content="Org Mode" />
<style type="text/css">
<!--/*--><![CDATA[/*><!--*/
.title { text-align: center;
margin-bottom: .2em; }
.subtitle { text-align: center;
font-size: medium;
font-weight: bold;
margin-top:0; }
.todo { font-family: monospace; color: red; }
.done { font-family: monospace; color: green; }
.priority { font-family: monospace; color: orange; }
.tag { background-color: #eee; font-family: monospace;
padding: 2px; font-size: 80%; font-weight: normal; }
.timestamp { color: #bebebe; }
.timestamp-kwd { color: #5f9ea0; }
.org-right { margin-left: auto; margin-right: 0px; text-align: right; }
.org-left { margin-left: 0px; margin-right: auto; text-align: left; }
.org-center { margin-left: auto; margin-right: auto; text-align: center; }
.underline { text-decoration: underline; }
#postamble p, #preamble p { font-size: 90%; margin: .2em; }
p.verse { margin-left: 3%; }
pre {
border: 1px solid #ccc;
box-shadow: 3px 3px 3px #eee;
padding: 8pt;
font-family: monospace;
overflow: auto;
margin: 1.2em;
}
pre.src {
position: relative;
overflow: auto;
padding-top: 1.2em;
}
pre.src:before {
display: none;
position: absolute;
background-color: white;
top: -10px;
right: 10px;
padding: 3px;
border: 1px solid black;
}
pre.src:hover:before { display: inline; margin-top: 14px;}
/* Languages per Org manual */
pre.src-asymptote:before { content: 'Asymptote'; }
pre.src-awk:before { content: 'Awk'; }
pre.src-C:before { content: 'C'; }
/* pre.src-C++ doesn't work in CSS */
pre.src-clojure:before { content: 'Clojure'; }
pre.src-css:before { content: 'CSS'; }
pre.src-D:before { content: 'D'; }
pre.src-ditaa:before { content: 'ditaa'; }
pre.src-dot:before { content: 'Graphviz'; }
pre.src-calc:before { content: 'Emacs Calc'; }
pre.src-emacs-lisp:before { content: 'Emacs Lisp'; }
pre.src-fortran:before { content: 'Fortran'; }
pre.src-gnuplot:before { content: 'gnuplot'; }
pre.src-haskell:before { content: 'Haskell'; }
pre.src-hledger:before { content: 'hledger'; }
pre.src-java:before { content: 'Java'; }
pre.src-js:before { content: 'Javascript'; }
pre.src-latex:before { content: 'LaTeX'; }
pre.src-ledger:before { content: 'Ledger'; }
pre.src-lisp:before { content: 'Lisp'; }
pre.src-lilypond:before { content: 'Lilypond'; }
pre.src-lua:before { content: 'Lua'; }
pre.src-matlab:before { content: 'MATLAB'; }
pre.src-mscgen:before { content: 'Mscgen'; }
pre.src-ocaml:before { content: 'Objective Caml'; }
pre.src-octave:before { content: 'Octave'; }
pre.src-org:before { content: 'Org mode'; }
pre.src-oz:before { content: 'OZ'; }
pre.src-plantuml:before { content: 'Plantuml'; }
pre.src-processing:before { content: 'Processing.js'; }
pre.src-python:before { content: 'Python'; }
pre.src-R:before { content: 'R'; }
pre.src-ruby:before { content: 'Ruby'; }
pre.src-sass:before { content: 'Sass'; }
pre.src-scheme:before { content: 'Scheme'; }
pre.src-screen:before { content: 'Gnu Screen'; }
pre.src-sed:before { content: 'Sed'; }
pre.src-sh:before { content: 'shell'; }
pre.src-sql:before { content: 'SQL'; }
pre.src-sqlite:before { content: 'SQLite'; }
/* additional languages in org.el's org-babel-load-languages alist */
pre.src-forth:before { content: 'Forth'; }
pre.src-io:before { content: 'IO'; }
pre.src-J:before { content: 'J'; }
pre.src-makefile:before { content: 'Makefile'; }
pre.src-maxima:before { content: 'Maxima'; }
pre.src-perl:before { content: 'Perl'; }
pre.src-picolisp:before { content: 'Pico Lisp'; }
pre.src-scala:before { content: 'Scala'; }
pre.src-shell:before { content: 'Shell Script'; }
pre.src-ebnf2ps:before { content: 'ebfn2ps'; }
/* additional language identifiers per "defun org-babel-execute"
in ob-*.el */
pre.src-cpp:before { content: 'C++'; }
pre.src-abc:before { content: 'ABC'; }
pre.src-coq:before { content: 'Coq'; }
pre.src-groovy:before { content: 'Groovy'; }
/* additional language identifiers from org-babel-shell-names in
ob-shell.el: ob-shell is the only babel language using a lambda to put
the execution function name together. */
pre.src-bash:before { content: 'bash'; }
pre.src-csh:before { content: 'csh'; }
pre.src-ash:before { content: 'ash'; }
pre.src-dash:before { content: 'dash'; }
pre.src-ksh:before { content: 'ksh'; }
pre.src-mksh:before { content: 'mksh'; }
pre.src-posh:before { content: 'posh'; }
/* Additional Emacs modes also supported by the LaTeX listings package */
pre.src-ada:before { content: 'Ada'; }
pre.src-asm:before { content: 'Assembler'; }
pre.src-caml:before { content: 'Caml'; }
pre.src-delphi:before { content: 'Delphi'; }
pre.src-html:before { content: 'HTML'; }
pre.src-idl:before { content: 'IDL'; }
pre.src-mercury:before { content: 'Mercury'; }
pre.src-metapost:before { content: 'MetaPost'; }
pre.src-modula-2:before { content: 'Modula-2'; }
pre.src-pascal:before { content: 'Pascal'; }
pre.src-ps:before { content: 'PostScript'; }
pre.src-prolog:before { content: 'Prolog'; }
pre.src-simula:before { content: 'Simula'; }
pre.src-tcl:before { content: 'tcl'; }
pre.src-tex:before { content: 'TeX'; }
pre.src-plain-tex:before { content: 'Plain TeX'; }
pre.src-verilog:before { content: 'Verilog'; }
pre.src-vhdl:before { content: 'VHDL'; }
pre.src-xml:before { content: 'XML'; }
pre.src-nxml:before { content: 'XML'; }
/* add a generic configuration mode; LaTeX export needs an additional
(add-to-list 'org-latex-listings-langs '(conf " ")) in .emacs */
pre.src-conf:before { content: 'Configuration File'; }
table { border-collapse:collapse; }
caption.t-above { caption-side: top; }
caption.t-bottom { caption-side: bottom; }
td, th { vertical-align:top; }
th.org-right { text-align: center; }
th.org-left { text-align: center; }
th.org-center { text-align: center; }
td.org-right { text-align: right; }
td.org-left { text-align: left; }
td.org-center { text-align: center; }
dt { font-weight: bold; }
.footpara { display: inline; }
.footdef { margin-bottom: 1em; }
.figure { padding: 1em; }
.figure p { text-align: center; }
.equation-container {
display: table;
text-align: center;
width: 100%;
}
.equation {
vertical-align: middle;
}
.equation-label {
display: table-cell;
text-align: right;
vertical-align: middle;
}
.inlinetask {
padding: 10px;
border: 2px solid gray;
margin: 10px;
background: #ffffcc;
}
#org-div-home-and-up
{ text-align: right; font-size: 70%; white-space: nowrap; }
textarea { overflow-x: auto; }
.linenr { font-size: smaller }
.code-highlighted { background-color: #ffff00; }
.org-info-js_info-navigation { border-style: none; }
#org-info-js_console-label
{ font-size: 10px; font-weight: bold; white-space: nowrap; }
.org-info-js_search-highlight
{ background-color: #ffff00; color: #000000; font-weight: bold; }
.org-svg { width: 90%; }
/*]]>*/-->
</style>
<script type="text/javascript">
// @license magnet:?xt=urn:btih:e95b018ef3580986a04669f1b5879592219e2a7a&dn=public-domain.txt Public Domain
<!--/*--><![CDATA[/*><!--*/
function CodeHighlightOn(elem, id)
{
var target = document.getElementById(id);
if(null != target) {
elem.classList.add("code-highlighted");
target.classList.add("code-highlighted");
}
}
function CodeHighlightOff(elem, id)
{
var target = document.getElementById(id);
if(null != target) {
elem.classList.remove("code-highlighted");
target.classList.remove("code-highlighted");
}
}
/*]]>*///-->
// @license-end
</script>
</head>
<body>
<div id="content">
<h1 class="title">RhymeStorm - WGU CSCI Capstone Project</h1>
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
<li><a href="#org14a3a15">1. RhymeStorm Capstone Requirements Documentation</a>
<ul>
<li><a href="#org716e7d7">1.1. Descriptive And Predictive Methods</a>
<ul>
<li><a href="#org12c93c0">1.1.1. Descriptive Method</a></li>
<li><a href="#org6c4e064">1.1.2. Prescriptive Method</a></li>
</ul>
</li>
<li><a href="#org1ccfefb">1.2. Datasets</a></li>
<li><a href="#org8b56e0a">1.3. Decision Support Functionality</a>
<ul>
<li><a href="#org19a500c">1.3.1. Choosing Words For A Lyric Based On Markov Likelihood</a></li>
<li><a href="#orge4b7a97">1.3.2. Choosing Words To Complete A Lyric Based On Rhyme Quality</a></li>
</ul>
</li>
<li><a href="#orgf13759f">1.4. Featurizing, Parsing, Cleaning, And Wrangling Data</a></li>
<li><a href="#org5c868f3">1.5. Data Exploration And Preparation</a></li>
<li><a href="#org1a53aca">1.6. <span class="todo TODO">TODO</span> Data Visualization Functionalities For Data Exploration And Inspection</a></li>
<li><a href="#org9160b2a">1.7. <span class="todo TODO">TODO</span> Implementation Of Interactive Queries</a></li>
<li><a href="#org5bb9e83">1.8. <span class="todo TODO">TODO</span> implementation of machine-learning methods and algorithms</a></li>
<li><a href="#org7791154">1.9. Security Features</a></li>
<li><a href="#org2118b36">1.10. <span class="todo TODO">TODO</span> Tools To Monitor And Maintain The Product</a></li>
<li><a href="#org3e7ea9b">1.11. <span class="todo TODO">TODO</span> A User-Friendly, Functional Dashboard That Includes At Least Three Visualization Types</a></li>
</ul>
</li>
<li><a href="#orge2d60f8">2. Documentation</a>
<ul>
<li><a href="#orgcc70df2">2.1. Business Vision</a></li>
<li><a href="#orgc216269">2.2. Data Sets</a></li>
<li><a href="#org4fd130a">2.3. Data Analysis</a></li>
<li><a href="#org33bec77">2.4. Assessment</a></li>
<li><a href="#orgae6aaf1">2.5. Visualizations</a></li>
<li><a href="#orgaad09e3">2.6. Accuracy</a></li>
<li><a href="#org248156b">2.7. Testing</a></li>
<li><a href="#org4c2c5cb">2.8. Source</a></li>
<li><a href="#org55bc8bd">2.9. Quick Start</a></li>
</ul>
</li>
<li><a href="#org3027af3">3. Notes</a></li>
</ul>
</div>
</div>
<div id="outline-container-org14a3a15" class="outline-2">
<h2 id="org14a3a15"><span class="section-number-2">1</span> RhymeStorm Capstone Requirements Documentation</h2>
<div class="outline-text-2" id="text-1">
<p>
RhymeStorm is an application to help singers and songwriters brainstorm new lyrics.
</p>
</div>
<div id="outline-container-org716e7d7" class="outline-3">
<h3 id="org716e7d7"><span class="section-number-3">1.1</span> Descriptive And Predictive Methods</h3>
<div class="outline-text-3" id="text-1-1">
</div>
<div id="outline-container-org12c93c0" class="outline-4">
<h4 id="org12c93c0"><span class="section-number-4">1.1.1</span> Descriptive Method</h4>
<div class="outline-text-4" id="text-1-1-1">
</div>
<ol class="org-ol">
<li><a id="org4d1af34"></a>Most Common Grammatical Structures In A Set Of Lyrics<br />
<div class="outline-text-5" id="text-1-1-1-1">
<p>
Once songs have been filtered by metrics such as popularity or number of awards, this software package can determine the most common grammatical phrase structures within each filtered category.
</p>
<p>
Much of the data a record label might use to categorize songs is likely proprietary, so filtering the songs by the chosen metric is the responsibility of the user.
</p>
<p>
Once the songs are filtered or categorized, they can be passed to this software, which returns a list of the most common grammatical structures.
</p>
<p>
In the example below, a simple noun phrase is the most common structure with 6 occurrences, tied with a sentence composed of a noun phrase (a personal pronoun), a verb phrase, and an adjective phrase.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #51afef;">(</span>require '<span style="color: #c678dd;">[</span>com.owoga.corpus.markov <span style="color: #a9a1e1;">:as</span> markov<span style="color: #c678dd;">]</span>
'<span style="color: #c678dd;">[</span>com.owoga.prhyme.nlp.core <span style="color: #a9a1e1;">:as</span> nlp<span style="color: #c678dd;">]</span>
'<span style="color: #c678dd;">[</span>clojure.string <span style="color: #a9a1e1;">:as</span> string<span style="color: #c678dd;">]</span>
'<span style="color: #c678dd;">[</span>clojure.java.io <span style="color: #a9a1e1;">:as</span> io<span style="color: #c678dd;">]</span><span style="color: #51afef;">)</span>
<span style="color: #51afef;">(</span><span style="color: #51afef;">let</span> <span style="color: #c678dd;">[</span>lines <span style="color: #98be65;">(</span>transduce
<span style="color: #a9a1e1;">(</span>comp
<span style="color: #51afef;">(</span>map slurp<span style="color: #51afef;">)</span>
<span style="color: #51afef;">(</span>map #<span style="color: #c678dd;">(</span><span style="color: #ECBE7B;">string</span>/split <span style="color: #dcaeea;">%</span> #<span style="color: #98be65;">"</span><span style="color: #98be65; font-weight: bold;">\n</span><span style="color: #98be65;">"</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
<span style="color: #51afef;">(</span>map <span style="color: #c678dd;">(</span>partial remove empty?<span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
<span style="color: #51afef;">(</span>map <span style="color: #ECBE7B;">nlp</span>/structure-freqs<span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span>
merge
<span style="color: #a9a1e1;">{}</span>
<span style="color: #a9a1e1;">(</span>eduction <span style="color: #51afef;">(</span><span style="color: #ECBE7B;">markov</span>/xf-file-seq <span style="color: #da8548; font-weight: bold;">0</span> <span style="color: #da8548; font-weight: bold;">10</span><span style="color: #51afef;">)</span> <span style="color: #51afef;">(</span>file-seq <span style="color: #c678dd;">(</span><span style="color: #ECBE7B;">io</span>/file <span style="color: #98be65;">"/home/eihli/src/prhyme/dark-corpus"</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span><span style="color: #c678dd;">]</span>
<span style="color: #c678dd;">(</span>take <span style="color: #da8548; font-weight: bold;">5</span> <span style="color: #98be65;">(</span>sort-by <span style="color: #a9a1e1;">(</span>comp - second<span style="color: #a9a1e1;">)</span> lines<span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
</pre>
</div>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-right" />
</colgroup>
<tbody>
<tr>
<td class="org-left">(TOP (NP (NNP) (.)))</td>
<td class="org-right">6</td>
</tr>
<tr>
<td class="org-left">(TOP (S (NP (PRP)) (VP (VBP) (ADJP (JJ))) (.)))</td>
<td class="org-right">6</td>
</tr>
<tr>
<td class="org-left">(INC (NP (JJ) (NN)) nil (IN) (NP (DT)) (NP (PRP)) (VBP))</td>
<td class="org-right">4</td>
</tr>
<tr>
<td class="org-left">(TOP (NP (NP (JJ) (NN)) nil (NP (NN) (CC) (NN))))</td>
<td class="org-right">4</td>
</tr>
<tr>
<td class="org-left">(TOP (S (NP (JJ) (NN)) nil (VP (VBG) (ADJP (JJ)))))</td>
<td class="org-right">4</td>
</tr>
</tbody>
</table>
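<p>
The counting step can be sketched without the parser. The following is a minimal sketch, assuming each song has already been reduced to a vector of structure strings; the <code>songs</code> data and the <code>top-structures</code> function are hypothetical and are not part of the package. Summing with <code>merge-with +</code> is an assumption about how per-song counts should be combined.
</p>
<div class="org-src-container">
<pre class="src src-clojure">;; Hypothetical input: one vector of parsed structure strings per song.
(def songs
  [["(TOP (NP (NNP) (.)))"
    "(TOP (S (NP (PRP)) (VP (VBP) (ADJP (JJ))) (.)))"
    "(TOP (NP (NNP) (.)))"]
   ["(TOP (NP (NNP) (.)))"
    "(TOP (S (NP (PRP)) (VP (VBP) (ADJP (JJ))) (.)))"]])

(defn top-structures
  "Returns the n most frequent structures across all songs as
  [structure count] pairs, most frequent first."
  [n songs]
  (-&gt;&gt; songs
       (map frequencies)         ; one count map per song
       (apply merge-with + {})   ; sum the counts of shared structures
       (sort-by (comp - second))
       (take n)))

(top-structures 2 songs)
;; (["(TOP (NP (NNP) (.)))" 3]
;;  ["(TOP (S (NP (PRP)) (VP (VBP) (ADJP (JJ))) (.)))" 2])
</pre>
</div>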
</div>
</li>
</ol>
</div>
<div id="outline-container-org6c4e064" class="outline-4">
<h4 id="org6c4e064"><span class="section-number-4">1.1.2</span> Prescriptive Method</h4>
<div class="outline-text-4" id="text-1-1-2">
</div>
<ol class="org-ol">
<li><a id="org385543b"></a>Most Likely Word To Follow A Given Phrase<br />
<div class="outline-text-5" id="text-1-1-2-1">
<p>
To help songwriters think of new lyrics, we provide an API that returns a list of words that commonly follow or precede a given phrase.
</p>
<p>
Models can be trained on different genres or categories of songs so that recommended lyric completions suit the target style.
</p>
<p>
In the example below, we provide a seed suffix of &ldquo;bother me&rdquo; and ask the software to predict the words most likely to precede that phrase. The most common resulting phrases are &ldquo;don&rsquo;t bother me&rdquo;, &ldquo;doesn&rsquo;t bother me&rdquo;, &ldquo;to bother me&rdquo;, &ldquo;won&rsquo;t bother me&rdquo;, and so on.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #51afef;">(</span>require '<span style="color: #c678dd;">[</span>com.darklimericks.server.models <span style="color: #a9a1e1;">:as</span> models<span style="color: #c678dd;">]</span>
'<span style="color: #c678dd;">[</span>com.owoga.trie <span style="color: #a9a1e1;">:as</span> trie<span style="color: #c678dd;">]</span><span style="color: #51afef;">)</span>
<span style="color: #51afef;">(</span><span style="color: #51afef;">let</span> <span style="color: #c678dd;">[</span>seed <span style="color: #98be65;">[</span><span style="color: #98be65;">"bother"</span> <span style="color: #98be65;">"me"</span><span style="color: #98be65;">]</span>
seed-ids <span style="color: #98be65;">(</span>map <span style="color: #ECBE7B;">models</span>/database seed<span style="color: #98be65;">)</span>
lookup <span style="color: #98be65;">(</span>reverse seed-ids<span style="color: #98be65;">)</span>
results <span style="color: #98be65;">(</span><span style="color: #ECBE7B;">trie</span>/children <span style="color: #a9a1e1;">(</span><span style="color: #ECBE7B;">trie</span>/lookup <span style="color: #ECBE7B;">models</span>/markov-trie lookup<span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span><span style="color: #c678dd;">]</span>
<span style="color: #c678dd;">(</span><span style="color: #51afef;">-&gt;&gt;</span> results
<span style="color: #98be65;">(</span>map #<span style="color: #a9a1e1;">(</span>get <span style="color: #dcaeea;">%</span> <span style="color: #51afef;">[]</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span>sort-by <span style="color: #a9a1e1;">(</span>comp - second<span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span>map #<span style="color: #a9a1e1;">(</span>update <span style="color: #dcaeea;">%</span> <span style="color: #da8548; font-weight: bold;">0</span> <span style="color: #ECBE7B;">models</span>/database<span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span>take <span style="color: #da8548; font-weight: bold;">10</span><span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
</pre>
</div>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-right" />
</colgroup>
<tbody>
<tr>
<td class="org-left">don&rsquo;t</td>
<td class="org-right">36</td>
</tr>
<tr>
<td class="org-left">doesn&rsquo;t</td>
<td class="org-right">21</td>
</tr>
<tr>
<td class="org-left">to</td>
<td class="org-right">14</td>
</tr>
<tr>
<td class="org-left">won&rsquo;t</td>
<td class="org-right">9</td>
</tr>
<tr>
<td class="org-left">really</td>
<td class="org-right">5</td>
</tr>
<tr>
<td class="org-left">not</td>
<td class="org-right">4</td>
</tr>
<tr>
<td class="org-left">you</td>
<td class="org-right">4</td>
</tr>
<tr>
<td class="org-left">it</td>
<td class="org-right">3</td>
</tr>
<tr>
<td class="org-left">even</td>
<td class="org-right">3</td>
</tr>
<tr>
<td class="org-left">shouldn&rsquo;t</td>
<td class="org-right">3</td>
</tr>
</tbody>
</table>
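<p>
The idea of looking up a reversed suffix can be illustrated with plain nested maps. This is a minimal sketch, not the package&rsquo;s trie implementation: the <code>lines</code> data and the functions <code>add-ngram</code>, <code>train</code>, and <code>predecessors</code> are hypothetical names introduced for illustration.
</p>
<div class="org-src-container">
<pre class="src src-clojure">;; Hypothetical tokenized lyric lines; a real model is trained on a corpus.
(def lines
  [["don't" "bother" "me"]
   ["don't" "bother" "me"]
   ["won't" "bother" "me"]
   ["it" "doesn't" "bother" "me"]])

(defn add-ngram
  "Stores an n-gram in reverse order so that a suffix becomes a lookup path.
  Each stored n-gram keeps its count under :count."
  [trie ngram]
  (update-in trie (conj (vec (reverse ngram)) :count) (fnil inc 0)))

(defn train
  "Builds the reversed nested map from every window of n consecutive tokens."
  [n lines]
  (reduce add-ngram {} (mapcat #(partition n 1 %) lines)))

(defn predecessors
  "Returns [word count] pairs for words seen immediately before `suffix`,
  most frequent first."
  [trie suffix]
  (-&gt;&gt; (dissoc (get-in trie (reverse suffix)) :count)
       (map (fn [[word node]] [word (:count node)]))
       (sort-by (comp - second))))

(first (predecessors (train 3 lines) ["bother" "me"]))
;; ["don't" 2]
</pre>
</div>
<p>
Storing n-grams in reverse is what lets a suffix act as a prefix path, which is why the example above reverses the seed before the lookup.
</p>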
</div>
</li>
</ol>
</div>
</div>
<div id="outline-container-org1ccfefb" class="outline-3">
<h3 id="org1ccfefb"><span class="section-number-3">1.2</span> Datasets</h3>
<div class="outline-text-3" id="text-1-2">
<p>
The dataset currently in use is in <code>/dark-corpus</code>. This dataset was generated from the publicly available lyrics at <a href="http://darklyrics.com">http://darklyrics.com</a>.
</p>
<p>
Further datasets will need to be provided by the end-user.
</p>
</div>
</div>
<div id="outline-container-org8b56e0a" class="outline-3">
<h3 id="org8b56e0a"><span class="section-number-3">1.3</span> Decision Support Functionality</h3>
<div class="outline-text-3" id="text-1-3">
</div>
<div id="outline-container-org19a500c" class="outline-4">
<h4 id="org19a500c"><span class="section-number-4">1.3.1</span> Choosing Words For A Lyric Based On Markov Likelihood</h4>
<div class="outline-text-4" id="text-1-3-1">
<p>
Entire phrases can be generated using the previously mentioned functionality of generating lists of likely prefix/suffix words.
</p>
<p>
The software can be seeded with a simple &ldquo;end-of-sentence&rdquo; or &ldquo;beginning-of-sentence&rdquo; token and asked to work backwards or forwards, respectively, to build a phrase that meets certain criteria.
</p>
<p>
The user can supply criteria such as limits on the number of syllables or words, or a required rhyme scheme.
</p>
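<p>
Backward generation can be sketched as a loop that repeatedly prepends the most likely predecessor. This is a simplified, greedy sketch under stated assumptions: the <code>predecessor-counts</code> data is made up, the only criterion is a maximum word count, and the function names are hypothetical. Syllable and rhyme constraints are not modeled here.
</p>
<div class="org-src-container">
<pre class="src src-clojure">;; Hypothetical counts: for each word, how often each other word was seen
;; immediately before it. :end-of-sentence is the seed token.
(def predecessor-counts
  {:end-of-sentence {"me" 5 "you" 2}
   "me"             {"bother" 4 "with" 1}
   "bother"         {"don't" 3 "won't" 1}
   "don't"          {"they" 2 "you" 1}})

(defn most-likely-predecessor
  "Returns the most frequent word seen before `word`, or nil if none is known."
  [word]
  (when-let [candidates (seq (predecessor-counts word))]
    (key (apply max-key val candidates))))

(defn build-phrase
  "Works backwards from the end-of-sentence token, prepending the most
  likely predecessor until `max-words` is reached or no predecessor exists."
  [max-words]
  (loop [phrase '()
         word :end-of-sentence]
    (let [prev (most-likely-predecessor word)]
      (if (or (nil? prev) (= (count phrase) max-words))
        (vec phrase)
        (recur (conj phrase prev) prev)))))

(build-phrase 3)
;; ["don't" "bother" "me"]
</pre>
</div>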
</div>
</div>
<div id="outline-container-orge4b7a97" class="outline-4">
<h4 id="orge4b7a97"><span class="section-number-4">1.3.2</span> Choosing Words To Complete A Lyric Based On Rhyme Quality</h4>
<div class="outline-text-4" id="text-1-3-2">
<p>
Another part of the decision support functionality is filtering and ordering predicted words based on their rhyme quality.
</p>
<p>
By the conventional definition, two words form a &ldquo;perfect&rdquo; rhyme when their phonemes match from the primary stress onward.
</p>
<p>
For example, consider &ldquo;technology&rdquo; and &ldquo;ecology&rdquo;. Both words are stressed on the second syllable. Their first syllables differ, but from the stressed syllable onward their phones match exactly.
</p>
<p>
A rhyme that might be useful to a songwriter but that doesn&rsquo;t fit the definition of a &ldquo;perfect&rdquo; rhyme would be &ldquo;technology&rdquo; and &ldquo;economy&rdquo;. Those two words just barely break the rules for a perfect rhyme. Their vowel phones match from the primary stress to the end, but the consonant phones in between do not.
</p>
<p>
Singers and songwriters have some flexibility and artistic freedom, and imperfect rhymes can be a fallback.
</p>
<p>
Therefore, this software provides functionality to sort rhymes so that rhymes that are closer to perfect are first in the ordering.
</p>
<p>
In the example below, the first 20 or so rhymes are perfect, but then &ldquo;hypocrisy&rdquo; is listed as rhyming with &ldquo;technology&rdquo;. This is for the reason just mentioned: it is close to a perfect rhyme, so it is of interest to singers and songwriters.
</p>
<div class="org-src-container">
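<pre class="src src-clojure"><span style="color: #51afef;">(</span>require '<span style="color: #c678dd;">[</span>com.darklimericks.linguistics.core <span style="color: #a9a1e1;">:as</span> linguistics<span style="color: #c678dd;">]</span>
'<span style="color: #c678dd;">[</span>com.darklimericks.server.models <span style="color: #a9a1e1;">:as</span> models<span style="color: #c678dd;">]</span><span style="color: #51afef;">)</span>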
<span style="color: #51afef;">(</span><span style="color: #51afef;">let</span> <span style="color: #c678dd;">[</span>results
<span style="color: #98be65;">(</span><span style="color: #ECBE7B;">linguistics</span>/rhymes-with-frequencies-and-rhyme-quality
<span style="color: #98be65;">"technology"</span>
<span style="color: #ECBE7B;">models</span>/markov-trie
<span style="color: #ECBE7B;">models</span>/database<span style="color: #98be65;">)</span><span style="color: #c678dd;">]</span>
<span style="color: #c678dd;">(</span><span style="color: #51afef;">-&gt;&gt;</span> results
<span style="color: #98be65;">(</span>map
<span style="color: #a9a1e1;">(</span><span style="color: #51afef;">fn</span> <span style="color: #51afef;">[</span><span style="color: #c678dd;">[</span>rhyming-word
rhyming-word-phones
frequency-count-of-rhyming-word
target-word
target-word-phones
rhyme-quality<span style="color: #c678dd;">]</span><span style="color: #51afef;">]</span>
<span style="color: #51afef;">[</span>rhyming-word frequency-count-of-rhyming-word rhyme-quality<span style="color: #51afef;">]</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span>take <span style="color: #da8548; font-weight: bold;">25</span><span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span>vec<span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span>into <span style="color: #a9a1e1;">[</span><span style="color: #51afef;">[</span><span style="color: #98be65;">"rhyme"</span> <span style="color: #98be65;">"frequency count"</span> <span style="color: #98be65;">"rhyme quality"</span><span style="color: #51afef;">]</span><span style="color: #a9a1e1;">]</span><span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
</pre>
</div>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
</colgroup>
<tbody>
<tr>
<td class="org-left">class java.lang.IllegalArgumentException</td>
</tr>
</tbody>
</table>
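<p>
The ordering idea can be sketched independently of the trained model. This is a minimal sketch, assuming hand-written phones in the style of the CMU Pronouncing Dictionary, where a trailing <code>1</code> marks primary stress. The <code>phones</code> map and the functions below are hypothetical, and the scoring used by <code>rhymes-with-frequencies-and-rhyme-quality</code> may differ.
</p>
<div class="org-src-container">
<pre class="src src-clojure">(require '[clojure.string :as string])

;; Hand-written, CMU-dictionary-style phones (an assumption for illustration).
;; A trailing 1 marks the vowel that carries primary stress.
(def phones
  {"technology" ["T" "EH0" "K" "N" "AA1" "L" "AH0" "JH" "IY0"]
   "ecology"    ["IH0" "K" "AA1" "L" "AH0" "JH" "IY0"]
   "economy"    ["IH0" "K" "AA1" "N" "AH0" "M" "IY0"]
   "tree"       ["T" "R" "IY1"]})

(defn rhyming-part
  "Phones from the primary-stressed vowel to the end of the word."
  [word]
  (drop-while #(not (string/ends-with? % "1")) (phones word)))

(defn perfect-rhyme?
  "True when both words have identical phones from the primary stress on."
  [a b]
  (= (rhyming-part a) (rhyming-part b)))

(defn rhyme-quality
  "Number of phones that match when both rhyming parts are aligned at their ends."
  [a b]
  (-&gt;&gt; (map = (reverse (rhyming-part a)) (reverse (rhyming-part b)))
       (filter true?)
       count))

(defn rank-rhymes
  "Candidates ordered from the closest rhyme with `target` to the weakest."
  [target candidates]
  (sort-by #(- (rhyme-quality target %)) candidates))

(rank-rhymes "technology" ["tree" "economy" "ecology"])
;; ("ecology" "economy" "tree")
</pre>
</div>
<p>
With these phones, &ldquo;ecology&rdquo; scores 5 (a perfect rhyme), &ldquo;economy&rdquo; scores 3 because only its vowels match, and &ldquo;tree&rdquo; scores 0.
</p>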
</div>
</div>
</div>
<div id="outline-container-orgf13759f" class="outline-3">
<h3 id="orgf13759f"><span class="section-number-3">1.4</span> Featurizing, Parsing, Cleaning, And Wrangling Data</h3>
<div class="outline-text-3" id="text-1-4">
<p>
The data processing code is in <code>prhyme</code>.
</p>
<p>
Each line is split into tokens using a regular expression.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #51afef;">(</span><span style="color: #51afef;">def</span> <span style="color: #dcaeea;">re-word</span>
<span style="color: #83898d;">"Regex for tokenizing a string into words</span>
<span style="color: #83898d;"> (including contractions and hyphenations),</span>
<span style="color: #83898d;"> commas, periods, question marks, and newlines."</span>
#<span style="color: #98be65;">"</span><span style="color: #51afef; font-weight: bold;">(</span><span style="color: #98be65;">?s</span><span style="color: #51afef; font-weight: bold;">)</span><span style="color: #98be65;">.*?</span><span style="color: #51afef; font-weight: bold;">(</span><span style="color: #98be65;">[a-zA-Z</span><span style="color: #98be65; font-weight: bold;">\d</span><span style="color: #98be65;">]+</span><span style="color: #51afef; font-weight: bold;">(?:</span><span style="color: #98be65;">['</span><span style="color: #98be65; font-weight: bold;">\-</span><span style="color: #98be65;">]?[a-zA-Z]+</span><span style="color: #51afef; font-weight: bold;">)</span><span style="color: #98be65;">?</span><span style="color: #51afef; font-weight: bold;">|</span><span style="color: #98be65;">,</span><span style="color: #51afef; font-weight: bold;">|</span><span style="color: #98be65; font-weight: bold;">\.</span><span style="color: #51afef; font-weight: bold;">|</span><span style="color: #98be65; font-weight: bold;">\?</span><span style="color: #51afef; font-weight: bold;">|</span><span style="color: #98be65; font-weight: bold;">\n</span><span style="color: #51afef; font-weight: bold;">)</span><span style="color: #98be65;">"</span><span style="color: #51afef;">)</span>
</pre>
</div>
<p>
Along with tokenization, each line is trimmed of surrounding whitespace and converted to lowercase so that words compare equal regardless of case: &ldquo;Foo&rdquo; is treated the same as &ldquo;foo&rdquo;.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #51afef;">(</span><span style="color: #51afef;">def</span> <span style="color: #dcaeea;">xf-tokenize</span>
<span style="color: #c678dd;">(</span>comp
<span style="color: #98be65;">(</span>map <span style="color: #ECBE7B;">string</span>/trim<span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span>map <span style="color: #a9a1e1;">(</span>partial re-seq re-word<span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span>map <span style="color: #a9a1e1;">(</span>partial map second<span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span>map <span style="color: #a9a1e1;">(</span>partial mapv <span style="color: #ECBE7B;">string</span>/lower-case<span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
</pre>
</div>
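<p>
As a usage sketch, the transducer can be applied to a collection of lines with <code>into</code>. The regex and transducer are repeated from above so the snippet runs on its own; the sample line is made up for illustration.
</p>
<div class="org-src-container">
<pre class="src src-clojure">(require '[clojure.string :as string])

;; Same regex and transducer as above, repeated so the sketch is self-contained.
(def re-word
  #"(?s).*?([a-zA-Z\d]+(?:['\-]?[a-zA-Z]+)?|,|\.|\?|\n)")

(def xf-tokenize
  (comp
   (map string/trim)
   (map (partial re-seq re-word))
   (map (partial map second))           ; keep only the captured token
   (map (partial mapv string/lower-case))))

;; Each input line becomes one vector of lowercase tokens.
(into [] xf-tokenize ["  Don't bother me, please.  "])
;; [["don't" "bother" "me" "," "please" "."]]
</pre>
</div>
<p>
The contraction survives as a single token, while the comma and period become tokens of their own.
</p>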
</div>
</div>
<div id="outline-container-org5c868f3" class="outline-3">
<h3 id="org5c868f3"><span class="section-number-3">1.5</span> Data Exploration And Preparation</h3>
<div class="outline-text-3" id="text-1-5">
<p>
The primary data structure supporting exploration of the data is a Markov trie.
</p>
<p>
The Trie data structure supports a <code>lookup</code> function that returns the child trie at a certain lookup key and a <code>children</code> function that returns all of the immediate children of a particular Trie.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #51afef;">(</span><span style="color: #51afef;">defprotocol</span> <span style="color: #ECBE7B;">ITrie</span>
<span style="color: #c678dd;">(</span>children <span style="color: #98be65;">[</span>self<span style="color: #98be65;">]</span> <span style="color: #98be65;">"Immediate children of a node."</span><span style="color: #c678dd;">)</span>
<span style="color: #c678dd;">(</span>lookup <span style="color: #98be65;">[</span>self <span style="color: #bbc2cf; background-color: #21242b;">^</span><span style="color: #ECBE7B;">clojure.lang.PersistentList</span> ks<span style="color: #98be65;">]</span> <span style="color: #98be65;">"Return node at key."</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
<span style="color: #51afef;">(</span><span style="color: #51afef;">deftype</span> <span style="color: #ECBE7B;">Trie</span> <span style="color: #c678dd;">[</span>key value <span style="color: #bbc2cf; background-color: #21242b;">^</span><span style="color: #ECBE7B;">clojure.lang.PersistentTreeMap</span> children-<span style="color: #c678dd;">]</span>
ITrie
<span style="color: #c678dd;">(</span>children <span style="color: #98be65;">[</span>trie<span style="color: #98be65;">]</span>
<span style="color: #98be65;">(</span>map
<span style="color: #a9a1e1;">(</span><span style="color: #51afef;">fn</span> <span style="color: #51afef;">[</span><span style="color: #c678dd;">[</span>k <span style="color: #bbc2cf; background-color: #21242b;">^</span><span style="color: #ECBE7B;">Trie</span> child<span style="color: #c678dd;">]</span><span style="color: #51afef;">]</span>
<span style="color: #51afef;">(</span>Trie. k
<span style="color: #c678dd;">(</span>.value child<span style="color: #c678dd;">)</span>
<span style="color: #c678dd;">(</span>.children- child<span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span>
children-<span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span>
<span style="color: #c678dd;">(</span>lookup <span style="color: #98be65;">[</span>trie k<span style="color: #98be65;">]</span>
<span style="color: #98be65;">(</span><span style="color: #51afef;">loop</span> <span style="color: #a9a1e1;">[</span>k k
trie trie<span style="color: #a9a1e1;">]</span>
<span style="color: #a9a1e1;">(</span><span style="color: #51afef;">cond</span>
<span style="color: #5B6268;">;; </span><span style="color: #5B6268;">Allows `</span><span style="color: #a9a1e1;">update</span><span style="color: #5B6268;">` to work the same as with maps... can use `</span><span style="color: #a9a1e1;">fnil</span><span style="color: #5B6268;">`.</span>
<span style="color: #5B6268;">;; </span><span style="color: #5B6268;">(nil? trie') (throw (Exception. (format "Key not found: %s" k)))</span>
<span style="color: #51afef;">(</span>nil? trie<span style="color: #51afef;">)</span> <span style="color: #a9a1e1;">nil</span>
<span style="color: #51afef;">(</span>empty? k<span style="color: #51afef;">)</span>
<span style="color: #51afef;">(</span>Trie. <span style="color: #c678dd;">(</span>.key trie<span style="color: #c678dd;">)</span>
<span style="color: #c678dd;">(</span>.value trie<span style="color: #c678dd;">)</span>
<span style="color: #c678dd;">(</span>.children- trie<span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
<span style="color: #a9a1e1;">:else</span> <span style="color: #51afef;">(</span><span style="color: #51afef;">recur</span>
<span style="color: #c678dd;">(</span>rest k<span style="color: #c678dd;">)</span>
<span style="color: #c678dd;">(</span>get <span style="color: #98be65;">(</span>.children- trie<span style="color: #98be65;">)</span> <span style="color: #98be65;">(</span>first k<span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span>
</pre>
</div>
</div>
</div>
<div id="outline-container-org1a53aca" class="outline-3">
<h3 id="org1a53aca"><span class="section-number-3">1.6</span> <span class="todo TODO">TODO</span> Data Visualization Functionalities For Data Exploration And Inspection</h3>
<div class="outline-text-3" id="text-1-6">
<ul class="org-ul">
<li>A graph with phrase complexity on one axis and rhyme quality on the other.</li>
</ul>
</div>
</div>
<div id="outline-container-org9160b2a" class="outline-3">
<h3 id="org9160b2a"><span class="section-number-3">1.7</span> <span class="todo TODO">TODO</span> Implementation Of Interactive Queries</h3>
<div class="outline-text-3" id="text-1-7">
<p>
Interactive queries are available at <a href="https://darklimericks.com/wgu">https://darklimericks.com/wgu</a>.
</p>
</div>
</div>
<div id="outline-container-org5bb9e83" class="outline-3">
<h3 id="org5bb9e83"><span class="section-number-3">1.8</span> <span class="todo TODO">TODO</span> Implementation Of Machine-Learning Methods And Algorithms</h3>
<div class="outline-text-3" id="text-1-8">
<p>
The machine learning method chosen for this software is a Hidden Markov Model.
</p>
<p>
Each line of each song is split into &ldquo;tokens&rdquo; (words), and the previous <code>n - 1</code> tokens are used to predict the <code>n</code>th token.
</p>
<p>
The algorithm is implemented in several parts which are demonstrated below.
</p>
<ol class="org-ol">
<li>Read each song line-by-line.</li>
<li>Split each line into tokens.</li>
<li>Partition the tokens into sequences of length <code>n</code>.</li>
<li>Insert each sequence into a Trie and increment the count of how many times that sequence has been encountered.</li>
</ol>
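<p>
The four steps above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it uses a plain nested map in place of the Trie type, a whitespace tokenizer, and a fixed <code>n</code>. The names <code>tokenize</code> and <code>ngram-counts</code> are hypothetical.
</p>
<div class="org-src-container">
<pre class="src src-clojure">(require '[clojure.string :as string])

(defn tokenize
  "Lowercase a line and split it on whitespace. (Assumed tokenizer.)"
  [line]
  (remove empty? (string/split (string/lower-case line) #"\s+")))

(defn ngram-counts
  "Count every n-gram (sliding window of length n) across all lines.
  The nested map stands in for a trie: one level per token, with the
  number of occurrences stored under :count at the leaf."
  [n lines]
  (-&gt;&gt; lines
       (map tokenize)
       (mapcat #(partition n 1 %))
       (reduce (fn [trie ngram]
                 (update-in trie (concat ngram [:count]) (fnil inc 0)))
               {})))

;; "the dark" occurs twice, so its count is 2.
(get-in (ngram-counts 2 ["the dark night" "the dark sky"])
        ["the" "dark" :count])
;; =&gt; 2
</pre>
</div>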
<p>
That is the process for building the Hidden Markov Model.
</p>
<p>
The algorithm for generating predictions from the HMM is as follows.
</p>
<ol class="org-ol">
<li>Look up the <code>n - 1</code> tokens in the Trie.</li>
<li>Normalize the frequencies of the children of the <code>n - 1</code> tokens into percentage likelihoods.</li>
<li>Account for unseen <code>n</code>-grams using Simple Good-Turing smoothing.</li>
<li>Sort the results by likelihood, most likely first.</li>
</ol>
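<p>
As a rough sketch of the lookup, normalization, and sorting steps above, the following works on a small hand-written nested map of counts. The data and the name <code>next-token-probabilities</code> are hypothetical, and the smoothing step for unseen n-grams is deliberately left out, so unseen contexts simply return no predictions.
</p>
<div class="org-src-container">
<pre class="src src-clojure">(def counts
  "Hypothetical bigram counts: context token -&gt; next token -&gt; count."
  {"dark" {"night" 3, "sky" 1}})

(defn next-token-probabilities
  "Look up a context and return [token probability] pairs, most likely first.
  No smoothing is applied in this sketch."
  [counts context]
  (let [children (get-in counts context)
        total    (reduce + (vals children))]
    (-&gt;&gt; children
         (map (fn [[token n]] [token (/ n total)]))
         (sort-by second &gt;))))

(next-token-probabilities counts ["dark"])
;; =&gt; (["night" 3/4] ["sky" 1/4])
</pre>
</div>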
<div class="org-src-container">
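<pre class="src src-clojure"><span style="color: #51afef;">(</span>require '<span style="color: #c678dd;">[</span>com.owoga.prhyme.data-transform <span style="color: #a9a1e1;">:as</span> data-transform<span style="color: #c678dd;">]</span>
'<span style="color: #c678dd;">[</span>com.owoga.trie <span style="color: #a9a1e1;">:as</span> trie<span style="color: #c678dd;">]</span>
'<span style="color: #c678dd;">[</span>clojure.string <span style="color: #a9a1e1;">:as</span> string<span style="color: #c678dd;">]</span>
'<span style="color: #c678dd;">[</span>clojure.java.io <span style="color: #a9a1e1;">:as</span> io<span style="color: #c678dd;">]</span>
'<span style="color: #c678dd;">[</span>clojure.pprint <span style="color: #a9a1e1;">:as</span> pprint<span style="color: #c678dd;">]</span><span style="color: #51afef;">)</span>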
<span style="color: #51afef;">(</span><span style="color: #51afef;">defn</span> <span style="color: #c678dd;">file-seq-&gt;markov-trie</span>
<span style="color: #83898d;">"For forwards markov."</span>
<span style="color: #c678dd;">[</span>database files n m<span style="color: #c678dd;">]</span>
<span style="color: #c678dd;">(</span>transduce
<span style="color: #98be65;">(</span>comp
<span style="color: #a9a1e1;">(</span>map slurp<span style="color: #a9a1e1;">)</span>
<span style="color: #a9a1e1;">(</span>map #<span style="color: #51afef;">(</span><span style="color: #ECBE7B;">string</span>/split <span style="color: #dcaeea;">%</span> #<span style="color: #98be65;">"[</span><span style="color: #98be65; font-weight: bold;">\n</span><span style="color: #98be65;">+</span><span style="color: #98be65; font-weight: bold;">\?\.</span><span style="color: #98be65;">]"</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span>
<span style="color: #a9a1e1;">(</span>map <span style="color: #51afef;">(</span>partial transduce <span style="color: #ECBE7B;">data-transform</span>/xf-tokenize conj<span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span>
<span style="color: #a9a1e1;">(</span>map <span style="color: #51afef;">(</span>partial transduce <span style="color: #ECBE7B;">data-transform</span>/xf-filter-english conj<span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span>
<span style="color: #a9a1e1;">(</span>map <span style="color: #51afef;">(</span>partial remove empty?<span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span>
<span style="color: #a9a1e1;">(</span>map <span style="color: #51afef;">(</span>partial into <span style="color: #c678dd;">[]</span> <span style="color: #c678dd;">(</span><span style="color: #ECBE7B;">data-transform</span>/xf-pad-tokens <span style="color: #98be65;">(</span>dec m<span style="color: #98be65;">)</span> <span style="color: #98be65;">"&lt;s&gt;"</span> <span style="color: #da8548; font-weight: bold;">1</span> <span style="color: #98be65;">"&lt;/s&gt;"</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span>
<span style="color: #a9a1e1;">(</span>map <span style="color: #51afef;">(</span>partial mapcat <span style="color: #c678dd;">(</span>partial <span style="color: #ECBE7B;">data-transform</span>/n-to-m-partitions n <span style="color: #98be65;">(</span>inc m<span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span>
<span style="color: #a9a1e1;">(</span>mapcat <span style="color: #51afef;">(</span>partial mapv <span style="color: #c678dd;">(</span><span style="color: #ECBE7B;">data-transform</span>/make-database-processor database<span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span>completing
<span style="color: #a9a1e1;">(</span><span style="color: #51afef;">fn</span> <span style="color: #51afef;">[</span>trie lookup<span style="color: #51afef;">]</span>
<span style="color: #51afef;">(</span>update trie lookup <span style="color: #c678dd;">(</span>fnil #<span style="color: #98be65;">(</span>update <span style="color: #dcaeea;">%</span> <span style="color: #da8548; font-weight: bold;">1</span> inc<span style="color: #98be65;">)</span> <span style="color: #98be65;">[</span>lookup <span style="color: #da8548; font-weight: bold;">0</span><span style="color: #98be65;">]</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span><span style="color: #ECBE7B;">trie</span>/make-trie<span style="color: #98be65;">)</span>
files<span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
<span style="color: #51afef;">(</span><span style="color: #51afef;">let</span> <span style="color: #c678dd;">[</span>files <span style="color: #98be65;">(</span><span style="color: #51afef;">-&gt;&gt;</span> <span style="color: #98be65;">"/home/eihli/src/prhyme/dark-corpus"</span>
<span style="color: #ECBE7B;">io</span>/file
file-seq
<span style="color: #a9a1e1;">(</span>eduction <span style="color: #51afef;">(</span><span style="color: #ECBE7B;">data-transform</span>/xf-file-seq <span style="color: #da8548; font-weight: bold;">501</span> <span style="color: #da8548; font-weight: bold;">2</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span>
database <span style="color: #98be65;">(</span>atom <span style="color: #a9a1e1;">{</span><span style="color: #a9a1e1;">:next-id</span> <span style="color: #da8548; font-weight: bold;">1</span><span style="color: #a9a1e1;">}</span><span style="color: #98be65;">)</span>
trie <span style="color: #98be65;">(</span>file-seq-&gt;markov-trie database files <span style="color: #da8548; font-weight: bold;">1</span> <span style="color: #da8548; font-weight: bold;">3</span><span style="color: #98be65;">)</span><span style="color: #c678dd;">]</span>
<span style="color: #c678dd;">(</span><span style="color: #ECBE7B;">pprint</span>/pprint <span style="color: #98be65;">[</span><span style="color: #a9a1e1;">(</span>map <span style="color: #51afef;">(</span>comp <span style="color: #c678dd;">(</span>partial map @database<span style="color: #c678dd;">)</span> first<span style="color: #51afef;">)</span> <span style="color: #51afef;">(</span>take <span style="color: #da8548; font-weight: bold;">10</span> <span style="color: #c678dd;">(</span>drop <span style="color: #da8548; font-weight: bold;">105</span> trie<span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">]</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
</pre>
</div>
<pre class="example" id="org9d447fd">
[(("&lt;s&gt;" "call" "me")
("&lt;s&gt;" "call")
("&lt;s&gt;" "right" "&lt;/s&gt;")
("&lt;s&gt;" "right")
("&lt;s&gt;" "that's" "proportional")
("&lt;s&gt;" "that's")
("&lt;s&gt;" "don't" "&lt;/s&gt;")
("&lt;s&gt;" "don't")
("&lt;s&gt;" "yourself" "in")
("&lt;s&gt;" "yourself"))]
</pre>
<p>
The results above show a sample of 10 entries from a trie that holds 1-grams through 3-grams.
</p>
<p>
The code sample below demonstrates training a Hidden Markov Model on a set of lyrics where each line is reversed. This model is useful for predicting words backwards: you can start from the rhyming word or phrase at the end of a line and generate backwards to the start of the line.
</p>
<p>
It also performs compaction and serialization. Song lyrics are typically provided as text files. Reading and processing those files from disk is expensive, so the training process runs only once and the resulting Markov model is saved in a more memory-efficient format.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #51afef;">(</span><span style="color: #51afef;">defn</span> <span style="color: #c678dd;">train-backwards</span>
<span style="color: #83898d;">"For building lines backwards so they can be seeded with a target rhyme."</span>
<span style="color: #c678dd;">[</span>files n m trie-filepath database-filepath tightly-packed-trie-filepath<span style="color: #c678dd;">]</span>
<span style="color: #c678dd;">(</span><span style="color: #51afef;">let</span> <span style="color: #98be65;">[</span>database <span style="color: #a9a1e1;">(</span>atom <span style="color: #51afef;">{</span><span style="color: #a9a1e1;">:next-id</span> <span style="color: #da8548; font-weight: bold;">1</span><span style="color: #51afef;">}</span><span style="color: #a9a1e1;">)</span>
trie <span style="color: #a9a1e1;">(</span>file-seq-&gt;backwards-markov-trie database files n m<span style="color: #a9a1e1;">)</span><span style="color: #98be65;">]</span>
<span style="color: #98be65;">(</span><span style="color: #ECBE7B;">nippy</span>/freeze-to-file trie-filepath <span style="color: #a9a1e1;">(</span>seq trie<span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span>println <span style="color: #98be65;">"Froze"</span> trie-filepath<span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span><span style="color: #ECBE7B;">nippy</span>/freeze-to-file database-filepath @database<span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span>println <span style="color: #98be65;">"Froze"</span> database-filepath<span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span>save-tightly-packed-trie trie database tightly-packed-trie-filepath<span style="color: #98be65;">)</span>
<span style="color: #98be65;">(</span><span style="color: #51afef;">let</span> <span style="color: #a9a1e1;">[</span>loaded-trie <span style="color: #51afef;">(</span><span style="color: #51afef;">-&gt;&gt;</span> trie-filepath
<span style="color: #ECBE7B;">nippy</span>/thaw-from-file
<span style="color: #c678dd;">(</span>into <span style="color: #98be65;">(</span><span style="color: #ECBE7B;">trie</span>/make-trie<span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
loaded-db <span style="color: #51afef;">(</span><span style="color: #51afef;">-&gt;&gt;</span> database-filepath
<span style="color: #ECBE7B;">nippy</span>/thaw-from-file<span style="color: #51afef;">)</span>
loaded-tightly-packed-trie <span style="color: #51afef;">(</span><span style="color: #ECBE7B;">tpt</span>/load-tightly-packed-trie-from-file
tightly-packed-trie-filepath
<span style="color: #c678dd;">(</span>decode-fn loaded-db<span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">]</span>
<span style="color: #a9a1e1;">(</span>println <span style="color: #98be65;">"Loaded trie:"</span> <span style="color: #51afef;">(</span>take <span style="color: #da8548; font-weight: bold;">5</span> loaded-trie<span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span>
<span style="color: #a9a1e1;">(</span>println <span style="color: #98be65;">"Loaded database:"</span> <span style="color: #51afef;">(</span>take <span style="color: #da8548; font-weight: bold;">5</span> loaded-db<span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span>
<span style="color: #a9a1e1;">(</span>println <span style="color: #98be65;">"Loaded tightly-packed-trie:"</span> <span style="color: #51afef;">(</span>take <span style="color: #da8548; font-weight: bold;">5</span> loaded-tightly-packed-trie<span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span>
<span style="color: #a9a1e1;">(</span>println <span style="color: #98be65;">"Successfully loaded trie and database."</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
<span style="color: #51afef;">(</span>comment
<span style="color: #c678dd;">(</span>time
<span style="color: #98be65;">(</span><span style="color: #51afef;">let</span> <span style="color: #a9a1e1;">[</span>files <span style="color: #51afef;">(</span><span style="color: #51afef;">-&gt;&gt;</span> <span style="color: #98be65;">"dark-corpus"</span>
<span style="color: #ECBE7B;">io</span>/file
file-seq
<span style="color: #c678dd;">(</span>eduction <span style="color: #98be65;">(</span>xf-file-seq <span style="color: #da8548; font-weight: bold;">0</span> <span style="color: #da8548; font-weight: bold;">250000</span><span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
<span style="color: #51afef;">[</span>trie database<span style="color: #51afef;">]</span> <span style="color: #51afef;">(</span>train-backwards
files
<span style="color: #da8548; font-weight: bold;">1</span>
<span style="color: #da8548; font-weight: bold;">5</span>
<span style="color: #98be65;">"/home/eihli/.models/markov-trie-4-gram-backwards.bin"</span>
<span style="color: #98be65;">"/home/eihli/.models/markov-database-4-gram-backwards.bin"</span>
<span style="color: #98be65;">"/home/eihli/.models/markov-tightly-packed-trie-4-gram-backwards.bin"</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">]</span><span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span>
<span style="color: #c678dd;">(</span>time
<span style="color: #98be65;">(</span><span style="color: #51afef;">def</span> <span style="color: #dcaeea;">markov-trie</span> <span style="color: #a9a1e1;">(</span>into <span style="color: #51afef;">(</span><span style="color: #ECBE7B;">trie</span>/make-trie<span style="color: #51afef;">)</span> <span style="color: #51afef;">(</span><span style="color: #ECBE7B;">nippy</span>/thaw-from-file <span style="color: #98be65;">"/home/eihli/.models/markov-trie-4-gram-backwards.bin"</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span>
<span style="color: #c678dd;">(</span>time
<span style="color: #98be65;">(</span><span style="color: #51afef;">def</span> <span style="color: #dcaeea;">database</span> <span style="color: #a9a1e1;">(</span><span style="color: #ECBE7B;">nippy</span>/thaw-from-file <span style="color: #98be65;">"/home/eihli/.models/markov-database-4-gram-backwards.bin"</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span>
<span style="color: #c678dd;">(</span>time
<span style="color: #98be65;">(</span><span style="color: #51afef;">def</span> <span style="color: #dcaeea;">markov-tight-trie</span>
<span style="color: #a9a1e1;">(</span><span style="color: #ECBE7B;">tpt</span>/load-tightly-packed-trie-from-file
<span style="color: #98be65;">"/home/eihli/.models/markov-tightly-packed-trie-4-gram-backwards.bin"</span>
<span style="color: #51afef;">(</span>decode-fn database<span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span>
<span style="color: #c678dd;">(</span>take <span style="color: #da8548; font-weight: bold;">20</span> markov-tight-trie<span style="color: #c678dd;">)</span>
<span style="color: #51afef;">)</span>
</pre>
</div>
<p>
Functionalities To Evaluate The Accuracy Of The Data Product
</p>
<p>
Since creative brainstorming is the goal, &ldquo;accuracy&rdquo; is subjective.
</p>
<p>
We can, however, measure and compare language generation algorithms by how &ldquo;expected&rdquo; a phrase is given the training data. This measurement is called &ldquo;perplexity&rdquo;.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #51afef;">(</span>require '<span style="color: #c678dd;">[</span>taoensso.nippy <span style="color: #a9a1e1;">:as</span> nippy<span style="color: #c678dd;">]</span>
'<span style="color: #c678dd;">[</span>com.owoga.tightly-packed-trie <span style="color: #a9a1e1;">:as</span> tpt<span style="color: #c678dd;">]</span>
'<span style="color: #c678dd;">[</span>com.owoga.corpus.markov <span style="color: #a9a1e1;">:as</span> markov<span style="color: #c678dd;">]</span><span style="color: #51afef;">)</span>
<span style="color: #51afef;">(</span><span style="color: #51afef;">defonce</span> <span style="color: #dcaeea;">database</span> <span style="color: #c678dd;">(</span><span style="color: #ECBE7B;">nippy</span>/thaw-from-file <span style="color: #98be65;">"/home/eihli/.models/markov-database-4-gram-backwards.bin"</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
<span style="color: #51afef;">(</span><span style="color: #51afef;">defonce</span> <span style="color: #dcaeea;">markov-tight-trie</span>
<span style="color: #c678dd;">(</span><span style="color: #ECBE7B;">tpt</span>/load-tightly-packed-trie-from-file
<span style="color: #98be65;">"/home/eihli/.models/markov-tightly-packed-trie-4-gram-backwards.bin"</span>
<span style="color: #98be65;">(</span><span style="color: #ECBE7B;">markov</span>/decode-fn database<span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span>
<span style="color: #51afef;">(</span><span style="color: #51afef;">let</span> <span style="color: #c678dd;">[</span>likely-phrase <span style="color: #98be65;">[</span><span style="color: #98be65;">"a"</span> <span style="color: #98be65;">"hole"</span> <span style="color: #98be65;">"&lt;/s&gt;"</span> <span style="color: #98be65;">"&lt;/s&gt;"</span><span style="color: #98be65;">]</span>
less-likely-phrase <span style="color: #98be65;">[</span><span style="color: #98be65;">"this"</span> <span style="color: #98be65;">"hole"</span> <span style="color: #98be65;">"&lt;/s&gt;"</span> <span style="color: #98be65;">"&lt;/s&gt;"</span><span style="color: #98be65;">]</span>
least-likely-phrase <span style="color: #98be65;">[</span><span style="color: #98be65;">"that"</span> <span style="color: #98be65;">"hole"</span> <span style="color: #98be65;">"&lt;/s&gt;"</span> <span style="color: #98be65;">"&lt;/s&gt;"</span><span style="color: #98be65;">]</span><span style="color: #c678dd;">]</span>
<span style="color: #c678dd;">(</span>run!
<span style="color: #98be65;">(</span><span style="color: #51afef;">fn</span> <span style="color: #a9a1e1;">[</span>word<span style="color: #a9a1e1;">]</span>
<span style="color: #a9a1e1;">(</span>println
<span style="color: #51afef;">(</span>format
<span style="color: #98be65;">"</span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;">%s</span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;"> has preceeded </span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;">hole</span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;"> </span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;">&lt;/s&gt;</span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;"> </span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;">&lt;/s&gt;</span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;"> a total of %s times"</span>
word
<span style="color: #c678dd;">(</span>second <span style="color: #98be65;">(</span>get markov-tight-trie <span style="color: #a9a1e1;">(</span>map database <span style="color: #51afef;">[</span><span style="color: #98be65;">"&lt;/s&gt;"</span> <span style="color: #98be65;">"&lt;/s&gt;"</span> <span style="color: #98be65;">"hole"</span> word<span style="color: #51afef;">]</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span><span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span>
<span style="color: #98be65;">[</span><span style="color: #98be65;">"a"</span> <span style="color: #98be65;">"this"</span> <span style="color: #98be65;">"that"</span><span style="color: #98be65;">]</span><span style="color: #c678dd;">)</span>
<span style="color: #c678dd;">(</span>run!
<span style="color: #98be65;">(</span><span style="color: #51afef;">fn</span> <span style="color: #a9a1e1;">[</span>word<span style="color: #a9a1e1;">]</span>
<span style="color: #a9a1e1;">(</span><span style="color: #51afef;">let</span> <span style="color: #51afef;">[</span>seed <span style="color: #c678dd;">[</span><span style="color: #98be65;">"&lt;/s&gt;"</span> <span style="color: #98be65;">"&lt;/s&gt;"</span> <span style="color: #98be65;">"hole"</span> word<span style="color: #c678dd;">]</span><span style="color: #51afef;">]</span>
<span style="color: #51afef;">(</span>println
<span style="color: #c678dd;">(</span>format
<span style="color: #98be65;">"%s is the perplexity of </span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;">%s</span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;"> </span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;">hole</span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;"> </span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;">&lt;/s&gt;</span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;"> </span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;">&lt;/s&gt;</span><span style="color: #98be65; font-weight: bold;">\"</span><span style="color: #98be65;">"</span>
<span style="color: #98be65;">(</span><span style="color: #51afef;">-&gt;&gt;</span> seed
<span style="color: #a9a1e1;">(</span>map database<span style="color: #a9a1e1;">)</span>
<span style="color: #a9a1e1;">(</span><span style="color: #ECBE7B;">markov</span>/perplexity <span style="color: #da8548; font-weight: bold;">4</span> markov-tight-trie<span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span>
word<span style="color: #c678dd;">)</span><span style="color: #51afef;">)</span><span style="color: #a9a1e1;">)</span><span style="color: #98be65;">)</span>
<span style="color: #98be65;">[</span><span style="color: #98be65;">"a"</span> <span style="color: #98be65;">"this"</span> <span style="color: #98be65;">"that"</span><span style="color: #98be65;">]</span><span style="color: #c678dd;">)</span>
<span style="color: #a9a1e1;">nil</span><span style="color: #51afef;">)</span>
</pre>
</div>
<pre class="example">
"a" has preceeded "hole" "&lt;/s&gt;" "&lt;/s&gt;" a total of 250 times
"this" has preceeded "hole" "&lt;/s&gt;" "&lt;/s&gt;" a total of 173 times
"that" has preceeded "hole" "&lt;/s&gt;" "&lt;/s&gt;" a total of 45 times
-12.184088569934774 is the perplexity of "a" "hole" "&lt;/s&gt;" "&lt;/s&gt;"
-12.552930899563904 is the perplexity of "this" "hole" "&lt;/s&gt;" "&lt;/s&gt;"
-13.905719644461469 is the perplexity of "that" "hole" "&lt;/s&gt;" "&lt;/s&gt;"
</pre>
<p>
The results above make intuitive sense. The most common word to precede &ldquo;hole&rdquo; at the end of a sentence is the word &ldquo;a&rdquo;. There are 250 sentences ending in &ldquo;&#x2026; a hole.&rdquo;, compared to 173 ending in &ldquo;&#x2026; this hole.&rdquo; and 45 ending in &ldquo;&#x2026; that hole.&rdquo;.
</p>
<p>
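Therefore, &ldquo;&#x2026; a hole.&rdquo; is the most expected of the three phrases and has the lowest &ldquo;perplexity&rdquo;. Note that the scores printed above are negative, so the most expected phrase is the one whose score is closest to zero.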
</p>
<p>
This standardized measure of accuracy can be used to compare different language generation algorithms.
</p>
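<p>
For reference, the textbook definition of perplexity is the exponential of the negative mean log-probability of the tokens in a sequence. The sketch below implements that definition only; it is not the project's <code>markov/perplexity</code> function, which, judging by the negative values printed above, may report a log-scale score instead. The input probabilities are made up for illustration.
</p>
<div class="org-src-container">
<pre class="src src-clojure">(defn perplexity
  "Textbook perplexity of a sequence, given the probability the model
  assigned to each of its tokens: exp of the negative mean log-probability.
  Lower values mean the sequence was more expected."
  [probabilities]
  (let [n       (count probabilities)
        log-sum (reduce + (map #(Math/log (double %)) probabilities))]
    (Math/exp (- (/ log-sum n)))))

;; Hypothetical per-token probabilities for two phrases.
(perplexity [0.5 0.25]) ; the more expected phrase gets the lower value
(perplexity [0.1 0.05]) ; the less expected phrase gets the higher value
</pre>
</div>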
</div>
</div>
<div id="outline-container-org7791154" class="outline-3">
<h3 id="org7791154"><span class="section-number-3">1.9</span> Security Features</h3>
<div class="outline-text-3" id="text-1-9">
<p>
Artists and songwriters place a lot of value on the secrecy of their content. Therefore, all communication with the web-based interface occurs over a secure connection using HTTPS.
</p>
<p>
Security certificates are generated using Let&rsquo;s Encrypt, and an Nginx web server handles the SSL termination.
</p>
<p>
With this precaution in place, attackers on the network cannot read the content that songwriters send to or receive from the servers.
</p>
</div>
</div>
<div id="outline-container-org2118b36" class="outline-3">
<h3 id="org2118b36"><span class="section-number-3">1.10</span> <span class="todo TODO">TODO</span> Tools To Monitor And Maintain The Product</h3>
<div class="outline-text-3" id="text-1-10">
<ul class="org-ul">
<li>Script to auto-update SSL cert</li>
<li>Enable NGINX dashboard?</li>
</ul>
</div>
</div>
<div id="outline-container-org3e7ea9b" class="outline-3">
<h3 id="org3e7ea9b"><span class="section-number-3">1.11</span> <span class="todo TODO">TODO</span> A User-Friendly, Functional Dashboard That Includes At Least Three Visualization Types</h3>
</div>
</div>
<div id="outline-container-orge2d60f8" class="outline-2">
<h2 id="orge2d60f8"><span class="section-number-2">2</span> Documentation</h2>
<div class="outline-text-2" id="text-2">
<ol class="org-ol">
<li>Create each of the following forms of documentation for the product you have developed:</li>
</ol>
</div>
<div id="outline-container-orgcc70df2" class="outline-3">
<h3 id="orgcc70df2"><span class="section-number-3">2.1</span> Business Vision</h3>
<div class="outline-text-3" id="text-2-1">
<p>
Provide rhyming lyric suggestions optionally constrained by syllable count.
</p>
</div>
</div>
<div id="outline-container-orgc216269" class="outline-3">
<h3 id="orgc216269"><span class="section-number-3">2.2</span> Data Sets</h3>
<div class="outline-text-3" id="text-2-2">
<p>
See <code>resources/darklyrics-markov.tpt</code>
</p>
</div>
</div>
<div id="outline-container-org4fd130a" class="outline-3">
<h3 id="org4fd130a"><span class="section-number-3">2.3</span> Data Analysis</h3>
<div class="outline-text-3" id="text-2-3">
<p>
See <code>src/com/owoga/darklyrics/core.clj</code>
</p>
<p>
See <a href="https://github.com/eihli/prhyme">https://github.com/eihli/prhyme</a>
</p>
</div>
</div>
<div id="outline-container-org33bec77" class="outline-3">
<h3 id="org33bec77"><span class="section-number-3">2.4</span> Assessment</h3>
<div class="outline-text-3" id="text-2-4">
<p>
See visualization of rhyme suggestion in action.
</p>
<p>
See perplexity?
</p>
</div>
</div>
<div id="outline-container-orgae6aaf1" class="outline-3">
<h3 id="orgae6aaf1"><span class="section-number-3">2.5</span> Visualizations</h3>
<div class="outline-text-3" id="text-2-5">
<p>
See visualization of smoothing technique.
</p>
<p>
See wordcloud.
</p>
</div>
</div>
<div id="outline-container-orgaad09e3" class="outline-3">
<h3 id="orgaad09e3"><span class="section-number-3">2.6</span> Accuracy</h3>
<div class="outline-text-3" id="text-2-6">
<p>
• assessment of the product&rsquo;s accuracy
</p>
</div>
</div>
<div id="outline-container-org248156b" class="outline-3">
<h3 id="org248156b"><span class="section-number-3">2.7</span> Testing</h3>
<div class="outline-text-3" id="text-2-7">
<p>
• the results from the data product testing, revisions, and optimization based on the provided plans, including screenshots
</p>
</div>
</div>
<div id="outline-container-org4c2c5cb" class="outline-3">
<h3 id="org4c2c5cb"><span class="section-number-3">2.8</span> Source</h3>
<div class="outline-text-3" id="text-2-8">
<p>
• source code and executable file(s)
</p>
</div>
</div>
<div id="outline-container-org55bc8bd" class="outline-3">
<h3 id="org55bc8bd"><span class="section-number-3">2.9</span> Quick Start</h3>
<div class="outline-text-3" id="text-2-9">
<p>
• a quick start guide summarizing the steps necessary to install and use the product
</p>
</div>
</div>
</div>
<div id="outline-container-org3027af3" class="outline-2">
<h2 id="org3027af3"><span class="section-number-2">3</span> Notes</h2>
<div class="outline-text-2" id="text-3">
<p>
http-kit doesn&rsquo;t support HTTPS, so there is no need to configure a keystore as you would with Jetty. Just proxy from HAProxy.
</p>
</div>
</div>
</div>
<div id="postamble" class="status">
<p class="author">Author: Eric Ihli</p>
<p class="date">Created: 2021-07-13 Tue 20:39</p>
</div>
</body>
</html>