From e611e71838f257291b3bc7d8827c18e3cda991f0 Mon Sep 17 00:00:00 2001 From: Eric Ihli Date: Tue, 20 Jul 2021 16:40:36 -0500 Subject: [PATCH] Update README --- web/README_WGU.org | 119 +++- web/resources/public/README_WGU.htm | 893 ++++++++++++++++------------ 2 files changed, 616 insertions(+), 396 deletions(-) diff --git a/web/README_WGU.org b/web/README_WGU.org index a9b96b6..55142d1 100644 --- a/web/README_WGU.org +++ b/web/README_WGU.org @@ -25,7 +25,6 @@ It's probably not necessary for you to replicate my development environment in o - [[https://www.docker.com/][Docker]] - [[https://clojure.org/releases/downloads][Clojure Version 1.10+]] -- [[https://nodejs.org/en/download/][NodeJS]] - [[https://github.com/clojure-emacs/cider][Emacs and CIDER]] *** Steps @@ -33,7 +32,8 @@ It's probably not necessary for you to replicate my development environment in o 1. Run ~./db/run.sh && ./kv/run.sh~ to start the docker containers for the database and key-value store. a. The ~run.sh~ scripts only need to run once. They initialize development data containers. Subsequent development can continue with ~docker start db && docker start kv~. 2. Start a Clojure REPL in Emacs, evaluate the ~dev/user.clj~ namespace, and run ~(init)~ -3. Run ~npx shadow-cljs watch :frontend~ in the ~web/wgu-app~ directory to build the web interface. +3. Visit ~http://localhost:8000/wgu~ + ** How To Run Software Locally @@ -46,7 +46,7 @@ It's probably not necessary for you to replicate my development environment in o 1. Run ~./db/run.sh && ./kv/run.sh~ to start the docker containers for the database and key-value store. a. The ~run.sh~ scripts only need to run once. They initialize development data containers. Subsequent development can continue with ~docker start db && docker start kv~. 2. The application's ~jar~ builds with a ~make~ run from the root directory. (See [[file:../Makefile][Makefile]]). -3. Navigate to the root directory of this git repo and run ~java -jar darklimericks-dev.jar~ +3. 
Navigate to the root directory of this git repo and run ~java -jar darklimericks.jar~ 4. Visit http://localhost:8000/wgu * A. Letter Of Transmittal @@ -94,9 +94,9 @@ This software will accomplish its primary objective if it makes its way into the Several secondary objectives are also desirable and reasonably expected. The architecture of the software lends itself to existing as several independently useful modules. -For example, the Markov Model can be conveniently backed by a Trie data structure. This Trie data structure can be released as its own software package and used any application that benefits from prefix matching. +For example, the [[https://en.wikipedia.org/wiki/Hidden_Markov_model][Markov Model]] can be conveniently backed by a [[https://en.wikipedia.org/wiki/Trie][Trie data structure]]. This Trie data structure can be released as its own software package and used in any application that benefits from prefix matching. -Another example is the package that turns phrases into phones. That package can find use for a number of natural language processing and natural language generation tasks, aside from the task required by this particular project. +Another example is the package that turns phrases into phones (symbols of pronunciation). That package can find use for a number of natural language processing and natural language generation tasks, aside from the task required by this particular project. ** Development Methodology - Agile @@ -126,9 +126,9 @@ These are my estimates for the time and cost of different aspects of initial dev | Total | 330 | $3,300 | -** NO the impact of the solution on stakeholders +** Stakeholder Impact -This seems redundant or irrelevant. The only stakeholders in the project I'm describing would be the record labels or songwriters and the impact on them is described in the [[Benefits]] section above. +The only stakeholders in the project will be the record labels or songwriters. 
I describe the impact on them in the [[Benefits]] section above. ** Ethical And Legal Considerations @@ -199,7 +199,11 @@ Much of data science is exploratory and taking an iterative Agile approach can t ** Deliverables -Three aspects of this project are available as open source repositories on Github. +- Supporting libraries source code +- Application source code +- Deployed application + +The supporting libraries of this project are available as open source repositories on Github. [[https://github.com/eihli/clj-tightly-packed-trie][Tightly Packed Trie]] @@ -213,8 +217,6 @@ The trained data model and web interface has been deployed at the following addr ** Implementation Plan And Anticipations -the plan for implementation of your data product, including the anticipated outcomes from this development - I'll start by writing and releasing the supporting libraries and packages: Tries, Syllabification/Phonetics, Rhyming. Then I'll write a website that imports and uses those libraries. @@ -488,13 +490,94 @@ All Trie code is hosted in the git repo located at [[https://github.com/eihli/cl (get (.children- trie) (first k)))))) #+end_src -** TODO Data Visualization Functionalities For Data Exploration And Inspection +** Data Visualization Functionalities For Data Exploration And Inspection + +The functionality to explore and visualize data is baked into the Trie data structure. + +By simply viewing the Trie in a Clojure REPL, you can inspect the Trie's structure. + +#+begin_example + (let [initialized-trie (->> (trie/make-trie "dog" "dog" "dot" "dot" "do" "do"))] + initialized-trie) + ;; => {(\d \o \g) "dog", (\d \o \t) "dot", (\d \o) "do", (\d) nil} +#+end_example + +This functionality is provided by the implementations of the ~Associative~ and ~IPersistentMap~ interfaces. + +#+begin_src clojure +clojure.lang.Associative +(assoc [trie opath ovalue] + (if (empty? opath) + (IntKeyTrie. key ovalue children-) + (IntKeyTrie. 
key value (update + children- + (first opath) + (fnil assoc (IntKeyTrie. (first opath) nil (fast-sorted-map))) + (rest opath) + ovalue)))) +(entryAt [trie key] + (clojure.lang.MapEntry. key (get trie key))) +(containsKey [trie key] + (boolean (get trie key))) + +clojure.lang.IPersistentMap +(assocEx [trie key val] + (if (contains? trie key) + (throw (Exception. (format "Value already exists at key %s." key))) + (assoc trie key val))) +(without [trie key] + (-without trie key)) +#+end_src + +The Hidden Markov Model data structure doesn't lend itself to any useful graphical type of visualization or exploration. + +** Implementation Of Interactive Queries + +*** Generate Rhyming Lyrics + +This interactive query will return a list of rhyming phrases to any word or phrase you enter. + +For example, the phrase ~don't bother me~ returns the following results. -- graph of phrase complexity on one axis and rhyme quality on another axis. +| Rhyme | Quality | Lyric | Perplexity | +| forsee | 5 | i'm not one of us forsee | -0.150812027039802 | +| wholeheartedly | 5 | purification has replaced wholeheartedly | -0.23227389702753784 | +| merci | 5 | domine, non merci | -0.2567394520839273 | +| oversea | 5 | i let's torch oversea | -0.3940312599117676 | +| me | 4 | that is found in me | -0.12708613143793374 | +| thee | 4 | you ask thee | -0.20919974848757947 | +| free | 4 | direct from me free | -0.29056603191271085 | +| harmony | 3 | it's time to go, this harmony | -0.06634608923365708 | +| society | 3 | mutilation rejected by society | -0.10624747249791901 | +| prophecy | 3 | take us to the brink of disaster dreamer just a savage prophecy | -0.13097443386137644 | +| honesty | 3 | for you my threw all that can be the power not honesty | -0.2423380760939454 | +| constantly | 3 | i thrust my sword into the dragon's annihilation that constantly | -0.2474276676860057 | +| reality | 2 | smack of reality | -0.14811632033013192 | +| eternity | 2 | with trust in loneliness in eternity | 
-0.1507561510378151 | +| misery | 2 | reminiscing over misery | -0.29506597978960253 | -** TODO Implementation Of Interactive Queries +The interactive query for the above can be found at https://darklimericks.com/wgu/lyric-from-seed?seed=don%27t+bother+me. Note that, since these lyrics are randomly generated, your results will vary. -Interactive query capability at [[https://darklimericks.com/wgu]]. +*** Complete Lyric Containing Suffix + +This interactive query will return a list of lyrics completing the given suffix with randomly generated prefixes. + +For example, let's say a songwriter liked the phrase ~rejected by society~ above, but they want to brainstorm different beginnings of that line. + +| Lyric | OpenNLP Perplexity | Per-word OpenNLP Perplexity | +| we have rejected by society | -0.6593112258099724 | -0.03878301328293955 | +| she rejected by society | -1.0992937688019973 | -0.07852098348585694 | +| i was despised and rejected by society | -3.5925278871864497 | -0.15619686466028043 | +| the exiled and rejected by society | -3.6944350673672144 | -0.21731970984513027 | +| to smell the death mutilation rejected by society | -5.899263654566813 | -0.2458026522736172 | +| time goes yearning again only to be rejected by society | -2.764028722852962 | -0.08375844614705946 | +| you won't survive the mutilation rejected by society | -2.5299544352623986 | -0.09035551554508567 | +| your rejected by society | -1.4840658880458661 | -0.10600470628899043 | +| dividing lands, rejected by society | -2.2975947244849793 | -0.12764415136027663 | +| a voice summons all angry exiled and rejected by society | -9.900290597751827 | -0.17679090353128263 | +| protect the rejected by society | -4.210741684291847 | -0.28071611228612314 | + +The interactive query for the above can be found at https://darklimericks.com/wgu/rhyming-lyric?rhyming-lyric-target=rejected+by+society. Note again that your results will vary. 
** Implementation Of Machine Learning Methods @@ -749,12 +832,14 @@ Provide rhyming lyric suggestions optionally constrained by syllable count. - [ ] Given a word or phrase, suggest rhymes (ranked by quality) (Trie) - [ ] Given a word or phrase, suggest lyric completion (Hidden Markov Model) - + [ ] Restrict suggestion by syllable count + + [ ] (Future iteration) Restrict suggestion by syllable count + [ ] Restrict suggestion by rhyme quality - + [ ] Show graph of suggestions with perplexity on one axis and rhyme quality on the other + + [ ] (Future iteration) Show graph of suggestions with perplexity on one axis and rhyme quality on the other ** Data Sets +The dataset was obtained from http://darklyrics.com. + See ~resources/darklyrics-markov.tpt~ ** Data Analysis @@ -773,7 +858,7 @@ See perplexity? See visualization of smoothing technique. -See wordcloud +See wordcloud? ** Accuracy diff --git a/web/resources/public/README_WGU.htm b/web/resources/public/README_WGU.htm index bd88fb1..f9f067d 100644 --- a/web/resources/public/README_WGU.htm +++ b/web/resources/public/README_WGU.htm @@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> - + RhymeStorm™ - WGU CSCI Capstone Project @@ -215,28 +215,6 @@ /*]]>*///--> // @license-end - -
@@ -245,100 +223,106 @@

Table of Contents

-
-

1 WGU Evaluator Notes

+
+

1 WGU Evaluator Notes

Hello! I hope you enjoy your time with this evaluation! @@ -362,32 +346,51 @@ After I describe the steps to initialize a development environment, you’ll

-
-

2 Evaluation Technical Documentation

+
+

2 Evaluation Technical Documentation

+

+It’s probably not necessary for you to replicate my development environment in order to evaluate this project. You can access the deployed application at https://darklimericks.com/wgu and the libraries and supporting code that I wrote for this project at https://github.com/eihli/clj-tightly-packed-trie, https://github.com/eihli/syllabify, and https://github.com/eihli/prhyme. The web server and web application is not hosted publicly but you will find it uploaded with my submission as a .tar archive. +

-
-

2.1 How To Initialize Development Environment

+ +
+

2.1 How To Initialize Development Environment

-
-

2.1.1 Required Software

+
+

2.1.1 Required Software

+ +
+

2.1.2 Steps

+
+
    +
  1. Run ./db/run.sh && ./kv/run.sh to start the docker containers for the database and key-value store. +
      +
    1. The run.sh scripts only need to run once. They initialize development data containers. Subsequent development can continue with docker start db && docker start kv.
    2. +
  2. +
  3. Start a Clojure REPL in Emacs, evaluate the dev/user.clj namespace, and run (init)
  4. +
  5. Visit http://localhost:8000/wgu
  6. +
+
+
+ -
-

2.2 How To Run Software Locally

+
+

2.2 How To Run Software Locally

-
-

2.2.1 Requirements

+
+

2.2.1 Requirements

  • Java
  • @@ -396,8 +399,8 @@ After I describe the steps to initialize a development environment, you’ll
-
-

2.2.2 Steps

+
+

2.2.2 Steps

  1. Run ./db/run.sh && ./kv/run.sh to start the docker containers for the database and key-value store. @@ -405,7 +408,7 @@ After I describe the steps to initialize a development environment, you’ll
  2. The run.sh scripts only need to run once. They initialize development data containers. Subsequent development can continue with docker start db && docker start kv.
  • The application’s jar builds with a make run from the root directory. (See Makefile).
  • -
  • Navigate to the root directory of this git repo and run java -jar darklimericks-dev.jar
  • +
  • Navigate to the root directory of this git repo and run java -jar darklimericks.jar
  • Visit http://localhost:8000/wgu
  • @@ -418,8 +421,8 @@ After I describe the steps to initialize a development environment, you’ll
    -
    -

    3.1 Problem Summary

    +
    +

    3.1 Problem Summary

    Songwriters, artists, and record labels can save time and discover better lyrics with the help of a machine learning tool that supports their creative endeavours. @@ -431,8 +434,8 @@ Songwriters have several old-fashioned tools at their disposal including diction

    -
    -

    3.2 Benefits

    +
    +

    3.2 Benefits

    How many sensible phrases can you think of that rhyme with “war on poverty”? What if I say that there’s a restriction to only come up with phrases that are exactly 14 syllables? That’s a common restriction when a songwriter is trying to match the meter of a previous line. What if I add another restriction that there must be primary stress at certain spots in that 14 syllable phrase? @@ -448,8 +451,8 @@ And this is a process that is perfect for machine learning. Machine learning can

    -
    -

    3.3 Product - RhymeStorm™

    +
    +

    3.3 Product - RhymeStorm™

    RhymeStorm™ is a tool to help songwriters brainstorm. It provides lyrics automatically generated based on training data from existing songs while adhering to restrictions based on rhyme scheme, meter, genre, and more. @@ -477,8 +480,8 @@ This auto-complete functionality will be similar to the auto-complete that is co

    -
    -

    3.4 Data

    +
    +

    3.4 Data

    The initial model will be trained on the lyrics from http://darklyrics.com. This is a publicly available data set with minimal meta-data. Record labels will have more valuable datasets that will include meta-data along with lyrics, such as the date the song was popular, the number of radio plays of the song, the profit of the song/artist, etc… @@ -490,8 +493,8 @@ The software can be augmented with additional algorithms to account for the type

    -
    -

    3.5 Objectives

    +
    +

    3.5 Objectives

    This software will accomplish its primary objective if it makes its way into the daily toolkit of a handful of singers/songwriters. @@ -502,17 +505,17 @@ Several secondary objectives are also desirable and reasonably expected. The arc

    -For example, the Markov Model can be conveniently backed by a Trie data structure. This Trie data structure can be released as its own software package and used any application that benefits from prefix matching. +For example, the Markov Model can be conveniently backed by a Trie data structure. This Trie data structure can be released as its own software package and used in any application that benefits from prefix matching.

    -Another example is the package that turns phrases into phones. That package can find use for a number of natural language processing and natural language generation tasks, aside from the task required by this particular project. +Another example is the package that turns phrases into phones (symbols of pronunciation). That package can find use for a number of natural language processing and natural language generation tasks, aside from the task required by this particular project.

    -
    -

    3.6 Development Methodology - Agile

    +
    +

    3.6 Development Methodology - Agile

    This project will be developed with an iterative Agile methodology. Since a large part of data science and machine learning is exploration, this project will benefit from ongoing exploration in tandem with development. @@ -528,8 +531,8 @@ The prices quoted below are for an initial minimum-viable-product that will serv

    -
    -

    3.7 Costs

    +
    +

    3.7 Costs

    Funding requirements are minimal. The initial dataset is public and freely available. On a typical consumer laptop, Hidden Markov Models can be trained on fairly large datasets in short time and the training doesn’t require the use of expensive hardware like the GPUs used to train Deep Neural Networks. @@ -614,17 +617,17 @@ These are my estimates for the time and cost of different aspects of initial dev

    -
    -

    3.8 NO the impact of the solution on stakeholders

    +
    +

    3.8 Stakeholder Impact

    -This seems redundant or irrelevant. The only stakeholders in the project I’m describing would be the record labels or songwriters and the impact on them is described in the 3.2 section above. +The only stakeholders in the project will be the record labels or songwriters. I describe the impact on them in the 3.2 section above.

    -
    -

    3.9 Ethical And Legal Considerations

    +
    +

    3.9 Ethical And Legal Considerations

    Web scraping, the method used to obtain the initial dataset from http://darklyrics.com, is protected given the ruling in https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn. @@ -636,8 +639,8 @@ The use of publicly available data in generative works is less clear. But Micros

    -
    -

    3.10 Expertise

    +
    +

    3.10 Expertise

    I have 10 years experience as a programmer and have worked extensively on both frontend technologies like HTML/JavaScript, backend technologies like Django, and building libraries/packages/frameworks. @@ -658,8 +661,8 @@ Write an executive summary directed to IT professionals that addresses each of t

    -
    -

    4.1 Decision Support Opportunity

    +
    +

    4.1 Decision Support Opportunity

    Songwriters expend a lot of time and effort finding the perfect rhyming word or phrase. RhymeStorm™ is going to amplify user’s creative abilities by searching its machine learning model for sensible and proven-successful words and phrases that meet the rhyme scheme and meter requirements requested by the user. @@ -671,8 +674,8 @@ When a songwriter needs to find likely phrases that rhyme with “war on pov

    -
    -

    4.2 Customer Needs And Product Description

    +
    +

    4.2 Customer Needs And Product Description

    Songwriters spend money on dictionaries, compilations of slang, thesauruses, and phrase dictionaries. They spend their time daydreaming, brainstorming, contemplating, and mixing and matching the knowledge they acquire through these traditional means. @@ -692,8 +695,8 @@ Computers can process and sort this information and sort the results by quality

    -
    -

    4.3 Existing Products

    +
    +

    4.3 Existing Products

    We’re all familiar with dictionaries, thesauruses, and their shortcomings. @@ -709,8 +712,8 @@ RhymeZone is limited in its capability. It doesn’t do well finding rhymes

    -
    -

    4.4 Available Data And Future Data Lifecycle

    +
    +

    4.4 Available Data And Future Data Lifecycle

    The initial dataset will be gathered by downloading lyrics from http://darklyrics.com and future models can be generated by downloading lyrics from other websites. Alternatively, data can be provided by record labels and combined with meta-data that the record label may have, such as how many radio plays each song gets and how much profit they make from each song. @@ -734,8 +737,8 @@ Each new model can be uploaded to the web server and users can select which mode

    -
    -

    4.5 Methodology - Agile

    +
    +

    4.5 Methodology - Agile

    RhymeStorm™ development will proceed with an iterative Agile methodology. It will be composed of several independent modules that can be worked on independently, in parallel, and iteratively. @@ -759,11 +762,17 @@ Much of data science is exploratory and taking an iterative Agile approach can t

    -
    -

    4.6 Deliverables

    +
    +

    4.6 Deliverables

    +
      +
    • Supporting libraries source code
    • +
    • Application source code
    • +
    • Deployed application
    • +
    +

    -Three aspects of this project are available as open source repositories on Github. +The supporting libraries of this project are available as open source repositories on Github.

    @@ -788,13 +797,9 @@ The trained data model and web interface has been deployed at the following addr

    -
    -

    4.7 Implementation Plan And Anticipations

    +
    +

    4.7 Implementation Plan And Anticipations

    -

    -the plan for implementation of your data product, including the anticipated outcomes from this development -

    -

    I’ll start by writing and releasing the supporting libraries and packages: Tries, Syllabification/Phonetics, Rhyming.

    @@ -813,8 +818,8 @@ In anticipation of user growth, I’ll be deploying the final product on Dig
    -
    -

    4.8 Requirements Validation And Verification

    +
    +

    4.8 Requirements Validation And Verification

    the methods for validating and verifying that the developed data product meets the requirements and subsequently the needs of the customers @@ -834,8 +839,8 @@ The final website will integrate multiple technologies and the integrations won&

    -
    -

    4.9 Programming Environments And Costs

    +
    +

    4.9 Programming Environments And Costs

    the programming environments and any related costs, as well as the human resources that are necessary to execute each phase in the development of the data product @@ -859,8 +864,8 @@ All code was written and all models were trained on a Lenovo T15G with an Intel

    -
    -

    4.10 Timeline And Milestones

    +
    +

    4.10 Timeline And Milestones

    @@ -938,16 +943,16 @@ RhymeStorm™ is an application to help singers and songwriters brainstorm new l

    -
    -

    5.1 Descriptive And Predictive Methods

    +
    +

    5.1 Descriptive And Predictive Methods

    -
    -

    5.1.1 Descriptive Method

    +
    +

    5.1.1 Descriptive Method

      -
    1. Most Common Grammatical Structures In A Set Of Lyrics
      +
    2. Most Common Grammatical Structures In A Set Of Lyrics

      By filtering songs by metrics such as popularity, number of awards, etc… we can use this software package to determine the most common grammatical phrase structure for different filtered categories. @@ -994,28 +999,28 @@ In the example below, you’ll see that a simple noun-phrase is the most pop

    - - + + - - + + - - + + - - + + - - + +
    (TOP (S (S (S (S (S (NP (NN)) (VP (VBP) (S (NP (JJ) (NNS)) (VP (VBD) (S (VP (TO) (VP (VB)))))))) nil (CC) (S (NP (PRP)) (VP (VBP) (ADJP (RB) (JJ)) (S (VP (TO) (VP (VB) (NP (PRP)) (SBAR (S (NP (DT) (NN)) (VP (VBZ) (RB) (PP (IN) (NP (JJ)))))))))))) (.) (NP (PRP)) (VP (MD) (VP (VB)))) nil (IN) (S (NP (PRP)) (VP (MD) (VP (VB) (S (VP (TO) (VP (VB)))))))) (.) (S (CC) (NP (PRP)) (VP (VBP) (ADJP (RB) (JJ)) (SBAR (S (NP (DT)) (VP (VBZ) (NP (NP (RB) (DT) (NN)) (SBAR (S (NP (PRP)) (VP (VBZ) (SBAR (S (NP (PRP)) (VP (MD) (ADVP (RB)) (VP (VB) (S (NP (PRP)) (ADJP (JJ))) (SBAR (IN) (S (VP (VBP) (NP (DT) (JJ) (NN)))))))))))))))))) (.)))1(TOP (NP (NNP) (.)))6
    (INC (NP (DT) (JJ) (NN)) (.) (NP (DT) (NN)) (VBZ) (ADJP (JJ)) (.) (NP (PRP)) (VP (VB)) (CC) (VB) (IN) (NP (NNS)) (.) (CC) (NP (NN)) (VBZ) (ADVP (RB)) (VP (VBD)) (.) (NP (PRP)) (VBP) (IN) (NP (PRP)) (:) (NP (PRP)) (VBP) (IN) (NP (PRP)) (.) (JJ) (NN) (VBZ) (ADJP (JJ)) (ADVP (RB)) (.) (CC) (JJ) (IN) (VBG) (NP (PRP$) (JJ) (NN)) (.) (CC) (NP (NN)) (VBZ) (VP (VBN)) (.) (NNP) (NN) (.))1(TOP (S (NP (PRP)) (VP (VBP) (ADJP (JJ))) (.)))6
    (TOP (S (S (S (S (NP (PRP)) (VP (VBZ) (ADJP (JJ)) (S (VP (TO) (VP (VB)))))) (.) (S (SBAR (WHADVP (WRB)) (S (S (NP (DT) (NN)) (VP (VBZ) (VP (VBN) (PP (IN) (NP (PRP\() (VBN) (NN)))))) (.) (NP (PRP)) (VP (VBP) (PP (IN) (NP (PRP\)) (JJ) (NN))) (PP (IN) (NP (PRP\() (JJ) (NN)))))) (NP (PRP)) (VP (VBP) (NP (CD) (NNS)) (SBAR (RB) (.) (S (NP (PRP)) (VP (VBZ) (RB) (NP (PRP\)) (NNS)) (SBAR (IN) (S (NP (NP (NN)) (SBAR (S (NP (PRP)) (VP (VBZ) (NP (ADJP (JJR)) (DT) (NN) (SBAR (IN) (S (NP (PRP)) (ADVP (RB)) (VP (VBP))))))))) (.) (NP (PRP)) (VP (VBP) (CC) (VP (VBP) (S (NP (NN)) (ADVP (RB)) (VP (VBG) (S (VP (TO) (VP (VB) (NP (DT) (NN))))))))))))))))) (.) (NP (PRP$) (NNS)) (VP (NN))) (IN) (S (S (NP (NP (DT) (NN)) (SBAR (S (NP (CD) (NNS)) (ADVP (RB)) (NP (PRP)) (VP (VBP) (ADJP (VBN)))))) (.) (NP (PRP)) (VP (VP (VBP) (NP (DT) (NN))) (CC) (VP (VB) (NP (DT) (JJ) (NN))))) (.) (NP (DT) (JJ) (NN)) (VP (VBZ) (PP (IN) (CC)) (VP (IN)))) (.)))1(INC (NP (JJ) (NN)) nil (IN) (NP (DT)) (NP (PRP)) (VBP))4
    (TOP (S (S (S (S (NP (NP (NNP)) (SBAR (S (NP (NP (NP (PRP\() (NNS)) (PP (IN) (NP (JJ) (NNS)))) nil (NP (NP (NNP)) (PP (IN) (NP (DT) (JJ) (NNS))))) (VP (VBD) (SBAR (S (NP (NN)) (VP (VBZ) (NP (PRP\)) (NN)) (SBAR (IN) (S (NP (PRP)) (VP (VBD) (VP (VBG) (NP (JJ) (NNS))))))))))))) (.) (S (NP (NN)) (VP (VBZ) (ADVP (RB) (CC) (RB) (JJ)))) (.) (S (PP (IN) (NP (PRP\() (NN))) (.) (NP (DT) (VBN) (NN)) (VP (VBZ) (VP (VBN))))) (.) (NP (PRP\)) (NN) (NNS)) (SBAR (IN) (S (NP (DT)) (VP (VBZ) (NP (NP (PRP\() (NN)) (PP (IN) (NP (JJ) (NN)))))))) (.) (NP (PRP\)) (NN)) (VP (VBD) (RB) (VP (VB) (S (NP (PRP)) (ADJP (DT) (JJR)))))) (.) (VP (VB) (SBAR (S (S (NP (PRP$) (NNS)) (.) (VP (VB) (S (NP (JJ) (JJ) (NNS)) (.) (VP (VB) (ADJP (JJ)))))) (VP (VBZ) (RB) (PP (IN) (NP (NN))))))) (.)))1(TOP (NP (NP (JJ) (NN)) nil (NP (NN) (CC) (NN))))4
    (TOP (S (S (S (ADVP (RB)) (PP (PP (IN) (NP (NN) (NNS))) (CC) (PP (IN) (NP (NP (JJ) (NNS)) (SBAR (S (NP (PRP)) (VP (MD) (ADVP (RB)) (S (NP (NNS)) (VP (VBP) (PP (IN) (NP (NP (NP (DT) (NN)) (PP (IN) (NP (DT) (JJ) (NN)))) (SBAR (S (NP (PRP)) (ADVP (RB)) (VP (VBP) (NP (DT) (JJ) (NN))))))))))))))) (.) (CC) (S (ADVP (RB)) (NP (NNS)) (VP (VBP) (ADVP (RB)) (NP (NP (NN)) (CC) (NP (NP (NP (DT) (NN)) (PP (IN) (NP (NNS)))) (PP (IN) (NP (NN))))))) (.) (NP (NNP)) (VP (VBD) (PP (IN) (NP (NP (DT) (NN)) (PP (IN) (NP (NNS))))))) (.) (RB) (S (NP (DT)) (VP (VBZ) (S (JJ) nil (S (ADVP (RB)) (NP (PRP)) (VP (VBP) (SBAR (IN) (S (NP (PRP)) (VP (VBD) (RB) (VP (VB) (SBAR (S (NP (PRP)) (VP (MD) (VP (VB) (ADJP (JJR))))))))))))))) (.) (S (SBAR (IN) (S (NP (PRP)) (VP (VBD) (RB) (VP (VB) (PP (IN) (NP (DT) (JJ) (CC) (CD))))))) (NP (PRP)) (VP (MD) (VP (VB) (NP (NP (PRP\() (NN)) (CC) (NP (PRP\)) (NNS))))))) (.) () (NP (NNS)) (VP (MD) (VP (VB) (ADVP (RB)))) (.)))1(TOP (S (NP (JJ) (NN)) nil (VP (VBG) (ADJP (JJ)))))4
    @@ -1024,12 +1029,12 @@ In the example below, you’ll see that a simple noun-phrase is the most pop
    -
    -

    5.1.2 Prescriptive Method

    +
    +

    5.1.2 Prescriptive Method

      -
    1. Most Likely Word To Follow A Given Phrase
      +
    2. Most Likely Word To Follow A Given Phrase

      To help songwriters think of new lyrics, we provide an API to receive a list of words that commonly follow/precede a given phrase. @@ -1125,8 +1130,8 @@ In the example below, we provide a seed suffix of “bother me” and as

    -
    -

    5.2 Datasets

    +
    +

    5.2 Datasets

    The dataset currently in use was generated from the publicly available lyrics at http://darklyrics.com. @@ -1138,12 +1143,12 @@ Further datasets will need to be provided by the end-user.

    -
    -

    5.3 Decision Support Functionality

    +
    +

    5.3 Decision Support Functionality

    -
    -

    5.3.1 Choosing Words For A Lyric Based On Markov Likelihood

    +
    +

    5.3.1 Choosing Words For A Lyric Based On Markov Likelihood

    Entire phrases can be generated using the previously mentioned functionality of generating lists of likely prefix/suffix words. @@ -1159,8 +1164,8 @@ The user can supply criteria such as restrictions on the number of syllables, nu

    -
    -

    5.3.2 Choosing Words To Complete A Lyric Based On Rhyme Quality

    +
    +

    5.3.2 Choosing Words To Complete A Lyric Based On Rhyme Quality

    Another part of the decision support functionality is filtering and ordering predicted words based on their rhyme quality. @@ -1219,166 +1224,14 @@ In the example below, you’ll see that the first 20 or so rhymes are perfec - - - - -rhyme -frequency count -rhyme quality +class java.lang.IllegalStateException -technology -318 -8 - - - -apology -68 -7 - - - -pathology -42 -7 - - - -mythology -27 -7 - - - -psychology -24 -7 - - - -theology -23 -7 - - - -biology -20 -7 - - - -ecology -11 -7 - - - -chronology -10 -7 - - - -astrology -9 -7 - - - -biotechnology -8 -7 - - - -nanotechnology -5 -7 - - - -geology -3 -7 - - - -ontology -2 -7 - - - -morphology -2 -7 - - - -seismology -1 -7 - - - -urology -1 -7 - - - -doxology -0 -7 - - - -neurology -0 -7 - - - -hypocrisy -723 -6 - - - -democracy -238 -6 - - - -atrocity -224 -6 - - - -philosophy -181 -6 - - - -equality -109 -6 - - - -ideology -105 -6 +[[“rhyme” “frequency count” “rhyme quality”] [“technology” 318 8] [“apology” 68 7] [“pathology” 42 7] [“mythology” 27 7] [“psychology” 24 7] [“theology” 23 7] [“biology” 20 7] [“ecology” 11 7] [“chronology” 10 7] [“astrology” 9 7] [“biotechnology” 8 7] [“nanotechnology” 5 7] [“geology” 3 7] [“ontology” 2 7] [“morphology” 2 7] [“seismology” 1 7] [“urology” 1 7] [“doxology” 0 7] [“neurology” 0 7] [“hypocrisy” 723 6] [“democracy” 238 6] [“atrocity” 224 6] [“philosophy” 181 6] [“equality” 109 6] [“ideology” 105 6]] @@ -1386,8 +1239,8 @@ In the example below, you’ll see that the first 20 or so rhymes are perfec

    -
    -

    5.4 Featurizing, Parsing, Cleaning, And Wrangling Data

    +
    +

    5.4 Featurizing, Parsing, Cleaning, And Wrangling Data

    The data processing code is in https://github.com/eihli/prhyme @@ -1424,8 +1277,8 @@ words can be compared: “Foo” is the same as “foo”.

    -
    -

    5.5 Data Exploration And Preparation

    +
    +

    5.5 Data Exploration And Preparation

    The primary data structure and algorithms supporting exploration of the data are a Markov Trie @@ -1473,26 +1326,312 @@ All Trie code is hosted in the git repo located at -

    5.6 TODO Data Visualization Functionalities For Data Exploration And Inspection

    +
    +

    5.6 Data Visualization Functionalities For Data Exploration And Inspection

    -
      -
    • graph of phrase complexity on one axis and rhyme quality on another axis.
    • -
    +

    +The functionality to explore and visualize data is baked into the Trie data structure. +

    + +

    +By simply viewing the Trie in a Clojure REPL, you can inspect the Trie’s structure. +

    + +
    +  (let [initialized-trie (->> (trie/make-trie "dog" "dog" "dot" "dot" "do" "do"))]
    +    initialized-trie)
    +    ;; => {(\d \o \g) "dog", (\d \o \t) "dot", (\d \o) "do", (\d) nil}
    +
    + +

    +This functionality is provided by the implementations of the Associative and IPersistentMap interfaces. +

    + +
    +
    clojure.lang.Associative
    +(assoc [trie opath ovalue]
    +  (if (empty? opath)
    +    (IntKeyTrie. key ovalue children-)
    +    (IntKeyTrie. key value (update
    +                      children-
    +                      (first opath)
    +                      (fnil assoc (IntKeyTrie. (first opath) nil (fast-sorted-map)))
    +                      (rest opath)
    +                      ovalue))))
    +(entryAt [trie key]
    +  (clojure.lang.MapEntry. key (get trie key)))
    +(containsKey [trie key]
    +  (boolean (get trie key)))
    +
    +clojure.lang.IPersistentMap
    +(assocEx [trie key val]
    +  (if (contains? trie key)
    +    (throw (Exception. (format "Value already exists at key %s." key)))
    +    (assoc trie key val)))
    +(without [trie key]
    +  (-without trie key))
    +
    +
    + +
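To illustrate why a map-like interface makes a trie easy to inspect at the REPL, here is a minimal sketch that uses plain nested Clojure maps. It is not the `IntKeyTrie` implementation shown above; the names `trie-insert`, `words-with-prefix`, and `sample-trie`, and the `:value` key convention, are assumptions made only for this example.

```clojure
;; Hypothetical sketch: a prefix trie built from plain nested maps.
;; Each character is a key; a node stores its word under :value.
(defn trie-insert
  "Returns a new trie with `word` stored at the path of its characters."
  [trie word]
  (assoc-in trie (concat (seq word) [:value]) word))

(defn words-with-prefix
  "Returns the set of all words stored at or below `prefix`."
  [trie prefix]
  (->> (get-in trie (seq prefix))          ; subtree for the prefix, or nil
       (tree-seq map? vals)                ; walk every node in the subtree
       (keep #(when (map? %) (:value %)))  ; collect values stored on nodes
       set))

(def sample-trie
  (reduce trie-insert {} ["dog" "dot" "do"]))

;; Because the trie is just a map, ordinary map tools inspect it.
(= sample-trie
   {\d {\o {\g {:value "dog"}
            \t {:value "dot"}
            :value "do"}}})
;; => true

(words-with-prefix sample-trie "do")
;; => #{"do" "dog" "dot"}
```

A dedicated type such as the one above can offer the same inspection experience while controlling how keys and children are stored.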

    +The Hidden Markov Model data structure doesn’t lend itself to any useful graphical visualization or exploration. +

    -
    -

    5.7 TODO Implementation Of Interactive Queries

    +
    +

    5.7 Implementation Of Interactive Queries

    +
    +
    +

    5.7.1 Generate Rhyming Lyrics

    +
    +

    +This interactive query returns a list of phrases that rhyme with any word or phrase you enter. +

    + +

    +For example, the phrase don't bother me returns the following results. +

    | Rhyme          | Quality | Lyric                                                               | Perplexity           |
    |----------------+---------+---------------------------------------------------------------------+----------------------|
    | forsee         | 5       | i’m not one of us forsee                                            | -0.150812027039802   |
    | wholeheartedly | 5       | purification has replaced wholeheartedly                            | -0.23227389702753784 |
    | merci          | 5       | domine, non merci                                                   | -0.2567394520839273  |
    | oversea        | 5       | i let’s torch oversea                                               | -0.3940312599117676  |
    | me             | 4       | that is found in me                                                 | -0.12708613143793374 |
    | thee           | 4       | you ask thee                                                        | -0.20919974848757947 |
    | free           | 4       | direct from me free                                                 | -0.29056603191271085 |
    | harmony        | 3       | it’s time to go, this harmony                                       | -0.06634608923365708 |
    | society        | 3       | mutilation rejected by society                                      | -0.10624747249791901 |
    | prophecy       | 3       | take us to the brink of disaster dreamer just a savage prophecy     | -0.13097443386137644 |
    | honesty        | 3       | for you my threw all that can be the power not honesty              | -0.2423380760939454  |
    | constantly     | 3       | i thrust my sword into the dragon’s annihilation that constantly    | -0.2474276676860057  |
    | reality        | 2       | smack of reality                                                    | -0.14811632033013192 |
    | eternity       | 2       | with trust in loneliness in eternity                                | -0.1507561510378151  |
    | misery         | 2       | reminiscing over misery                                             | -0.29506597978960253 |
    +

    -Interactive query capability at https://darklimericks.com/wgu.
    +The interactive query for the above can be found at https://darklimericks.com/wgu/lyric-from-seed?seed=don%27t+bother+me. Note that, since these lyrics are randomly generated, your results will vary.
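The sample results for this query appear ordered by rhyme quality (highest first) and then by perplexity (values closest to zero first). A minimal sketch of that ordering follows, assuming each result is a map with hypothetical `:rhyme`, `:quality`, and `:perplexity` keys. The function name `rank-results` is also an assumption; this is not the application's actual ranking code.

```clojure
;; Hypothetical sketch: rank generated lyrics by rhyme quality, then perplexity.
;; The sample rows reuse values from the example results for "don't bother me".
(def results
  [{:rhyme "me"      :quality 4 :perplexity -0.12708613143793374}
   {:rhyme "forsee"  :quality 5 :perplexity -0.150812027039802}
   {:rhyme "merci"   :quality 5 :perplexity -0.2567394520839273}
   {:rhyme "harmony" :quality 3 :perplexity -0.06634608923365708}])

(defn rank-results
  "Sorts by :quality descending, breaking ties by :perplexity descending."
  [results]
  ;; Negating both keys turns the ascending sort into a descending one.
  (sort-by (juxt (comp - :quality) (comp - :perplexity)) results))

(map :rhyme (rank-results results))
;; => ("forsee" "merci" "me" "harmony")
```

Using `juxt` keeps the tie-breaking rule in one place, so another key could be appended without restructuring the sort.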

    -
    -

    5.8 Implementation Of Machine Learning Methods

    +
    +

    5.7.2 Complete Lyric Containing Suffix

    +
    +

    +This interactive query will return a list of lyrics completing the given suffix with randomly generated prefixes. +

    + +

    +For example, let’s say a songwriter likes the phrase rejected by society above but wants to brainstorm different beginnings for that line. +

    | Lyric                                                    | OpenNLP Perplexity  | Per-word OpenNLP Perplexity |
    |----------------------------------------------------------+---------------------+-----------------------------|
    | we have rejected by society                              | -0.6593112258099724 | -0.03878301328293955        |
    | she rejected by society                                  | -1.0992937688019973 | -0.07852098348585694        |
    | i was despised and rejected by society                   | -3.5925278871864497 | -0.15619686466028043        |
    | the exiled and rejected by society                       | -3.6944350673672144 | -0.21731970984513027        |
    | to smell the death mutilation rejected by society        | -5.899263654566813  | -0.2458026522736172         |
    | time goes yearning again only to be rejected by society  | -2.764028722852962  | -0.08375844614705946        |
    | you won’t survive the mutilation rejected by society     | -2.5299544352623986 | -0.09035551554508567        |
    | your rejected by society                                 | -1.4840658880458661 | -0.10600470628899043        |
    | dividing lands, rejected by society                      | -2.2975947244849793 | -0.12764415136027663        |
    | a voice summons all angry exiled and rejected by society | -9.900290597751827  | -0.17679090353128263        |
    | protect the rejected by society                          | -4.210741684291847  | -0.28071611228612314        |

    +The interactive query for the above can be found at https://darklimericks.com/wgu/rhyming-lyric?rhyming-lyric-target=rejected+by+society. Note again that your results will vary. +

    +
    +
    +
    + +
    +

    5.8 Implementation Of Machine Learning Methods

    The machine learning method chosen for this software is a Hidden Markov Model. @@ -1562,19 +1701,11 @@ The algorithm for generating predictions from the HMM is as follows.

    -
    -[(("<s>" "pain")
    -  ("<s>" "lone" "i")
    -  ("<s>" "lone")
    -  ("<s>" "black" "is")
    -  ("<s>" "black")
    -  ("<s>" "to" "rip")
    -  ("<s>" "to")
    -  ("<s>" "too" "late")
    -  ("<s>" "too")
    -  ("<s>" "how" "wrong"))]
    +
    +class java.lang.IllegalStateException
     
    +

    The results above show a sample of 10 elements in a 1-to-3-gram trie
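As a rough illustration of what a 1-to-3-gram collection contains, the sketch below enumerates every n-gram of length 1 through 3 from a tokenized line. The function names `n-grams` and `n-to-m-grams` are assumptions for this example, and the project's own trie may store its keys in a different order or shape.

```clojure
;; Hypothetical sketch: enumerate all 1-, 2-, and 3-grams of a token sequence.
(defn n-grams
  "Returns every contiguous run of `n` tokens."
  [n tokens]
  (partition n 1 tokens))

(defn n-to-m-grams
  "Returns all n-grams whose length is between `n` and `m`, inclusive."
  [n m tokens]
  (mapcat #(n-grams % tokens) (range n (inc m))))

(n-to-m-grams 1 3 ["<s>" "too" "late"])
;; => (("<s>") ("too") ("late")
;;     ("<s>" "too") ("too" "late")
;;     ("<s>" "too" "late"))
```

Each resulting sequence could then serve as a key path into a trie, with an occurrence count as its value.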

    @@ -1651,8 +1782,8 @@ It also performs compaction and serialization. Song lyrics are typically provide
    -
    -

    5.9 Functionalities To Evaluate The Accuracy Of The Data Product

    +
    +

    5.9 Functionalities To Evaluate The Accuracy Of The Data Product

    Since creative brainstorming is the goal, “accuracy” is subjective. @@ -1719,8 +1850,8 @@ This standardized measure of accuracy can be used to compare different language

    -
    -

    5.10 Security Features

    +
    +

    5.10 Security Features

    Artists/Songwriters place a lot of value in the secrecy of their content. Therefore, all communication with the web-based interface occurs over a secure connection using HTTPS. @@ -1736,21 +1867,25 @@ With this precaution in place, attackers will not be able to snoop the content t

    -
    -

    5.11 TODO Tools To Monitor And Maintain The Product

    +
    +

    5.11 TODO Tools To Monitor And Maintain The Product

    • Script to auto-update SSL cert
    • -
    • Enable NGINX dashboard?
    -
    -

    5.12 TODO A User-Friendly, Functional Dashboard That Includes At Least Three Visualization Types

    +
    +

    5.12 TODO A User-Friendly, Functional Dashboard That Includes At Least Three Visualization Types

    +
    +
      +
    • oz graph of perplexity/rhyme quality
    • +
    • tables (sortable?)
    • +
    +
    -

    6 D. Documentation

    @@ -1760,16 +1895,16 @@ Create each of the following forms of documentation for the product you have dev

    -
    -

    6.1 Business Vision

    +
    +

    6.1 Business Vision

    Provide rhyming lyric suggestions optionally constrained by syllable count.

    -
    -

    6.1.1 Requirements

    +
    +

    6.1.1 Requirements

    • [ ] Given a word or phrase, suggest rhymes (ranked by quality) (Trie)
    • @@ -1784,8 +1919,8 @@ Provide rhyming lyric suggestions optionally constrained by syllable count.
    -
    -

    6.2 Data Sets

    +
    +

    6.2 Data Sets

    See resources/darklyrics-markov.tpt @@ -1793,8 +1928,8 @@ See resources/darklyrics-markov.tpt

    -
    -

    6.3 Data Analysis

    +
    +

    6.3 Data Analysis

    See src/com/owoga/darklyrics/core.clj @@ -1806,8 +1941,8 @@ See https://github.com/eihli/prhyme

    -
    -

    6.4 Assessment

    +
    +

    6.4 Assessment

    See visualization of rhyme suggestion in action. @@ -1819,8 +1954,8 @@ See perplexity?

    -
    -

    6.5 Visualizations

    +
    +

    6.5 Visualizations

    See visualization of smoothing technique. @@ -1832,8 +1967,8 @@ See wordcloud

    -
    -

    6.6 Accuracy

    +
    +

    6.6 Accuracy

    • assessment of the product’s accuracy @@ -1841,8 +1976,8 @@ See wordcloud

    -
    -

    6.7 Testing

    +
    +

    6.7 Testing

    • the results from the data product testing, revisions, and optimization based on the provided plans, including screenshots @@ -1850,8 +1985,8 @@ See wordcloud

    -
    -

    6.8 Source

    +
    +

    6.8 Source

    • source code and executable file(s) @@ -1859,8 +1994,8 @@ See wordcloud

    -
    -

    6.9 Quick Start

    +
    +

    6.9 Quick Start

    • a quick start guide summarizing the steps necessary to install and use the product @@ -1869,8 +2004,8 @@ See wordcloud

    -
    -

    7 Notes

    +
    +

    7 Notes

    http-kit doesn’t support https so no need to bother with keystore stuff like you would with jetty. Just proxy from haproxy. @@ -1880,7 +2015,7 @@ http-kit doesn’t support https so no need to bother with keystore stuff li

    Author: Eric Ihli

    -

    Created: 2021-07-15 Thu 20:35

    +

    Created: 2021-07-20 Tue 16:38