From dadbb78c3784ed52650b71a2d12706a0e2d56ee8 Mon Sep 17 00:00:00 2001
From: Eric Ihli
Date: Fri, 23 Jul 2021 16:05:30 -0500
Subject: [PATCH] Update README with citations

---
 web/README_WGU.org                  |  44 ++-
 web/resources/public/README_WGU.htm | 480 +++++++++++++++-------------
 2 files changed, 298 insertions(+), 226 deletions(-)

diff --git a/web/README_WGU.org b/web/README_WGU.org
index 6b71fdb..ee27414 100644
--- a/web/README_WGU.org
+++ b/web/README_WGU.org
@@ -93,7 +93,7 @@ This software will accomplish its primary objective if it makes its way into the
 
 Several secondary objectives are also desirable and reasonably expected. The architecture of the software lends itself to existing as several independently useful modules.
 
-For example, the [[https://en.wikipedia.org/wiki/Hidden_Markov_model][Markov Model]] can be conveniently backed by a [[https://en.wikipedia.org/wiki/Trie][Trie data structure]]. This Trie data structure can be released as its own software package and used any application that benefits from prefix matching.
+For example, the [[https://en.wikipedia.org/wiki/Hidden_Markov_model][Markov Model]] (Markov Model 2021) can be conveniently backed by a [[https://en.wikipedia.org/wiki/Trie][Trie data structure]] (Trie 2021). This Trie data structure can be released as its own software package and used in any application that benefits from prefix matching.
 
 Another example is the package that turns phrases into phones (symbols of pronunciation). That package can find use for a number of natural language processing and natural language generation tasks, aside from the task required by this particular project.
 
@@ -130,9 +130,9 @@ The only stakeholders in the project will be the record labels or songwriters. I
 
 ** Ethical And Legal Considerations
 
-Web scraping, the method used to obtain the initial dataset from http://darklyrics.com, is protected given the ruling in [[https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn]].
+Web scraping, the method used to obtain the initial dataset from http://darklyrics.com, is protected given the ruling in [[https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn]] (HiQ Labs v. LinkedIn 2021).
 
-The use of publicly available data in generative works is less clear. But Microsoft's lawyers deemed it sound given their recent release of Github CoPilot ([[https://www.theverge.com/2021/7/7/22561180/github-copilot-legal-copyright-fair-use-public-code]]).
+The use of publicly available data in generative works is less clear. But Microsoft's lawyers deemed it sound given their recent release of GitHub Copilot (Gershgorn, 2021).
 
 ** Expertise
 
@@ -187,7 +187,7 @@ Each new model can be uploaded to the web server and users can select which mode
 
 RhymeStorm™ development will proceed with an iterative Agile methodology. It will be composed of several independent modules that can be worked on independently, in parallel, and iteratively.
 
-The Trie data structure that will be used as a backing to the Hidden Markov Model can be worked on in isolation from any other aspect of the project. The first iteration can use a simple hash-map as a backing store. The second iteration can improve memory efficiency by using a ByteBuffer as a [[https://aclanthology.org/W09-1505.pdf][Tightly Packed Trie]]. Future iterations can continue to improve performance metrics.
+The Trie data structure that will be used as a backing to the Hidden Markov Model can be worked on in isolation from any other aspect of the project. The first iteration can use a simple hash-map as a backing store. The second iteration can improve memory efficiency by using a ByteBuffer as a [[https://aclanthology.org/W09-1505.pdf][Tightly Packed Trie]] (Germann et al., 2009). Future iterations can continue to improve performance metrics.
 
 The web server can be implemented initially without security measures like HTTPS and performance measures like load balancing. Future iterations can add these features as they become necessary.
 
@@ -345,6 +345,8 @@ The dataset currently in use was generated from the publicly available lyrics at
 
 Further datasets will need to be provided by the end-user.
 
+The trained dataset is available as a resource in this repository at ~web/resources/models/~.
+
 ** Decision Support Functionality
 
 *** Choosing Words For A Lyric Based On Markov Likelihood
@@ -651,7 +653,7 @@ The code sample below demonstrates training a Hidden Markov Model on a set of ly
 
 It also performs compaction and serialization. Song lyrics are typically provided as text files. Reading files on a hard drive is an expensive process, but we can perform that expensive training process only once and save the resulting Markov Model in a more memory-efficient format.
 
-#+begin_src clojure :session main :results output pp
+#+begin_src clojure :session main :results output pp :cache yes :eval no-export
 (require '[com.owoga.corpus.markov :as markov]
          '[taoensso.nippy :as nippy]
          '[com.owoga.prhyme.data-transform :as data-transform]
@@ -712,7 +714,7 @@ It also performs compaction and serialization. Song lyrics are typically provide
          [(string/join " " (map database ngram-ids)) freq]))))
 #+end_src
 
-#+RESULTS:
+#+RESULTS[4ee2ce5a73756ffbd11253187af68b4a3e6cd324]:
 #+begin_example
 Froze /tmp/markov-trie-4-gram-backwards.bin
 Froze /tmp/markov-database-4-gram-backwards.bin
@@ -738,6 +740,7 @@ Successfully loaded trie and database.
 
 #+end_example
 
+
 ** Functionalities To Evaluate The Accuracy Of The Data Product
 
 Since creative brainstorming is the goal, "accuracy" is subjective.
@@ -747,13 +750,14 @@ We can, however, measure and compare language generation algorithms against how
 #+begin_src clojure :session main :exports both :results output pp
 (require '[taoensso.nippy :as nippy]
          '[com.owoga.tightly-packed-trie :as tpt]
-         '[com.owoga.corpus.markov :as markov])
+         '[com.owoga.corpus.markov :as markov]
+         '[clojure.java.io :as io])
 
-(def database (nippy/thaw-from-file "/home/eihli/.models/markov-database-4-gram-backwards.bin"))
+(def database (nippy/thaw-from-file (io/resource "models/markov-database-4-gram-backwards.bin")))
 
 (def markov-tight-trie
   (tpt/load-tightly-packed-trie-from-file
-   "/home/eihli/.models/markov-tightly-packed-trie-4-gram-backwards.bin"
+   (io/resource "models/markov-tightly-packed-trie-4-gram-backwards.bin")
    (markov/decode-fn database)))
 
 (let [likely-phrase ["a" "hole" "</s>" "</s>"]
@@ -867,7 +871,7 @@ In the interest of being nice to the owners of http://darklyrics.com, I'm keepin
 
 The trained data model is available.
 
-See ~resources/darklyrics-markov.tpt~
+See ~web/resources/models/~
 
 ** Data Analysis
 
@@ -1085,3 +1089,23 @@ This application is not publicly available. I'll upload it with submission of th
 3. Navigate to the root directory of this git repo and run ~java -jar darklimericks.jar~
 
 4. Visit http://localhost:8000/wgu
+
+* Citations
+
+Wikimedia Foundation. (2021, July 16). Markov Model. Wikipedia.
+  https://en.wikipedia.org/wiki/Markov_model.
+
+Wikimedia Foundation. (2021, June 25). Trie. Wikipedia.
+  https://en.wikipedia.org/wiki/Trie.
+
+Wikimedia Foundation. (2021, June 15). HiQ Labs v. LinkedIn. Wikipedia.
+  https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn.
+
+Gershgorn, D. (2021, July 7). GitHub's automatic coding tool rests on untested
+  legal ground. The Verge.
+  https://www.theverge.com/2021/7/7/22561180/github-copilot-legal-copyright-fair-use-public-code.
+
+Germann, U., Joanis, E., & Larkin, S. (2009). Tightly packed tries: How to fit
+  large models into memory, and make them load fast, too. Proceedings of the
+  Workshop on Software Engineering, Testing, and Quality Assurance for Natural
+  Language Processing (SETQA-NLP 2009), 31–39.
diff --git a/web/resources/public/README_WGU.htm b/web/resources/public/README_WGU.htm
index 6d430d5..4a35f20 100644
--- a/web/resources/public/README_WGU.htm
+++ b/web/resources/public/README_WGU.htm
@@ -3,7 +3,7 @@
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-
+
 RhymeStorm™ - WGU CSCI Capstone Project
@@ -223,121 +223,122 @@

Table of Contents

-
-

1 WGU Evaluator Notes

+
+

1 WGU Evaluator Notes

Hello! I hope you enjoy your time with this evaluation! @@ -361,20 +362,20 @@ After I describe the steps to initialize a development environment, you’ll

-
-

2 Evaluation Technical Documentation

+
+

2 Evaluation Technical Documentation

It’s probably not necessary for you to replicate my development environment in order to evaluate this project. You can access the deployed application at https://darklimericks.com/wgu and the libraries and supporting code that I wrote for this project at https://github.com/eihli/clj-tightly-packed-trie, https://github.com/eihli/syllabify, and https://github.com/eihli/prhyme. The web server and web application is not hosted publicly but you will find it uploaded with my submission as a .tar archive.

-
-

2.1 How To Initialize Development Environment

+
+

2.1 How To Initialize Development Environment

-
-

2.1.1 Required Software

+
+

2.1.1 Required Software

  • Docker
  • @@ -384,8 +385,8 @@ It’s probably not necessary for you to replicate my development environmen
-
-

2.1.2 Steps

+
+

2.1.2 Steps

  1. Run ./db/run.sh && ./kv/run.sh to start the docker containers for the database and key-value store. @@ -399,12 +400,12 @@ It’s probably not necessary for you to replicate my development environmen
-
-

2.2 How To Run Software Locally

+
+

2.2 How To Run Software Locally

-
-

2.2.1 Requirements

+
+

2.2.1 Requirements

  • Java
  • @@ -413,8 +414,8 @@ It’s probably not necessary for you to replicate my development environmen
-
-

2.2.2 Steps

+
+

2.2.2 Steps

  1. Run ./db/run.sh && ./kv/run.sh to start the docker containers for the database and key-value store. @@ -435,8 +436,8 @@ It’s probably not necessary for you to replicate my development environmen
    -
    -

    3.1 Problem Summary

    +
    +

    3.1 Problem Summary

    Songwriters, artists, and record labels can save time and discover better lyrics with the help of a machine learning tool that supports their creative endeavours. @@ -448,8 +449,8 @@ Songwriters have several old-fashioned tools at their disposal including diction

    -
    -

    3.2 Benefits

    +
    +

    3.2 Benefits

    How many sensible phrases can you think of that rhyme with “war on poverty”? What if I say that there’s a restriction to only come up with phrases that are exactly 14 syllables? That’s a common restriction when a songwriter is trying to match the meter of a previous line. What if I add another restriction that there must be primary stress at certain spots in that 14 syllable phrase? @@ -465,8 +466,8 @@ And this is a process that is perfect for machine learning. Machine learning can

    -
    -

    3.3 Product - RhymeStorm™

    +
    +

    3.3 Product - RhymeStorm™

    RhymeStorm™ is a tool to help songwriters brainstorm. It provides lyrics automatically generated based on training data from existing songs while adhering to restrictions based on rhyme scheme, meter, genre, and more. @@ -494,8 +495,8 @@ This auto-complete functionality will be similar to the auto-complete that is co

    -
    -

    3.4 Data

    +
    +

    3.4 Data

    The initial model will be trained on the lyrics from http://darklyrics.com. This is a publicly available data set with minimal meta-data. Record labels will have more valuable datasets that will include meta-data along with lyrics, such as the date the song was popular, the number of radio plays of the song, the profit of the song/artist, etc… @@ -507,8 +508,8 @@ The software can be augmented with additional algorithms to account for the type

    -
    -

    3.5 Objectives

    +
    +

    3.5 Objectives

    This software will accomplish its primary objective if it makes its way into the daily toolkit of a handful of singers/songwriters. @@ -519,7 +520,7 @@ Several secondary objectives are also desirable and reasonably expected. The arc

    -For example, the Markov Model can be conveniently backed by a Trie data structure. This Trie data structure can be released as its own software package and used any application that benefits from prefix matching.
    +For example, the Markov Model (Markov Model 2021) can be conveniently backed by a Trie data structure (Trie 2021). This Trie data structure can be released as its own software package and used in any application that benefits from prefix matching.

    @@ -528,8 +529,8 @@ Another example is the package that turns phrases into phones (symbols of pronun

    -
    -

    3.6 Development Methodology - Agile

    +
    +

    3.6 Development Methodology - Agile

    This project will be developed with an iterative Agile methodology. Since a large part of data science and machine learning is exploration, this project will benefit from ongoing exploration in tandem with development. @@ -545,8 +546,8 @@ The prices quoted below are for an initial minimum-viable-product that will serv

    -
    -

    3.7 Costs

    +
    +

    3.7 Costs

    Funding requirements are minimal. The initial dataset is public and freely available. On a typical consumer laptop, Hidden Markov Models can be trained on fairly large datasets in short time and the training doesn’t require the use of expensive hardware like the GPUs used to train Deep Neural Networks. @@ -630,30 +631,30 @@ These are my estimates for the time and cost of different aspects of initial dev

    -
    -

    3.8 Stakeholder Impact

    +
    +

    3.8 Stakeholder Impact

    -The only stakeholders in the project will be the record labels or songwriters. I describe the only impact to them in the 3.2 section above.
    +The only stakeholders in the project will be the record labels or songwriters. I describe the only impact to them in the 3.2 section above.

    -
    -

    3.9 Ethical And Legal Considerations

    +
    +

    3.9 Ethical And Legal Considerations

    -Web scraping, the method used to obtain the initial dataset from http://darklyrics.com, is protected given the ruling in https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn.
    +Web scraping, the method used to obtain the initial dataset from http://darklyrics.com, is protected given the ruling in https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn (HiQ Labs v. LinkedIn 2021).

    -The use of publicly available data in generative works is less clear. But Microsoft’s lawyers deemed it sound given their recent release of Github CoPilot (https://www.theverge.com/2021/7/7/22561180/github-copilot-legal-copyright-fair-use-public-code).
    +The use of publicly available data in generative works is less clear. But Microsoft’s lawyers deemed it sound given their recent release of GitHub Copilot (Gershgorn, 2021).

    -
    -

    3.10 Expertise

    +
    +

    3.10 Expertise

    I have 10 years experience as a programmer and have worked extensively on both frontend technologies like HTML/JavaScript, backend technologies like Django, and building libraries/packages/frameworks. @@ -674,8 +675,8 @@ Write an executive summary directed to IT professionals that addresses each of t

    -
    -

    4.1 Decision Support Opportunity

    +
    +

    4.1 Decision Support Opportunity

    Songwriters expend a lot of time and effort finding the perfect rhyming word or phrase. RhymeStorm™ is going to amplify user’s creative abilities by searching its machine learning model for sensible and proven-successful words and phrases that meet the rhyme scheme and meter requirements requested by the user. @@ -687,8 +688,8 @@ When a songwriter needs to find likely phrases that rhyme with “war on pov

    -
    -

    4.2 Customer Needs And Product Description

    +
    +

    4.2 Customer Needs And Product Description

    Songwriters spend money on dictionaries, compilations of slang, thesauruses, and phrase dictionaries. They spend their time daydreaming, brainstorming, contemplating, and mixing and matching the knowledge they acquire through these traditional means. @@ -708,8 +709,8 @@ Computers can process and sort this information and sort the results by quality

    -
    -

    4.3 Existing Products

    +
    +

    4.3 Existing Products

    We’re all familiar with dictionaries, thesauruses, and their shortcomings. @@ -725,8 +726,8 @@ RhymeZone is limited in its capability. It doesn’t do well finding rhymes

    -
    -

    4.4 Available Data And Future Data Lifecycle

    +
    +

    4.4 Available Data And Future Data Lifecycle

    The initial dataset will be gathered by downloading lyrics from http://darklyrics.com and future models can be generated by downloading lyrics from other websites. Alternatively, data can be provided by record labels and combined with meta-data that the record label may have, such as how many radio plays each song gets and how much profit they make from each song. @@ -750,15 +751,15 @@ Each new model can be uploaded to the web server and users can select which mode

    -
    -

    4.5 Methodology - Agile

    +
    +

    4.5 Methodology - Agile

    RhymeStorm™ development will proceed with an iterative Agile methodology. It will be composed of several independent modules that can be worked on independently, in parallel, and iteratively.

    -The Trie data structure that will be used as a backing to the Hidden Markov Model can be worked on in isolation from any other aspect of the project. The first iteration can use a simple hash-map as a backing store. The second iteration can improve memory efficiency by using a ByteBuffer as a Tightly Packed Trie. Future iterations can continue to improve performance metrics.
    +The Trie data structure that will be used as a backing to the Hidden Markov Model can be worked on in isolation from any other aspect of the project. The first iteration can use a simple hash-map as a backing store. The second iteration can improve memory efficiency by using a ByteBuffer as a Tightly Packed Trie (Germann et al., 2009). Future iterations can continue to improve performance metrics.

    @@ -775,8 +776,8 @@ Much of data science is exploratory and taking an iterative Agile approach can t

    -
    -

    4.6 Deliverables

    +
    +

    4.6 Deliverables

    • Supporting libraries source code
    • @@ -810,8 +811,8 @@ The trained data model and web interface has been deployed at the following addr
    -
    -

    4.7 Implementation Plan And Anticipations

    +
    +

    4.7 Implementation Plan And Anticipations

    I’ll start by writing and releasing the supporting libraries and packages: Tries, Syllabification/Phonetics, Rhyming. @@ -831,8 +832,8 @@ In anticipation of user growth, I’ll be deploying the final product on Dig

    -
    -

    4.8 Requirements Validation And Verification

    +
    +

    4.8 Requirements Validation And Verification

    the methods for validating and verifying that the developed data product meets the requirements and subsequently the needs of the customers @@ -852,8 +853,8 @@ The final website will integrate multiple technologies and the integrations won&

    -
    -

    4.9 Programming Environments And Costs

    +
    +

    4.9 Programming Environments And Costs

    the programming environments and any related costs, as well as the human resources that are necessary to execute each phase in the development of the data product @@ -877,8 +878,8 @@ All code was written and all models were trained on a Lenovo T15G with an Intel

    -
    -

    4.10 Timeline And Milestones

    +
    +

    4.10 Timeline And Milestones

    @@ -956,16 +957,16 @@ RhymeStorm™ is an application to help singers and songwriters brainstorm new l

    -
    -

    5.1 Descriptive And Predictive Methods

    +
    +

    5.1 Descriptive And Predictive Methods

    -
    -

    5.1.1 Descriptive Method

    +
    +

    5.1.1 Descriptive Method

      -
    1. Most Common Grammatical Structures In A Set Of Lyrics
      +
    2. Most Common Grammatical Structures In A Set Of Lyrics

      By filtering songs by metrics such as popularity, number of awards, etc… we can use this software package to determine the most common grammatical phrase structure for different filtered categories. @@ -1042,12 +1043,12 @@ In the example below, you’ll see that a simple noun-phrase is the most pop

    -
    -

    5.1.2 Prescriptive Method

    +
    +

    5.1.2 Prescriptive Method

      -
    1. Most Likely Word To Follow A Given Phrase
      +
    2. Most Likely Word To Follow A Given Phrase

      To help songwriters think of new lyrics, we provide an API to receive a list of words that commonly follow/precede a given phrase. @@ -1143,8 +1144,8 @@ In the example below, we provide a seed suffix of “bother me” and as

    -
    -

    5.2 Datasets

    +
    +

    5.2 Datasets

    The dataset currently in use was generated from the publicly available lyrics at http://darklyrics.com. @@ -1153,15 +1154,19 @@ The dataset currently in use was generated from the publicly available lyrics at

    Further datasets will need to be provided by the end-user.

    + +

    +The trained dataset is available as a resource in this repository at web/resources/models/. +

    -
    -

    5.3 Decision Support Functionality

    +
    +

    5.3 Decision Support Functionality

    -
    -

    5.3.1 Choosing Words For A Lyric Based On Markov Likelihood

    +
    +

    5.3.1 Choosing Words For A Lyric Based On Markov Likelihood

    Entire phrases can be generated using the previously mentioned functionality of generating lists of likely prefix/suffix words. @@ -1177,8 +1182,8 @@ The user can supply criteria such as restrictions on the number of syllables, nu

    -
    -

    5.3.2 Choosing Words To Complete A Lyric Based On Rhyme Quality

    +
    +

    5.3.2 Choosing Words To Complete A Lyric Based On Rhyme Quality

    Another part of the decision support functionality is filtering and ordering predicted words based on their rhyme quality. @@ -1404,8 +1409,8 @@ In the example below, you’ll see that the first 20 or so rhymes are perfec

    -
    -

    5.4 Featurizing, Parsing, Cleaning, And Wrangling Data

    +
    +

    5.4 Featurizing, Parsing, Cleaning, And Wrangling Data

    The data processing code is in https://github.com/eihli/prhyme @@ -1441,8 +1446,8 @@ words can be compared: “Foo” is the same as “foo”.

    -
    -

    5.5 Data Exploration And Preparation

    +
    +

    5.5 Data Exploration And Preparation

    The primary data structure and algorithms supporting exploration of the data are a Markov Trie @@ -1490,8 +1495,8 @@ All Trie code is hosted in the git repo located at -

    5.6 Data Visualization Functionalities For Data Exploration And Inspection

    +
    +

    5.6 Data Visualization Functionalities For Data Exploration And Inspection

    The functionality to explore and visualize data is baked into the Trie data structure. @@ -1501,7 +1506,7 @@ The functionality to explore and visualize data is baked into the Trie data stru By simply viewing the Trie in a Clojure REPL, you can inspect the Trie’s structure.

    -
    +
       (let [initialized-trie (->> (trie/make-trie "dog" "dog" "dot" "dot" "do" "do"))]
         initialized-trie)
         ;; => {(\d \o \g) "dog", (\d \o \t) "dot", (\d \o) "do", (\d) nil}
    @@ -1543,12 +1548,12 @@ The Hidden Markov Model data structure doesn’t lend itself to any useful g
     
    -
    -

    5.7 Implementation Of Interactive Queries

    +
    +

    5.7 Implementation Of Interactive Queries

    -
    -

    5.7.1 Generate Rhyming Lyrics

    +
    +

    5.7.1 Generate Rhyming Lyrics

    This interactive query will return a list of rhyming phrases to any word or phrase you enter. @@ -1691,8 +1696,8 @@ The interactive query for the above can be found at -

    5.7.2 Complete Lyric Containing Suffix

    +
    +

    5.7.2 Complete Lyric Containing Suffix

    -
    -

    5.9 Functionalities To Evaluate The Accuracy Of The Data Product

    + +
    +

    5.9 Functionalities To Evaluate The Accuracy Of The Data Product

    Since creative brainstorming is the goal, “accuracy” is subjective. @@ -1968,13 +1974,14 @@ We can, however, measure and compare language generation algorithms against how

    (require '[taoensso.nippy :as nippy]
              '[com.owoga.tightly-packed-trie :as tpt]
    -         '[com.owoga.corpus.markov :as markov])
    +         '[com.owoga.corpus.markov :as markov]
    +         '[clojure.java.io :as io])
     
    -(def database (nippy/thaw-from-file "/home/eihli/.models/markov-database-4-gram-backwards.bin"))
    +(def database (nippy/thaw-from-file (io/resource "models/markov-database-4-gram-backwards.bin")))
     
     (def markov-tight-trie
       (tpt/load-tightly-packed-trie-from-file
    -   "/home/eihli/.models/markov-tightly-packed-trie-4-gram-backwards.bin"
    +   (io/resource "models/markov-tightly-packed-trie-4-gram-backwards.bin")
        (markov/decode-fn database)))
     
     (let [likely-phrase ["a" "hole" "</s>" "</s>"]
    @@ -2003,7 +2010,14 @@ We can, however, measure and compare language generation algorithms against how
     
    -class clojure.lang.Compiler$CompilerException
    +"a" has preceeded "hole" "</s>" "</s>" a total of 250 times
    +"this" has preceeded "hole" "</s>" "</s>" a total of 173 times
    +"that" has preceeded "hole" "</s>" "</s>" a total of 45 times
    +-12.184088569934774 is the perplexity of "a" "hole" "</s>" "</s>"
    +-12.552930899563904 is the perplexity of "this" "hole" "</s>" "</s>"
    +-13.905719644461469 is the perplexity of "that" "hole" "</s>" "</s>"
    +
    +
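A rough sketch of how scores like the ones above could be derived from raw counts, assuming a plain relative-frequency estimate with no smoothing. The names `log2`, `log2-relative-frequencies`, and `preceding-hole` are hypothetical and used only for illustration; they are not part of com.owoga.corpus.markov. The real model scores each phrase against the whole trie, so the numbers produced here will not match the output above. Only the ordering of the three candidates is expected to agree.

```clojure
;; Hypothetical helpers for illustration; not the project's scoring code.
(defn log2
  "Base-2 logarithm of x."
  [x]
  (/ (Math/log (double x)) (Math/log 2.0)))

(defn log2-relative-frequencies
  "Maps each word to log2 of its share of the total observed count.
  Assumes unsmoothed relative frequencies over the given counts only."
  [counts]
  (let [total (reduce + (vals counts))]
    (into {}
          (map (fn [[word n]] [word (log2 (/ n total))]))
          counts)))

;; Counts copied from the example output above.
(def preceding-hole {"a" 250 "this" 173 "that" 45})

(doseq [[word score] (sort-by val > (log2-relative-frequencies preceding-hole))]
  (println (pr-str word) "scores" score))
```

Scores closer to zero mark the more likely candidates, which is the same direction as the values in the example output.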
     
    @@ -2022,8 +2036,8 @@ This standardized measure of accuracy can be used to compare different language
    -
    -

    5.10 Security Features

    +
    +

    5.10 Security Features

    Artists/Songwriters place a lot of value in the secrecy of their content. Therefore, all communication with the web-based interface occurs over a secure connection using HTTPS. @@ -2039,15 +2053,15 @@ With this precaution in place, attackers will not be able to snoop the content t

    -
    -

    5.11 Tools To Monitor And Maintain The Product

    +
    +

    5.11 Tools To Monitor And Maintain The Product

    By having the application server behind an HAProxy load balancer, we can take advantage of the built-in HAProxy stats page for monitoring amount of traffic and health of the application servers.

    -
    +

    stats.png

    @@ -2066,8 +2080,8 @@ The server also includes the certbot script for updating and mainta
    -
    -

    5.12 A User-Friendly, Functional Dashboard That Includes At Least Three Visualization Types

    +
    +

    5.12 A User-Friendly, Functional Dashboard That Includes At Least Three Visualization Types

    You can access an example of the user interface at https://darklimericks.com/wgu. @@ -2086,7 +2100,7 @@ The first visualization is a scatter plot of rhyming words with the “quali

    -
    +

    wgu-vis.png

    @@ -2096,7 +2110,7 @@ The second visualization is a word cloud where the size of each word is based on

    -
    +

    wgu-vis-cloud.png

    @@ -2106,7 +2120,7 @@ The third visualization is a table that lists all of the rhymes, their pronuncia

    -
    +

    wgu-vis-table.png

    @@ -2122,16 +2136,16 @@ Create each of the following forms of documentation for the product you have dev

    -
    -

    6.1 Business Vision

    +
    +

    6.1 Business Vision

    Provide rhyming lyric suggestions optionally constrained by syllable count.

    -
    -

    6.1.1 Requirements

    +
    +

    6.1.1 Requirements

    • [X] Given a word or phrase, suggest rhymes (ranked by quality) (Trie)
    • @@ -2147,8 +2161,8 @@ Provide rhyming lyric suggestions optionally constrained by syllable count.
    -
    -

    6.2 Data Sets

    +
    -
    -

    6.3 Data Analysis

    +
    +

    6.3 Data Analysis

    I wrote code to perform certain types of data analysis, but I didn’t find it useful to meet the business requirements of this project. @@ -2185,8 +2199,8 @@ For example, there is natural language processing code at -

    6.4 Assessment Of Hypothesis

    +
    +

    6.4 Assessment Of Hypothesis

    I’ll use an example output to subjectively assess the results of the project. @@ -2402,31 +2416,31 @@ and more.

    -
    -

    6.5 Visualizations

    +
    +

    6.5 Visualizations

    -
    +

    rhyme-scatterplot.png

    -
    +

    wordcloud.png

    -
    +

    rhyme-table.png

    -
    -

    6.6 Accuracy

    +
    +

    6.6 Accuracy

    It’s difficult to objectively test the models accuracy since the goal of “brainstorm new lyric” is such a subjective goal. A valid test of that goal will require many human subjects to subjectively evaluate their performance while using the tool compared to their performance without the tool. @@ -2437,8 +2451,8 @@ If we allow ourselves the assumption that the close a generated phrase is to a v

    -
    -

    6.6.1 Percentage Of Generated Lines That Are Valid English Sentences

    +
    -
    -

    6.7 Testing

    +
    +

    6.7 Testing

    My language of choice for this project encourages a programming technique or paradigm known as REPL-driven development. REPL stands for Read-Eval-Print-Loop. This is a way to write and test code in real-time without a compilation step. Individual code chunks can be evaluated inside an editor, resulting in rapid feedback. @@ -2561,12 +2575,12 @@ Here is an example of the test suite for the code related to syllabification:

    -
    -

    6.8 Source Code

    +
    +

    6.8 Source Code

    -
    -

    6.8.1 Tightly Packed Trie

    +
    +

    6.8.1 Tightly Packed Trie

    This is the data structure that backs the Hidden Markov Model. @@ -2578,8 +2592,8 @@ This is the data structure that backs the Hidden Markov Model.

    -
    -

    6.8.2 Phonetics

    +
    +

    6.8.2 Phonetics

    This is the helper library that syllabifies and manipulates words, phones, and syllables. @@ -2591,8 +2605,8 @@ This is the helper library that syllabifies and manipulates words, phones, and s

    -
    -

    6.8.3 Rhyming

    +
    +

    6.8.3 Rhyming

    This library contains code for analyzing rhymes, sentence structure, and manipulating corpuses. @@ -2604,8 +2618,8 @@ This library contains code for analyzing rhymes, sentence structure, and manipul

    -
    -

    6.8.4 Web Server And User Interface

    +
    +

    6.8.4 Web Server And User Interface

    This application is not publicly available. I’ll upload it with submission of the project. @@ -2614,16 +2628,16 @@ This application is not publicly available. I’ll upload it with submission

    -
    -

    6.9 Quick Start

    +
    +

    6.9 Quick Start

    -
    -

    6.9.1 How To Initialize Development Environment

    +
    +

    6.9.1 How To Initialize Development Environment

      -
    1. Required Software
      +
    2. Required Software
      • Docker
      • @@ -2633,7 +2647,7 @@ This application is not publicly available. I’ll upload it with submission
    3. -
    4. Steps
      +
    5. Steps
      1. Run ./db/run.sh && ./kv/run.sh to start the docker containers for the database and key-value store. @@ -2648,12 +2662,12 @@ This application is not publicly available. I’ll upload it with submission
      -
      -

      6.9.2 How To Run Software Locally

      +
      +

      6.9.2 How To Run Software Locally

        -
      1. Requirements
        +
      2. Requirements
        • Java
        • @@ -2662,7 +2676,7 @@ This application is not publicly available. I’ll upload it with submission
      3. -
      4. Steps
        +
      5. Steps
        1. Run ./db/run.sh && ./kv/run.sh to start the docker containers for the database and key-value store. @@ -2679,10 +2693,44 @@ This application is not publicly available. I’ll upload it with submission
      + + +
      +

      7 Citations

      +
      +

      +Wikimedia Foundation. (2021, July 16). Markov Model. Wikipedia. + https://en.wikipedia.org/wiki/Markov_model. +

      + +

      +Wikimedia Foundation. (2021, June 25). Trie. Wikipedia. + https://en.wikipedia.org/wiki/Trie. +

      + +

      +Wikimedia Foundation. (2021, June 15). HiQ Labs v. LinkedIn. Wikipedia. + https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn. +

      + +

      +Gershgorn, D. (2021, July 7). GitHub’s automatic coding tool rests on untested + legal ground. The Verge. + https://www.theverge.com/2021/7/7/22561180/github-copilot-legal-copyright-fair-use-public-code. +

      + +

    +Germann, U., Joanis, E., & Larkin, S. (2009). Tightly packed tries: How to fit
    +  large models into memory, and make them load fast, too. Proceedings of the
    +  Workshop on Software Engineering, Testing, and Quality Assurance for Natural
    +  Language Processing (SETQA-NLP 2009), 31–39.
    +

      +
      +

    Author: Eric Ihli

    -

    Created: 2021-07-22 Thu 20:04

    +

    Created: 2021-07-23 Fri 16:05