Start adding WGU-related code

main
Eric Ihli 4 years ago
parent 71253e7a1c
commit 82a4d95c16

@ -0,0 +1,47 @@
#+TITLE: Capstone Documentation
* Documentation
D. Create each of the following forms of documentation for the product you have developed:
** Business Vision
Provide rhyming lyric suggestions optionally constrained by syllable count.
** Data Sets
See ~resources/darklyrics-markov.tpt~
** Data Analysis
See ~src/com/owoga/darklyrics/core.clj~
See https://github.com/eihli/prhyme
** Assessment
See visualization of rhyme suggestion in action.
See perplexity?
** Visualizations
See visualization of smoothing technique.
See wordcloud
** Accuracy
• assessment of the product's accuracy
** Testing
• the results from the data product testing, revisions, and optimization based on the provided plans, including screenshots
** Source
• source code and executable file(s)
** Quick Start
• a quick start guide summarizing the steps necessary to install and use the product

@ -0,0 +1,195 @@
#+TITLE: WGU C964 Capstone - Eric Ihli
* Letter of Transmittal
Create a letter of transmittal and a project proposal to convince senior, non-technical managers and executives to implement the data product you have designed. The proposal should include each of the following:
June 18th, 2021
Eric Ihli
Owoga Industries
2579 Acadienne St.
Sulphur, LA 70663
Owoga Industries has a proven track record of turning data into value, and the songwriting industry is due for gains in this area. Songwriters' techniques are still stuck in the past: pen and paper, mind maps, paper-bound thesauruses. Other industries have embraced the power of human-machine collaboration. That power was demonstrated in the "cyborg chess" of 2005, when two chess amateurs, Steven Cramton and Zackary Stephen, beat teams of grandmasters in a freestyle chess tournament that placed no restrictions on the use of technological support.
We propose a tool that puts similar power in the hands of songwriters and artists. Machines have the power to parse millions of lines of lyrics to analyze countless more themes than humanly possible. Long and complex rhymes can be found in mere milliseconds.
The idea is simple. The potential is enormous.
Our machine learning model will analyze the existing lyrics of hundreds of thousands of songs. It will track how often it encounters sequences of words. For example: how often does "your" follow the word "throw"; how often does "hands" follow the two words "throw your"; how often does "up" follow the three words "throw your hands"; etc...
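The counting step described above can be sketched in a few lines. This is a minimal illustration, not the product's implementation: the function name ~count_followers~, the ~max_context~ parameter, and the three sample lyric lines are assumptions made up for this example.

```python
from collections import Counter, defaultdict

def count_followers(lines, max_context=3):
    """Map each context (a tuple of 1..max_context words) to a Counter of
    the words observed immediately after that context."""
    counts = defaultdict(Counter)
    for line in lines:
        words = line.lower().split()
        for i in range(1, len(words)):
            for n in range(1, max_context + 1):
                if i - n < 0:
                    break  # not enough preceding words for a context this long
                context = tuple(words[i - n:i])
                counts[context][words[i]] += 1
    return counts

# Hypothetical sample lyrics, used only to exercise the function.
lyrics = [
    "throw your hands up",
    "throw your hands in the air",
    "throw your heart away",
]
counts = count_followers(lyrics)

print(counts[("throw",)]["your"])               # how often "your" follows "throw"
print(counts[("throw", "your")]["hands"])       # how often "hands" follows "throw your"
print(counts[("throw", "your", "hands")]["up"]) # how often "up" follows "throw your hands"
```

Longer contexts give more specific suggestions but are seen less often in the training data, which is why the sketch records several context lengths at once.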
Once our model is trained on this frequency data, it can suggest completions for lyrics based on user input. Algorithms can filter the suggestions to match rhyme restrictions, syllable-count restrictions, and more. The model can be trained both forwards and backwards, so it can predict both that "hands" follows the words "throw your" and that "throw" precedes the words "your hands". In this manner, a user can ask for a lyric that completes a particular phrase and can also ask for a lyric that ends with a particular rhyme.
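One way to get both directions from the same training code is to reverse each line before counting. The sketch below assumes a fixed two-word context; the ~train~ function, its ~backwards~ flag, and the sample lines are hypothetical and exist only for this illustration.

```python
from collections import Counter, defaultdict

def train(lines, context_size=2, backwards=False):
    """Count which word comes after each context of `context_size` words.
    With backwards=True every line is reversed first, so the model learns
    which word comes *before* a sequence of words."""
    model = defaultdict(Counter)
    for line in lines:
        words = line.lower().split()
        if backwards:
            words.reverse()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            model[context][words[i]] += 1
    return model

# Hypothetical sample lyrics.
lyrics = ["throw your hands up", "raise your hands high", "throw your hands down"]
forward = train(lyrics)
backward = train(lyrics, backwards=True)

# Forward: which word most often follows "throw your"?
next_word = forward[("throw", "your")].most_common(1)[0][0]

# Backward: contexts are stored in reversed order, so the phrase
# "your hands" is looked up as ("hands", "your").
previous_word = backward[("hands", "your")].most_common(1)[0][0]

print(next_word, previous_word)
```

The backward model is what makes rhyme-first writing practical: start from a word that rhymes, then walk the model backwards to build the rest of the line.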
These capabilities translate directly to profit. Songwriters will more quickly find the perfect rhyme, reducing costs. They'll also discover lyrics they otherwise wouldn't have thought of and listeners will appreciate the powerful result, increasing sales.
The advantage of the data model we've chosen (known as a Hidden Markov Model) is that it's fast to train. This will allow re-training on themed data sets, like pop, rock, or heavy metal. It also doesn't require expensive hardware, which will save on both up-front and maintenance costs.
We can provide this tool at a cost of $xxx,xxx with your ongoing expense of $xxx per month to run the web server to interface with the tool.
We look forward to building your value,
Eric Ihli
Owoga Industries
eihli@owoga.com
** Problem Summary
Songwriting is difficult. There are over 170,000 words in the English language, and the average native speaker's vocabulary is typically less than 50,000 words (source: https://capitalizemytitle.com/how-large-is-the-average-persons-vocabulary/).
Existing tools are limited in their functionality. Thesauruses typically offer only one-word suggestions, and rhyming dictionaries cover only a limited range of rhyme types. A thorough rhyming dictionary would be prohibitively large and difficult to print and carry around, as it would have to cover the many different types of rhymes: assonance, consonance, dactyl meter, internal, feminine, and masculine, just to name a few.
** Product Benefits
The benefit of a computer tool is that it can be trained on extremely large and contemporary data sets. If the songwriter only wants recently-used pop-lyric related suggestions, that's just a few clicks away. This data model is also unhindered by slang. Machine learning models are great at estimating the pronunciation of words that don't appear in dictionaries, and users can add their own words and pronunciations to improve accuracy.
** Product Outline
The product will have a web-based graphical user interface. Songwriters can input parameters such as target rhyming words and number of desired syllables. The web interface will interact with the pre-trained data model to suggest words based on observed frequency from the training data set.
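The two user-facing parameters, rhyme target and syllable count, can be sketched as filters over candidate words. This is an illustration under stated assumptions: the tiny ~PRONUNCIATIONS~ table is hand-written in the ARPAbet style used by pronouncing dictionaries (a trailing digit marks a vowel and its stress), the rhyme test covers only perfect rhymes, and all function names are hypothetical.

```python
# Hand-written pronunciations for illustration only (assumed, not product data).
PRONUNCIATIONS = {
    "hands": ["HH", "AE1", "N", "D", "Z"],
    "sands": ["S", "AE1", "N", "D", "Z"],
    "demands": ["D", "IH0", "M", "AE1", "N", "D", "Z"],
    "commands": ["K", "AH0", "M", "AE1", "N", "D", "Z"],
    "door": ["D", "AO1", "R"],
}

def syllable_count(phones):
    """Vowel phones end in a stress digit, so counting them counts syllables."""
    return sum(1 for p in phones if p[-1].isdigit())

def rhyme_part(phones):
    """Return the phones from the last primary-stressed vowel to the end."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return phones[i:]
    return phones

def suggest(target, candidates, syllables=None):
    """Keep candidates that rhyme with target and, optionally, match a syllable count."""
    target_rhyme = rhyme_part(PRONUNCIATIONS[target])
    results = []
    for word in candidates:
        phones = PRONUNCIATIONS.get(word)
        if phones is None or word == target:
            continue  # unknown pronunciation, or the target itself
        if rhyme_part(phones) != target_rhyme:
            continue
        if syllables is not None and syllable_count(phones) != syllables:
            continue
        results.append(word)
    return results

candidates = ["sands", "demands", "commands", "door"]
print(suggest("hands", candidates))
print(suggest("hands", candidates, syllables=2))
```

In the full product the candidate list would come from the trained model, ordered by observed frequency, rather than from a fixed list.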
** Data Description
An initial data model will be provided, trained on heavy-metal lyrics. The training lyrics will be downloaded from the publicly accessible website http://darklyrics.com.
** Objectives
The objective of this project is to provide a tool that songwriters can use to boost their creativity. The tool should be able to suggest clever rhymes that are close to grammatically correct (allowing for some artistic expression).
** Methodology
I will use a Hidden Markov Model. This will provide a representation of the training data set. From the model, I can calculate probabilities of a future state based on a current state.
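To make "probabilities of a future state based on a current state" concrete: for a context c and a candidate next word w, the estimate is count(c, w) divided by the total count of all words seen after c. The sketch below is a minimal illustration with no smoothing; the function name and the counts are made up for the example.

```python
from collections import Counter

def transition_probabilities(follower_counts):
    """Turn raw follow counts for one context into relative frequencies."""
    total = sum(follower_counts.values())
    return {word: count / total for word, count in follower_counts.items()}

# Hypothetical counts of words observed after the context "throw your".
observed = Counter({"hands": 6, "heart": 3, "life": 1})
probs = transition_probabilities(observed)

for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.2f}")
```

Words never seen after a context get no probability at all in this sketch, which is the gap a smoothing technique is meant to fill.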
The data collection process will involve web scraping and data cleaning. I'll remove non-English lyrics and unpronounceable words.
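The cleaning step could look roughly like the sketch below. It rests on an assumption I am making for illustration: a word counts as "pronounceable" if it appears in a known-word list (for example, the keys of a pronouncing dictionary), and a line counts as non-English if too few of its words are known. The word list, the 0.8 threshold, and the function names are all hypothetical.

```python
import re

# Stand-in for the vocabulary of a pronouncing dictionary (assumed).
KNOWN_WORDS = {"throw", "your", "hands", "up", "in", "the", "air", "night", "falls"}

def tokenize(line):
    """Lowercase a line and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", line.lower())

def clean(lines, known_words, min_known_ratio=0.8):
    """Drop lines that look non-English, then drop unknown words from the rest."""
    cleaned = []
    for line in lines:
        words = tokenize(line)
        if not words:
            continue  # blank or punctuation-only line
        known = [w for w in words if w in known_words]
        if len(known) / len(words) < min_known_ratio:
            continue  # mostly unknown words: treat the line as non-English
        cleaned.append(" ".join(known))
    return cleaned

raw = [
    "Throw your hands up!",
    "Die Nacht ist kalt",
    "In the air xzqv night falls",
    "   ",
]
print(clean(raw, KNOWN_WORDS))
```

Removing a word from the middle of a line makes its neighbours look adjacent to the model, so a stricter variant might split the line at the removed word instead.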
** Funding
We can begin work with no funding. Payment can be made on delivery of the product.
** Stakeholder Impact
For stakeholders at record labels, you can advertise this tool to songwriters as a way to draw talent and reduce expenses. Independent songwriter stakeholders can use this tool to enhance their resume and sell more work.
** Ethical and Legal Considerations
This tool will not use any sensitive data. The web application will be served over HTTPS, just in case someone accidentally types something sensitive in the lyric generation input field.
** Developer Expertise
My personal experience of writing songs/poetry combined with over 8 years of professional software development work is just part of what makes me perfectly suited to build this product. I also have experience writing libraries for memory-efficient data structures, a vital component of large Markov models.
* Executive Summary
** Opportunity
We will be building a tool based on a Hidden Markov Model of song lyrics to suggest new rhymes and lyrics to songwriters.
** Target Customer
*** Record Labels
Record labels can provide this tool to their existing songwriters to help them write better lyrics faster. This will save them money by reducing the hours spent on songwriting and it will increase profits by providing more and better songs to their listeners.
*** Songwriters
Independent songwriters can use this tool to optimize their time and compete with large record labels.
*** Students
Students can use this tool to help them brainstorm for poetry/literature classes.
** Existing Solutions
There is an existing tool, RhymeGenie (https://www.rhymegenie.com/), that sells for $24.95. Its usability is limited because it is only available for MacOS, Windows, and iOS. Our web-based tool will support all of those platforms plus Android. We will also be able to provide users with instantaneous updates and improvements without asking the user to install anything.
** Data Collection
Data collection will initially be performed by scraping http://darklyrics.com. Future data collection methods may involve importing the lyrics database available to record labels, scraping other lyrics websites, or using books provided by organizations like Project Gutenberg.
** Methodology
• the methodology you use to guide and support the data product design and development
** Deliverables
• deliverables associated with the design and development of the data product
** Implementation and Anticipated Outcomes
• the plan for implementation of your data product, including the anticipated outcomes from this development
** Validation and Verification
The ultimate validation must come from the use of the application. Do people use it? Academic numbers about how well the algorithm performs by some mathematical benchmark don't matter if nobody uses the product.
To validate the product, I'll track visits to the tool and have a feedback link so that users can tell me directly what they like and dislike.
** Costs
The model can be designed and trained on consumer hardware, so there is no cost there. The model can run on a server that can be obtained from the cloud for under $100/mo.
Development cost is $100/hour with an estimate of 120 hours for $12,000 total initial development cost.
** Timeline
+--------+------------+------------+----------------------------------------------------------+
| Sprint | Start Date | End Date   | Tasks                                                    |
+--------+------------+------------+----------------------------------------------------------+
| 0      | 2021/08/01 | 2021/08/04 | - Collect data                                           |
|        |            |            | - Clean data                                             |
+--------+------------+------------+----------------------------------------------------------+
| 1      | 2021/08/04 | 2021/08/07 | - Write training API                                     |
|        |            |            | - Train model                                            |
+--------+------------+------------+----------------------------------------------------------+
| 2      | 2021/08/07 | 2021/08/11 | - Evaluate and improve model                             |
+--------+------------+------------+----------------------------------------------------------+
| 3      | 2021/08/11 | 2021/08/14 | - Add rhyme constraints                                  |
|        |            |            | - Add syllabification constraints                        |
+--------+------------+------------+----------------------------------------------------------+
| 4      | 2021/08/14 | 2021/08/18 | - Build web interface                                    |
+--------+------------+------------+----------------------------------------------------------+
| 5      | 2021/08/18 | 2021/08/21 | - QA                                                     |
|        |            |            | - Test                                                   |
|        |            |            | - Fix bugs                                               |
+--------+------------+------------+----------------------------------------------------------+
* Documentation
D. Create each of the following forms of documentation for the product you have developed:
** Business Vision
• a business vision or business requirements document
** Data Sets
• raw and cleaned data sets with the code and executable files used to scrape and clean data (if applicable)
** Data Analysis
• code used to perform the analysis of the data and construct a descriptive, predictive, or prescriptive data product
** Assessment
• assessment of the hypotheses for acceptance or rejection
** Visualizations
• visualizations and elements of effective storytelling supporting the data exploration and preparation, data analysis, and data summary, including the phenomenon and its detection
** Accuracy
• assessment of the product's accuracy
** Testing
• the results from the data product testing, revisions, and optimization based on the provided plans, including screenshots
** Source
• source code and executable file(s)
** Quick Start
• a quick start guide summarizing the steps necessary to install and use the product

@ -61,6 +61,8 @@
(repl/halt))
(comment
(require '[clojure.java.jdbc :as sql])
(let [db (-> state/system :com.darklimericks.db.core/connection)
session (java.util.UUID/fromString "47e25213-6cd7-493d-a92a-b5bae635c8f4")]
(db.limericks/limericks-by-session db session))
@ -75,7 +77,9 @@
(db.limericks/limericks-by-session
(-> state/system :com.darklimericks.db.core/connection)
session))
(init)
(let [db (-> state/system :database.sql/connection)
albums (db.albums/most-recent-albums db)]
(->> albums
@ -105,6 +109,19 @@
(-> state/system :database.sql/connection)
(-> state/system :app/cache))]
(handler {:params {:scheme "A9 A9 B5 B5 A9" #_'((A 9) (A 9) (B 5) (B 5) (A 9))}}))))
;; If the namespace gets dirty, this can clear it up.
(run!
#(ns-unalias (find-ns 'user) %)
(keys (ns-aliases 'user)))
;; Making a request from the REPL
(let [handler (handlers/show-rhyme-suggestion
(-> state/system :com.darklimericks.db.core/connection)
(-> state/system :com.darklimericks.kv.core/connection))
router (state/system :com.darklimericks.server.router/router)]
(handler {:params {:rhyme-target "foo"}
::reitit/router router}))
(db.albums/num-albums
(-> state/system :database.sql/connection))

@ -1,55 +0,0 @@
(ns com.darklimericks.server.example
(:require [integrant.core :as ig]
[clojure.tools.namespace.repl :refer [set-refresh-dirs]]
[integrant.repl :as repl]
[org.httpkit.server :as kit]
[reitit.http :as http]
[hiccup.core :as hiccup]
[hiccup.page :as page]
[taoensso.timbre :as timbre]
[reitit.interceptor.sieppari :as sieppari]))
(defn home []
(page/html5
[:head
[:meta {:charset "utf-8"}]
[:meta {:name "viewport" :content "width=device-width, initial-scale=1.0"}]]
[:title "Hello World"]
[:body "Goodbye, world!"]))
(defn home-handler [request]
{:status 200
:headers {"Content-Type" "text/html; charset=utf-8"}
:body (hiccup/html (home))})
(def routes
[["/" {:name ::home
:get {:handler home-handler}}]])
(def config
{:app/handler {:router (ig/ref :app/router)}
:app/router {:routes routes}
:app/server {:port 8000 :handler (ig/ref :app/handler)}})
(defmethod ig/init-key :app/router [_ {:keys [routes]}]
(http/router routes))
(defmethod ig/init-key :app/handler [_ {:keys [router]}]
(http/ring-handler router {:executor sieppari/executor}))
(defmethod ig/init-key :app/server [_ opts]
(timbre/info "Starting server with " opts)
(kit/run-server (:handler opts) (dissoc opts :handler)))
(defmethod ig/halt-key! :app/server [_ server]
(timbre/info "Stopping server")
(server))
(comment
(set-refresh-dirs "src" "dev")
(repl/set-prep! (constantly config))
(repl/prep)
(repl/go)
(repl/reset)
(repl/halt)
)

@ -235,7 +235,7 @@
java.util.UUID/fromString)
limericks (db.limericks/limericks-by-session db session-key)]
{:status 200
:headers {"Content-Type" "text/html; charset=uft-8"}
:headers {"Content-Type" "text/html; charset=utf-8"}
:body (views/wrapper
db
request
@ -249,3 +249,23 @@
request
{}
(views/submit-limericks request []))})))
(defn wgu [db cache]
(fn [request]
{:status 200
:headers {"Content-Type" "text/html; charset=utf-8"}
:body (views/wrapper
db
request
{}
(views/wgu request))}))
(defn show-rhyme-suggestion [db cache]
(fn [request]
{:status 201
:headers {"Content-Type" "text/html; charset=utf-8"}
:body (views/wrapper
db
request
{}
(views/show-rhyme-suggestion request))}))

@ -32,7 +32,11 @@
:coercion reitit.coercion.spec/coercion
:parameters {:path {:artist-id int?}}
:get {:handler (handlers/artist-get-handler db)}}]]
["/assets/*" handlers/resource-handler]]]
["/assets/*" handlers/resource-handler]
["/wgu"
{:name ::wgu
:get {:handler (handlers/wgu db cache)}
:post {:handler (handlers/show-rhyme-suggestion db cache)}}]]]
(timbre/info "Starting router.")
(http/router
routes

@ -186,3 +186,28 @@
[:p
(for [line (string/split (:limerick/text limerick) #"\n")]
[:div line])]]))])
(defn wgu
[request]
[:div
[:h1 "WGU Capstone"]
(form/form-to
[:post (util/route-name->path
request
:com.darklimericks.server.router/wgu)]
(form/label
"rhyme-target"
"Target word or phrase for which to find rhyme suggestions")
" "
(form/text-field
{:placeholder "instead of war on poverty"}
"rhyme-target")
(form/submit-button
{:class "ml2"}
"Show rhyme suggestions"))])
(defn show-rhyme-suggestion
[request]
[:div
(wgu request)
[:div "Hi"]])
