WGU C964 Capstone - Eric Ihli

Letter of Transmittal

Create a letter of transmittal and a project proposal to convince senior, non-technical managers and executives to implement the data product you have designed. The proposal should include each of the following:

June 18th, 2021

Eric Ihli
Owoga Industries
2579 Acadienne St.
Sulphur, LA 70663

Owoga Industries has a proven track record of turning data into value. The songwriting industry is due for gains in this area. The techniques of songwriters are still stuck in the past: pen and paper, mind-mapping, paper-bound thesauruses… Other industries have embraced the power of human-machine interaction. This power was demonstrated in the cyborg chess match of 2005, when two chess amateurs, Steven Cramton and Zackary Stephen, beat teams of grandmasters in a freestyle chess tournament that placed no restrictions on the use of technological support.

We propose a tool that puts similar power in the hands of songwriters and artists. Machines have the power to parse millions of lines of lyrics to analyze countless more themes than humanly possible. Long and complex rhymes can be found in mere milliseconds.

The idea is simple. The potential is enormous.

Our machine learning model will analyze the existing lyrics of hundreds of thousands of songs. It will track how often it encounters sequences of words. For example: how often does "your" follow the word "throw"; how often does "hands" follow the two words "throw your"; how often does "up" follow the three words "throw your hands"; etc…

Once our model is trained on this frequency data, it can suggest completions for lyrics based on user input. Algorithms can filter the suggestions to match certain rhyme restrictions, syllable count restrictions, and more. The model can be trained both forwards and backwards, so it can predict both that "hands" follows the words "throw your" and that "throw" precedes the words "your hands". In this manner, a user can ask for a lyric that completes a particular phrase and can also ask for a lyric that ends with a particular rhyme.

These capabilities translate directly to profit. Songwriters will more quickly find the perfect rhyme, reducing costs. They'll also discover lyrics they otherwise wouldn't have thought of and listeners will appreciate the powerful result, increasing sales.

One advantage of the data model we've chosen (known as a Hidden Markov Model) is that it's fast to train. This will allow re-training on themed data sets, like pop, rock, or heavy metal. It also doesn't require expensive hardware, which will save on both up-front and maintenance costs.

We can provide this tool at a cost of $xxx,xxx with your ongoing expense of $xxx per month to run the web server to interface with the tool.

We look forward to building your value,

Eric Ihli
Owoga Industries
eihli@owoga.com

Problem Summary

Songwriting is difficult. There are over 170,000 words in the English language, and the average native speaker's vocabulary is typically fewer than 50,000 words (https://capitalizemytitle.com/how-large-is-the-average-persons-vocabulary/).

Existing tools are limited in their functionality. Thesauruses typically suggest only single words, and rhyming dictionaries cover only a limited range of rhyme types. A thorough rhyming dictionary would be prohibitively large and difficult to print and carry around, as it would have to cover many different types of rhymes: assonance, consonance, dactyl meter, internal, feminine, and masculine, just to name a few.

Product Benefits

The benefit of a computer tool is that it can be trained on extremely large and contemporary data sets. If the songwriter only wants recently-used pop-lyric related suggestions, that's just a few clicks away. This data model is also unhindered by slang. Machine learning models are great at estimating the pronunciation of words that don't appear in dictionaries, and users can add their own words and pronunciations to improve accuracy.

Product Outline

The product will have a web-based graphical user interface. Songwriters can input parameters such as target rhyming words and number of desired syllables. The web interface will interact with the pre-trained data model to suggest words based on observed frequency from the training data set.
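As a rough illustration of how those parameters could narrow the model's suggestions, the sketch below filters candidate words by rhyme and syllable count and orders the survivors by observed frequency. The pronunciation table, the function names, and the frequencies are all made-up placeholders; a real build would presumably draw pronunciations from a pronouncing dictionary and frequencies from the trained model.

```python
# Hypothetical pronunciations in an ARPAbet-like notation.
# A trailing digit marks a vowel and its stress (1 = primary, 0 = unstressed).
PRONUNCIATIONS = {
    "hands": ["HH", "AE1", "N", "D", "Z"],
    "sands": ["S", "AE1", "N", "D", "Z"],
    "demands": ["D", "IH0", "M", "AE1", "N", "D", "Z"],
    "fire": ["F", "AY1", "ER0"],
}


def syllable_count(phones):
    """Count vowels, which are the phones that carry a stress digit."""
    return sum(1 for p in phones if p[-1].isdigit())


def rhyme_part(phones):
    """Return the phones from the last stressed vowel to the end of the word."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":
            return phones[i:]
    return phones


def suggest(candidates, rhyme_with, syllables):
    """Keep candidates (word -> frequency) that satisfy both constraints,
    most frequent first."""
    target = rhyme_part(PRONUNCIATIONS[rhyme_with])
    matches = [
        (word, freq)
        for word, freq in candidates.items()
        if word != rhyme_with
        and word in PRONUNCIATIONS
        and rhyme_part(PRONUNCIATIONS[word]) == target
        and syllable_count(PRONUNCIATIONS[word]) == syllables
    ]
    return [word for word, _ in sorted(matches, key=lambda m: -m[1])]


# Made-up frequencies standing in for counts from the training data.
candidates = {"sands": 12, "demands": 7, "fire": 30, "hands": 40}
one_syllable = suggest(candidates, rhyme_with="hands", syllables=1)
two_syllable = suggest(candidates, rhyme_with="hands", syllables=2)
```

This sketch only handles perfect rhymes; looser rhyme types such as assonance would need a more forgiving comparison than strict equality of the rhyming phones.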

Data Description

An initial data model will be provided. It will be generated in the heavy-metal theme. The training lyrics will be downloaded from the publicly accessible website at http://darklyrics.com.

Objectives

The objective of this project is to provide a tool that songwriters can use to boost their creativity. The tool should be able to suggest clever rhymes that are close to grammatically correct (allowing for some artistic expression).

Methodology

I will use a Hidden Markov Model. This will provide a representation of the training data set. From the model, I can calculate probabilities of a future state based on a current state.
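A minimal sketch of that calculation follows. It assumes the "current state" is simply the last one or two words of a lyric, so it is a plain word-level Markov chain built from frequency counts rather than a full Hidden Markov Model with hidden states. The function names and the toy lyrics are hypothetical.

```python
from collections import Counter, defaultdict


def train(lines, order=2):
    """Count how often each word follows each context of 1..order words."""
    counts = defaultdict(Counter)
    for line in lines:
        words = line.lower().split()
        for i, word in enumerate(words):
            for n in range(1, order + 1):
                if i - n < 0:
                    break  # not enough preceding words for this context size
                context = tuple(words[i - n:i])
                counts[context][word] += 1
    return counts


def next_word_probabilities(counts, context):
    """Estimate P(next word | context) as a relative frequency."""
    followers = counts.get(tuple(context), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in followers.items()}


lyrics = [
    "throw your hands up",
    "throw your hands in the air",
    "throw your heart away",
]

# Forward model: which word is likely to come next?
forward = train(lyrics)
probs = next_word_probabilities(forward, ["throw", "your"])

# Backward model: train on reversed lines to ask which word comes before.
backward = train([" ".join(reversed(line.split())) for line in lyrics])
prev = next_word_probabilities(backward, ["hands", "your"])
```

With these three toy lines, "hands" follows "throw your" two times out of three, and the backward model reports "throw" as the only word seen before "your hands".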

The data collection process will involve web scraping and data cleaning. I'll remove non-English lyrics and unpronounceable words.
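One way the cleaning step might look is sketched below. It assumes a word counts as pronounceable if it appears in a known word list, and that a line is probably not English if fewer than half of its words are known. The tiny word list, the 0.5 threshold, and the function names are placeholders I chose for illustration, not settled design decisions.

```python
import re

# Hypothetical stand-in for the word list of a pronouncing dictionary.
KNOWN_WORDS = {
    "throw", "your", "hands", "up", "in", "the", "air", "we", "ride", "night",
}


def tokenize(line):
    """Lowercase a line and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", line.lower())


def clean_line(line, min_known_ratio=0.5):
    """Return the pronounceable words of a line, or None if the line
    is empty or looks non-English."""
    words = tokenize(line)
    if not words:
        return None
    known = [w for w in words if w in KNOWN_WORDS]
    if len(known) / len(words) < min_known_ratio:
        return None  # too few recognized words: treat as non-English
    return known


raw = [
    "Throw your hands up!!",
    "Wir reiten durch die Nacht",
    "We ride in the night xzqv",
]
cleaned = [c for c in (clean_line(line) for line in raw) if c is not None]
```

Here the German line is dropped entirely, while the third line is kept with its unpronounceable word removed.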

Funding

We can begin work with no funding. Payment can be made on delivery of the product.

Stakeholder Impact

Stakeholders at record labels can advertise this tool to songwriters as a way to draw talent and reduce expenses. Independent songwriter stakeholders can use this tool to enhance their resume and sell more work.

Ethical and Legal Considerations

This tool will not use any sensitive data. The web application will be served over HTTPS, just in case someone accidentally types something sensitive into the lyric generation input field.

Developer Expertise

My personal experience of writing songs/poetry combined with over 8 years of professional software development work is just part of what makes me perfectly suited to build this product. I also have experience writing libraries for memory-efficient data structures, a vital component of large Markov models.

Executive Summary

Opportunity

We will be building a tool based on a Hidden Markov Model of song lyrics to suggest new rhymes and lyrics to songwriters.

Target Customer

Record Labels

Record labels can provide this tool to their existing songwriters to help them write better lyrics faster. This will save them money by reducing the hours spent on songwriting and it will increase profits by providing more and better songs to their listeners.

Songwriters

Independent songwriters can use this tool to optimize their time and compete with large record labels.

Students

Students can use this tool to help them brainstorm for poetry/literature classes.

Existing Solutions

There is an existing tool, RhymeGenie (https://www.rhymegenie.com/), that sells for $24.95. It suffers the usability restriction of only being available for macOS, Windows, and iOS. Our web-based tool will provide support for all of those platforms plus Android. We will also be able to provide users with instantaneous updates and improvements without asking the user to install anything.

Data Collection

Data collection will initially be performed by scraping http://darklyrics.com. Future data collection methods may involve importing the lyrics database available to record labels, scraping other lyrics websites, or using books provided by organizations like Project Gutenberg.

Methodology

• the methodology you use to guide and support the data product design and development

Deliverables

• deliverables associated with the design and development of the data product

Implementation and Anticipated Outcomes

• the plan for implementation of your data product, including the anticipated outcomes from this development

Validation and Verification

The ultimate validation must come from the use of the application. Do people use it? Academic numbers about how well the algorithm performs against some mathematical benchmark don't matter if nobody uses the product.

To validate the product, I'll track visits to the tool and have a feedback link so that users can tell me directly what they like and dislike.

Costs

The model can be designed and trained on consumer hardware, so there is no cost there. The model can run on a server that can be obtained from the cloud for under $100/mo.

Development cost is $100/hour with an estimate of 120 hours for $12,000 total initial development cost.

Timeline

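| Sprint | Start Date | End Date   | Tasks                             |
|--------+------------+------------+-----------------------------------|
| 0      | 2021/08/01 | 2021/08/04 | - Collect data                    |
|        |            |            | - Clean data                      |
|--------+------------+------------+-----------------------------------|
| 1      | 2021/08/04 | 2021/08/07 | - Write training API              |
|        |            |            | - Train model                     |
|--------+------------+------------+-----------------------------------|
| 2      | 2021/08/07 | 2021/08/11 | - Evaluate and improve model      |
|--------+------------+------------+-----------------------------------|
| 3      | 2021/08/11 | 2021/08/14 | - Add rhyme constraints           |
|        |            |            | - Add syllabification constraints |
|--------+------------+------------+-----------------------------------|
| 4      | 2021/08/14 | 2021/08/18 | - Build web interface             |
|--------+------------+------------+-----------------------------------|
| 5      | 2021/08/18 | 2021/08/21 | - QA                              |
|        |            |            | - Test                            |
|        |            |            | - Fix bugs                        |
|--------+------------+------------+-----------------------------------|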

Documentation

  1. Create each of the following forms of documentation for the product you have developed:

Business Vision

• a business vision or business requirements document

Data Sets

• raw and cleaned data sets with the code and executable files used to scrape and clean data (if applicable)

Data Analysis

• code used to perform the analysis of the data and construct a descriptive, predictive, or prescriptive data product

Assessment

• assessment of the hypotheses for acceptance or rejection

Visualizations

• visualizations and elements of effective storytelling supporting the data exploration and preparation, data analysis, and data summary, including the phenomenon and its detection

Accuracy

• assessment of the product's accuracy

Testing

• the results from the data product testing, revisions, and optimization based on the provided plans, including screenshots

Source

• source code and executable file(s)

Quick Start

• a quick start guide summarizing the steps necessary to install and use the product