Data Exploration with MusicCNN (Switching Vocals Mini-Project)

Purpose

I want to gauge the similarity between songs for data cleaning, and maybe also use it to check whether the generated music is more similar to the ground truth (the switching vocals version) than to the baseline input (the original song).

What is MusicCNN?

It is a GitHub repository based on the paper:

Exploring Data

Using the extractor, I plotted out the Taggram and got the tag likelihoods for a song (Justin Bieber – Love Yourself) and the switching vocals version of that song, to try out their model.
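
For reference, the extractor call is roughly as below. This is a minimal sketch assuming the musicnn package’s extractor API (exact argument names, model names and return values may differ between versions); the plotting is just matplotlib.

import numpy as np
import matplotlib.pyplot as plt
from musicnn.extractor import extractor  # assumption: musicnn's extractor module

# Compute the taggram (tag likelihood per time frame) for one audio file.
taggram, tags = extractor('love_yourself.wav', model='MTT_musicnn', extract_features=False)

# Taggram plot: time frames on one axis, the MTT tags on the other.
plt.imshow(taggram.T, aspect='auto', interpolation='nearest')
plt.yticks(range(len(tags)), tags, fontsize=6)
plt.xlabel('time frame')
plt.title('Taggram')
plt.show()

# "Tags likelihood" is just the taggram averaged over time.
tags_likelihood = np.mean(taggram, axis=0)
plt.barh(range(len(tags)), tags_likelihood)
plt.yticks(range(len(tags)), tags, fontsize=6)
plt.title('Tags likelihood')
plt.show()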

Comparison within a Song

Taggram Comparison

Taggram for: Justin Bieber – Love Yourself (Original Song)
Taggram for: Switching Vocals Version of Justin Bieber – Love Yourself

Some differences in the switching vocals version:

  • there is no “opera” tag
  • the “women” tag was detected
  • the tag likelihoods are more concentrated at certain times

Tags Likelihood Comparison

This is like the taggram averaged over time.

Tags Likelihood for: Justin Bieber – Love Yourself (Original Song)
Tags Likelihood for: Switching Vocals Version of Justin Bieber – Love Yourself

Differences:

  • Decrease in the “male” and “male vocals” tag likelihoods
  • The “opera” and “quiet” tag likelihoods are eliminated
  • “female vocals”, “female” and “pop” are increased

Comparison between songs

I’ll compare another original song against the switching vocals of a different song.

Taggram Comparison

Taggram for the Original Version of Maps by Maroon 5

This is pretty different from both taggrams of Justin Bieber’s Love Yourself.

Tags Likelihood Comparison

Tags Likelihood of Original Maps by Maroon 5

Also pretty different, e.g. the tag likelihoods for “techno”, “drums” and “electronic” are higher for Maroon 5’s Maps than for Justin Bieber’s Love Yourself.

Songs Mashup Comparison

Mashups are tracks that mix two or more songs together. I want to see whether there is a significant similarity in tag likelihood between a mashup and the songs contributing to it.

Input Songs

Output Song

The mashup seems rather different from the songs making it up.
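
One simple way to quantify this, assuming the taggrams from the extractor sketch above, is cosine similarity between time-averaged tag-likelihood vectors. This is just a sketch, not a settled metric:

import numpy as np

def tag_similarity(taggram_a, taggram_b):
    """Cosine similarity between two songs' time-averaged tag-likelihood vectors (1.0 = same direction)."""
    a = np.asarray(taggram_a).mean(axis=0)
    b = np.asarray(taggram_b).mean(axis=0)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# e.g. compare the mashup against each input song and against an unrelated song:
# print(tag_similarity(taggram_mashup, taggram_input1))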

Actual Data Science

I’ll build a tag-likelihood vector for each song and run t-SNE on them. The colour of each point will correspond to the group of songs it belongs to; something like the sketch below.
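
A rough sketch of that plan, assuming a matrix of time-averaged tag-likelihood vectors (one row per song) and an integer group id per song; the variable names are placeholders:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tag_tsne(tag_vectors, group_ids):
    """Embed the (n_songs, n_tags) tag-likelihood matrix into 2D with t-SNE and colour by group."""
    embedded = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(np.asarray(tag_vectors))
    plt.scatter(embedded[:, 0], embedded[:, 1], c=group_ids, cmap='tab20')
    plt.title('t-SNE of tag-likelihood vectors')
    plt.show()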

#TODO

Switching Vocals (Part 2) – More Data Cleaning & Verification

So far I’ve downloaded around 500 switching vocals videos and the original songs that make them up. However, I haven’t checked that the downloaded originals are actually the correct songs, so I will use a music similarity measure to verify this.
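
As a rough first sanity check, before settling on a proper measure from the section below, something along these lines could flag obviously wrong downloads. librosa and the DTW-over-chroma approach here are placeholder choices of mine, not necessarily the libraries I end up using:

import numpy as np
import librosa

def alignment_cost(path_a, path_b, sr=16000, duration=60):
    """Lower cost = the two recordings align better (DTW over chroma features); a crude sanity check only."""
    def chroma(path):
        y, _ = librosa.load(path, sr=sr, mono=True, duration=duration)  # first minute is enough here
        return librosa.feature.chroma_stft(y=y, sr=sr)
    ca, cb = chroma(path_a), chroma(path_b)
    D, wp = librosa.sequence.dtw(X=ca, Y=cb, metric='cosine')
    return float(D[-1, -1] / len(wp))  # normalise total cost by warping-path length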

Music Similarity Measures

#Writing in Progress

Libraries I am Exploring:

Mini-Project – Switching Vocals

This project is related to my youtube mashups project but should be easier to train, as it is essentially a pitch shift of only one song.

Essentially I want to train a NN to generate songs like:

From the base song.

Later maybe it can generate videos?

Getting Data

Format for Audio

I’m downloading the audio in WAV format and keeping the video, using the code below.

WAV can cover the full frequency range that the human ear is able to hear. An MP3 file is compressed and has quality loss, whereas a WAV file is lossless and uncompressed.

Artisound.io

Download Script

I check that I reject the megamixes (mashups with 20+ songs) and only pick switching vocals videos, using some regex that is supported by youtube_dl.

from __future__ import unicode_literals
import youtube_dl
import os
from pathlib import Path

rootdir = str(Path().absolute())

def QueryYoutube(QueryList, toSkip = True):
	""" Get the list of results from queries and put it in a json file"""
	ydl_opts = {
		# "outtmpl": "%(title)s.%(ext)s", #file name is song name
		"outtmpl": os.path.join(rootdir,"%(title)s/SV.%(ext)s"), #folder name is song name, file is SV
		"ignoreerrors": True, #Do not stop on download errors.
		"nooverwrites": True, #Prevent overwriting files.
		"matchtitle": "switching vocals", #not sure if this works (Download only matching titles)
		"writedescription": True, #Write the video description to a .description file
		"skip_download": toSkip, #don't actually download the video
		"min_views": 100, #only get videos with min 10k views
		"download_archive": "alreadyListedFiles.txt", #File name of a file where all downloads are recorded. Videos already present in the file are not downloaded     again.
		"default_search": "auto", #Prepend this string if an input url is not valid. 'auto' for elaborate guessing'
		'format': 'bestaudio/best',
	    'postprocessors': [{
	        'key': 'FFmpegExtractAudio',
	        'preferredcodec': 'wav',
	        'preferredquality': '192'
	    }],
	    'postprocessor_args': [
	        '-ar', '16000'
	    ],
	    'prefer_ffmpeg': True,
	    'keepvideo': True
		}
	with youtube_dl.YoutubeDL(ydl_opts) as ydl:
	    ydl.download(QueryList)


def test():
	"""Test by downloading two sets of two SV"""
	# queriesL = ["nightcore mashups", "bts mashups", "ytuser:https://www.youtube.com/channel/UC5XWNylwy4efFufjMYqcglw"]
	# queriesL = ["ytuser:https://www.youtube.com/channel/UC5XWNylwy4efFufjMYqcglw", "ytuser:"]

	#nightcore switching vocals
	queriesL = ["https://www.youtube.com/channel/UCPtWGnX3cr6fLLB1AAohynw", 
				"https://www.youtube.com/channel/UCPMhsGX1A6aPmpFPRWJUkag"
				]
	# QueryYoutube(queriesL, True) #should download that channel
	QueryYoutube(queriesL, False) #should download that channel

def run():
	##### DOWNLOADING
	#nightcore switching vocals channels
	queriesL = ["https://www.youtube.com/channel/UCPtWGnX3cr6fLLB1AAohynw", 
				"https://www.youtube.com/channel/UCPMhsGX1A6aPmpFPRWJUkag", 
				"https://www.youtube.com/channel/UCl2fdq_CzdrDhauV85aXQDQ",
				"https://www.youtube.com/channel/UC8Y2KrSAhAl1-1hqBGLBdzA",
				"https://www.youtube.com/channel/UCJsX7vcaCUdPOcooysql1Uw",
				"https://www.youtube.com/channel/UCtY3IhWM6UOlMBoUG-cNQyQ",
				"https://www.youtube.com/channel/UCNOymlVIxfFW0mVmZiNq6DA"
				]
	QueryYoutube(queriesL, False)

if __name__ == "__main__":
    run()
    

Data Cleaning & Automating Download of Original Songs

I’m taking the title of the youtube switching vocals video and using regex to find the names of the original songs.

regex expression crafting

After converting the unicode characters to regular punctuation, I used a regex tester to zero in on the key words. I still have to remove some strings that got included accidentally, because I wanted to make sure I kept the artist names in the matched string groups.
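
For illustration, a stripped-down version of the idea (the actual expression I crafted in the tester is more involved; the title format and separators here are assumptions):

import re

# Hypothetical title: "Nightcore - Love Yourself ✗ Sorry (Switching Vocals)"
SPLIT_PATTERN = re.compile(r"\s+(?:x|✗|×|&|\+)\s+|\s*/\s*", re.IGNORECASE)

def extract_song_names(title):
    """Drop the channel prefix and the '(Switching Vocals ...)' tail, then split on common separators."""
    title = re.sub(r"(?i)^nightcore\s*[-–]\s*", "", title)
    title = re.sub(r"(?i)[\(\[]?\s*switching vocals.*$", "", title)
    return [s.strip(" -–|") for s in SPLIT_PATTERN.split(title) if s.strip(" -–|")]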

Get the proper song names

I queried youtube according to the youtube_dl documentation (the dedicated music APIs might not have the song I’m looking for, and I’m searching youtube for the downloads anyway).
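
Concretely, the lookup can be done with youtube_dl’s ytsearch prefix via extract_info; a minimal sketch (the option values are illustrative):

import youtube_dl

def first_search_result(query):
    """Return the title and URL of the top youtube search result for a query, without downloading it."""
    opts = {"quiet": True, "skip_download": True}
    with youtube_dl.YoutubeDL(opts) as ydl:
        info = ydl.extract_info("ytsearch1:" + query, download=False)
    entries = info.get("entries") or []
    if not entries:
        return None
    return entries[0].get("title"), entries[0].get("webpage_url")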

To clean up the data and prevent duplicates of the original songs, I’m removing artist names and tags related to a song already noted down.

To look through the json for the metadata related to the youtube video, I used an online JSON Viewer:

Looking through Json Data

I made some test cases to check whether it works. I’m not using assert here because I don’t want the program to stop whenever a unit test fails.
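
The test cases are basically a small helper that reports pass/fail instead of raising; roughly like this, with the example case using the hypothetical extract_song_names sketched earlier:

def check(name, got, expected):
    """Print PASS/FAIL instead of asserting, so one failing case doesn't stop the rest."""
    status = "PASS" if got == expected else "FAIL"
    print("[{}] {}: got {!r}, expected {!r}".format(status, name, got, expected))

check("two-song title",
      extract_song_names("Nightcore - Love Yourself ✗ Sorry (Switching Vocals)"),
      ["Love Yourself", "Sorry"])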

Code posted in github in the Download Videos folder.

Current Folder Structure

Example of folder structure

Notes:

  • Problems
    • Some Original videos not downloaded
    • Some wrong videos downloaded (different artist)
  • Solutions (tackled in the script C_FixMissedDownloads.py; a rough sketch of the folder check is below):
    • Check the number of original videos downloaded (should correspond to the number of original songs)
    • Filter out folders with the wrong number of original songs (e.g. remove folders not containing an “Original_*” video)
    • Filter out videos that are too long (possibly not a song)
    • #TODO Check similarity of song videos against switching vocals (should have some similar parts)
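
A rough sketch of the folder check mentioned above (the “Original_*” naming follows my folder structure; the too-long filter would additionally need the video durations, e.g. via ffprobe, so it is left out here):

from pathlib import Path

def folders_missing_originals(rootdir):
    """Return the switching-vocals folders that contain no downloaded Original_* video."""
    return [folder for folder in Path(rootdir).iterdir()
            if folder.is_dir() and not list(folder.glob("Original_*"))]

# for folder in folders_missing_originals(rootdir): print(folder.name)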

TODO:

  • Sort Folders based on whether there are multiple songs contributing to the final video
  • Make model to learn audio switchingVocals transformation for the original song

Parts left:

  • Exploration / Transformation : Figure out how I want to represent the songs as input into the neural network, the score for the neural network’s output should represent the similarity against the original video, learn-to-hash?
  • Training : I currently want to test out self-learning (GAN style). I’ll train a discriminator on previous-generation samples from the NN and the actual video, labelling with a score whether each one is a good mashup, and train the whole thing like a generative adversarial network.
  • Testing : Once the GAN is pretty good, I’ll test against mashups it has never heard before.
  • Try out video NNs
  • Implementation + new avenue to explore : I’ll post some mashups to youtube~ and see the number of likes and dislikes a video gets per view -> train the network to produce mashups that are more liked per view?

Things to improve efficiency:

  • Memory storage
    • prevent duplicate video files by storing all video files in a common folder and just using the file path as a reference to the video (rough sketch below)
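
A sketch of what that could look like, hashing the file contents so identical videos collapse to one stored copy (the store layout is an assumption; reading the whole file into memory is fine for a sketch, but streaming would be better for long videos):

import hashlib
import shutil
from pathlib import Path

def store_once(video_path, store_dir):
    """Move a video into a common store named by its content hash and return the reference path."""
    video_path, store_dir = Path(video_path), Path(store_dir)
    store_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha1(video_path.read_bytes()).hexdigest()
    target = store_dir / (digest + video_path.suffix)
    if target.exists():
        video_path.unlink()      # same content already stored: drop the duplicate
    else:
        shutil.move(str(video_path), str(target))
    return str(target)           # keep this path in the folder's metadata instead of the file itself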

Next Post

Youtube Mashup – Data Collection (Part 1)

What I’m trying to achieve

I’m going to start by writing a script to make a list of mashups and find the song names from the title.

Once I’ve verified it works, I’ll leave it to download the videos in a folder structure that looks like:

  • Mashup Folder
    • Songs Folder
    • Mashup Folder
    • json of links

I think I can find mashups of the same few songs so there will be more possible mashups for a specific song selection.

Steps

  1. Update dependencies
  2. Make the Script
    1. Find list of mashups,
    2. Find song titles from mashup titles
    3. Download Mashup and related songs in a correct folder structure
    4. Download more mashups of related songs

Update Dependencies

Make sure youtube_dl is installed / up to date.

brew install youtube-dl

Writing the Code

So the plan is to make modular functions I can test, because you should always TEST YOUR CODE 🙂

Functions I need

  • QueryYoutube: Get the list of results from queries and put it in a text file
    • Input = count of songs for each query & queries to use
  • GetSongNamesFromMashup: Takes a text file, goes through each line (which is a mashup name) and generates a json file where the dictionary maps mashup name to songName1, songName2, …
    • Possible issue where a song has a ton of songNames like those 50 songs mashups
    • will probably use a regex
  • DownloadAll: Takes a json file and downloads all the songs & videos in the folder structure mentioned above. Put the links in the folder structure in a json file as well (rough sketch after this list).
    • Test by downloading two sets of mashups
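
A rough sketch of DownloadAll under those assumptions (search-based download per name, link-json not written yet, and folder names would still need sanitising):

import json
import os
import youtube_dl

def _download(queries, outtmpl):
    """Helper: download each query's top result as audio into the given output template."""
    opts = {"outtmpl": outtmpl, "ignoreerrors": True, "nooverwrites": True,
            "format": "bestaudio/best", "default_search": "auto"}
    with youtube_dl.YoutubeDL(opts) as ydl:
        ydl.download(queries)

def DownloadAll(mapping_json, rootdir):
    """For each mashup in the json ({mashup title: [song names]}), build the folder structure and download."""
    with open(mapping_json, encoding="utf-8") as f:
        mapping = json.load(f)
    for mashup_title, song_names in mapping.items():
        mashup_dir = os.path.join(rootdir, mashup_title)
        songs_dir = os.path.join(mashup_dir, "Songs")
        os.makedirs(songs_dir, exist_ok=True)
        _download(["ytsearch1:" + mashup_title], os.path.join(mashup_dir, "%(title)s.%(ext)s"))
        for song in song_names:
            _download(["ytsearch1:" + song], os.path.join(songs_dir, "%(title)s.%(ext)s"))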

Mini-Project – Self Learning & Youtube Mashups

About

This is a small project to test some self learning concepts I’ve read about and for fun 😀

What I’m going to do

I’m going to train a neural network to take in 2 songs and generate an audio mashup from that.

I’m going to compare that to training a neural network to take in 2 videos and generate a video mashup from them. The problem with this is that the data would be a bit of a mess, because mashup videos on youtube seem to take footage from other sources instead of the songs’ music videos.

If I do transfer learning from the audio mashup NN to the video mashup NN, that should be more effective, right? But the audio and video should be correlated…

Steps

  1. Data Collection & Storage : I’ll use youtube_dl to make a script to download mashups and then from the title of the mashup, get the name of the 2 songs and download them too. -> will use a folder structure (mashupName > mashup folder + songs folder)
  2. Data Cleaning : Going over the data to make sure the mashups are actually mashups of the songs I’ve collected
  3. Exploration / Transformation : Figure out how I want to represent the songs as input into the neural network, the score for the neural network’s output should represent the similarity against the original video, learn-to-hash?
  4. Training : I currently want to test out self-learning (GAN style). I’ll train a discriminator on previous-generation samples from the NN and the actual video, labelling with a score whether each one is a good mashup, and train the whole thing like a generative adversarial network.
  5. Testing : Once the GAN is pretty good, I’ll test against mashups it has never heard before.
  6. Try out video NNs
  7. Implementation + new avenue to explore : I’ll post some mashups to youtube~ and see the number of likes and dislikes a video gets per view -> train the network to produce mashups that are more liked per view?

Followup Posts

OpenDrift Project – Exploring: Example_Drifter

I’m exploring OpenDrift which is an open-source framework for ocean trajectory modelling.

Goal: What I want to do

Train a neural network to predict the trajectory of the entire swarm by giving it the inputs of the readers and verifying against the bounds of the swarm.

I will be modifying an example to learn more about OpenDrift before I make my own model / classes.

Starting: First Attempt

I’m working off example_drifter.py and modifying it to work with different parameters, so that the model isn’t trained to predict only one result (over-fitting).
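
The kind of variation I have in mind is roughly this (a sketch only: the seeding point, radius and perturbation ranges are arbitrary placeholders, not values from example_drifter.py):

import numpy as np
from datetime import timedelta
from opendrift.models.oceandrift import OceanDrift
from opendrift.readers import reader_netCDF_CF_generic

def run_variant(current_path, wind_path, seed=0):
    """One OceanDrift run with a randomly perturbed seeding location and start time."""
    rng = np.random.RandomState(seed)
    reader_current = reader_netCDF_CF_generic.Reader(current_path)
    reader_wind = reader_netCDF_CF_generic.Reader(wind_path)

    o = OceanDrift(loglevel=20)
    o.add_reader([reader_current, reader_wind])
    o.seed_elements(lon=4.0 + rng.uniform(-0.3, 0.3),        # perturb the seeding point a little
                    lat=60.5 + rng.uniform(-0.3, 0.3),
                    radius=1000, number=500,
                    time=reader_current.start_time + timedelta(hours=int(rng.randint(0, 6))))
    o.run(end_time=reader_current.end_time, time_step=timedelta(minutes=15),
          time_step_output=timedelta(minutes=60), outfile='drifter_variant_%d.nc' % seed)
    return o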

To get a better understanding of the models and readers used by example_drifter.py I changed the script as follows:

  • Changed the loglevel so I only get important info
  • Check variables required by the OceanDrift Model and default (fallback) values
o = OceanDrift(loglevel=20)  # 0 is debug mode which shows a lot of info, 20 is important info, 50 is no info

print(OceanDrift.required_variables)
print(OceanDrift.fallback_values)
  • Inspect the readers
print(reader_current)
print(reader_wind)

Here is the output:

 Reader Data
 ===========================
 Reader: /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/norkyst800_subset_16Nov2015.nc
 Projection: 
   +proj=stere +lat_0=90 +lon_0=70 +lat_ts=60 +units=m +a=6.371e+06 +e=0 +no_defs
 Coverage: [m]
   xmin: -2952800.000000   xmax: -2712800.000000   step: 800   numx: 301
   ymin: -1384000.000000   ymax: -1224000.000000   step: 800   numy: 201
   Corners (lon, lat):
     (  2.52,  59.90)  (  4.28,  61.89)
     (  5.11,  59.32)  (  7.03,  61.26)
 Vertical levels [m]: 
   [-0.0]
 Available time range:
   start: 2015-11-16 00:00:00   end: 2015-11-18 18:00:00   step: 1:00:00
     67 times (0 missing)
 Variables:
   time
   x_sea_water_velocity
   y_sea_water_velocity 



 Reader performance:
 --------------------
 /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/norkyst800_subset_16Nov2015.nc
  0:00:09.5  total
  0:00:00.1  preparing
  0:00:00.4  reading
  0:00:00.2  interpolation
  0:00:00.1  interpolation_time
  0:00:08.6  rotating vectors
  0:00:00.0  masking
 --------------------
 /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/arome_subset_16Nov2015.nc
  0:00:09.3  total
  0:00:00.1  preparing
  0:00:00.3  reading
  0:00:00.1  interpolation
  0:00:00.1  interpolation_time
  0:00:08.6  rotating vectors
  0:00:00.0  masking
 --------------------
 global_landmask
  0:00:00.9  total
  0:00:00.0  preparing
  0:00:00.8  reading
  0:00:00.0  interpolation_time
  0:00:00.0  masking
 --------------------
 Performance:
    25.8 total time
     0.1 configuration
     3.2 preparing main loop
       3.1 making dynamical landmask
       0.0 moving elements to ocean
      20.7 readers
         0.9 global_landmask
         0.3 postprocessing
    21.9 main loop
         9.6 /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/norkyst800_subset_16Nov2015.nc
         9.4 /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/arome_subset_16Nov2015.nc
       0.8 updating elements
     0.4 cleaning up
 -------------------- 
  • Display properties of seeded elements
print(o.elements_scheduled) 
  • Check which properties of the model can be configured
print(o.list_configspec())
  • I also printed the model instance
print(o)
 Model: OceanDrift     (OpenDrift version 1.1.0rc2)
 43 active PassiveTracer particles  (1957 deactivated, 0 scheduled)
 Projection: +proj=stere +lat_0=90 +lon_0=70 +lat_ts=60 +units=m +a=6.371e+06 +e=0 +no_defs
 -------------------
 Environment variables:
   -----
   x_sea_water_velocity
   y_sea_water_velocity
      1) /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/norkyst800_subset_16Nov2015.nc
   -----
   x_wind
   y_wind
      1) /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/arome_subset_16Nov2015.nc
   -----
   land_binary_mask
      1) global_landmask
 
 Time:
 Start: 2015-11-16 00:00:00
 Present: 2015-11-18 18:00:00
 Calculation steps: 264 * 0:15:00 - total time: 2 days, 18:00:00
 Output steps: 67 * 1:00:00
 =========================== 

Here is the animation:

I also saved the output:

o.run(end_time=reader_current.end_time, time_step=timedelta(minutes=15),
      time_step_output=timedelta(minutes=60), outfile='egDrifter.nc')

and wrote a program to examine and plot the output.

program to examine output file
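
Roughly, the idea is to open the NetCDF file and plot each trajectory; a minimal sketch, assuming the 'lon'/'lat' trajectory variables of OpenDrift's NetCDF output:

import matplotlib.pyplot as plt
from netCDF4 import Dataset

nc = Dataset('egDrifter.nc')
lon = nc.variables['lon'][:]   # (trajectory, time), masked where particles are deactivated
lat = nc.variables['lat'][:]

for i in range(lon.shape[0]):
    plt.plot(lon[i], lat[i], linewidth=0.5)
plt.xlabel('longitude')
plt.ylabel('latitude')
plt.title('Trajectories from egDrifter.nc')
plt.show()
nc.close()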

Here are the plots:

Plot according to depth

Problem encountered:

Plot with background of vector fields (seems to be buggy)

The background vector field is meant to look something like:

From the github examples

Previous Posts: (TO DO)

  • Installing OpenDrift for Mac
  • OpenDrift: Reading up on it – My Notes

3D Human Pose Estimation

This post is linked to my research for my VR Person Project.

What is 3D Human Pose Estimation?

3D Human Pose Estimation is the task of estimating the pose of a human from a picture or set of video frames.

PapersWithCode

It is usually approached in the following ways:

  • Model-based generative methods
    • Pictorial Structure Model (PSM)
    • Deep Learning
  • Discriminative methods (regression)

There are 3D models generated directly from RGB images and 3D models generated with the aid of 2D Human Pose Estimation.

Read this article for more info.

Approaches

TODO: Paper summaries

Libraries

There are existing github libraries that already implement the 3D Human Pose Estimation I need: I’m mainly looking at DensePose Github, V-Nect Github and VideoPose3D.

References

VR Person Project

This project is an idea I had for the 100 Days of Code Challenge. My list of ideas essentially stated:

Make a Unity Pet/Person/Slime Blob that I can interact with in VR

  • 3D Pose from Video
  • Audio Clone of Voice
  • Can it be a person’s face and figure from a video then a 3d model of them doing it in VR (like watching kpop dance videos)

Which is a pretty cute idea. To elaborate more, I think what I want is to train a model to take in a video as input and output an animated 3D model that can be displayed in VR.

The model should move the way the person is moving in the video which would involve 3D Pose Estimation.

Project Specifications

The basic skeleton:

  • Input: Video of a person moving
  • Output: 3D Animated Model moving the same way

Additional Features (good to have in the future):

  • Implement a speech feature that sounds like that person
  • Can apply a face onto the model
  • Model for non-human characters (like a cat)
  • Change the movement of the characters (so you rig the character differently) but with the same model (it looks like the person in the video)
  • Compare character models for 2 videos and highlight parts which differ (useful for learning proper form)

What this could be used for

Honestly, I’m doing this project because I think it will be cool and fun, and I’ll learn a lot about 3D Human Pose Estimation (and related topics), but I’m sure there are some uses for it:

  • Learn how to do some actions properly
    • Weight-lifting forms (which areas to pay attention to)
    • K-Pop Dance Moves
  • Have fun watching people in 3D
    • Concerts (It’ll be like you were actually there)

Research

For each topic covered, I will have some mini-projects within the blog post related to it to familiarise myself with the topic/library.

Main Topics

  • 3D Human Pose Estimation
  • Character Animation in VR

Potential Areas to look into?:

  • Generating Video based on Character Rigging (rigging the character differently so it looks like the person in the video is moving differently) – I know this has been done with speech & videos of the face.
  • Apply a face onto the model – Should be similar to DeepFakes

<TODO> About 3D Human Pose Estimation

<TODO> Will link summaries of relevant papers soon

Progress

<TODO> Will update and link a github

Preparing for a Hackathon

Intro

I’ll be going to a hackathon later this month and decided to dedicate some time to figure out what kind of hacks are most likely to win.

About Hackathons in General

Generally you should present something that has some practical purpose at the end of the hackathon, or a prototype of such a thing.

It would be a bonus if the project uses cutting-edge technology or does something that seems really hard to do.

You cannot code before the hackathon.

About JunctionX

I’ll be attending JunctionX with my best friend 😀 to learn some new stuff, have some fun and hopefully win a prize!

Credit: JunctionX Singapore

Junction X defines a hackathon as:

A hackathon is an event in which small teams (2-5 person) of developers, designers, entrepreneurs and other specialists collaborate intensively on software projects aimed at solving particular problems. Within 48 hours, teams should come up with a working prototype and a presentation.

https://singapore.hackjunction.com/

The questions will be released on the 26th of September (1 day before the hackathon), but what I do know is that they have 3 tracks:

https://singapore.hackjunction.com/tracks

They are partnering with Rakuten for the API (the prizes are from Rakuten, and they just had a webinar on how to use Rakuten).

https://singapore.hackjunction.com/submission

Main points are that:

  • First you will be judged by other participants; then, if you are in the top 3, you will be judged by the partners.
  • Partners will be setting the criteria… so probably something useful in a corporate sense?
  • The presentation will be a 3-minute demo and a 2-minute Q&A, so present fast and clearly, lead with the main purpose of your product and how it fits the criteria, and make sure you know your creation well 😀
https://singapore.hackjunction.com/rules

As mentioned before, no code can be written before the event. You can however use open source libraries, so I can look up relevant libraries for my ideas and figure out how to do things, e.g. how to code transfer learning if it’s a language/library I’m unfamiliar with.

Past Junction Hackathons

We can look at the general type of project:

  • Cutting Edge Tech / Relative New Tech
  • Social Good / Corporate Useful Thing
  • Cute Useful Thing

We can also look at the judges comments:

Glados’ hack Signvision was praised by the main judges for its awesome technical implementation (according to Jari Jaanto, who read through all the finalists’ code) and impressive user experience. The hack was said to have real potential and to be practically ready to be deployed.

Junction 2017 Winner: A Mix of Machine Learning and Cool UX

In this case, the judges saw:

  • Code
  • User Experience
  • Potential value
  • Polish

Their actual criteria was:

https://junction2017.devpost.com/

What to Do

Usually the judges will tell us their criteria on the day of the hackathon along with the themes so I should just come up with a few ideas that can be implemented. I’m pretty much going to work along my current interests because even if I don’t win, I do want to have fun.

So one idea could be to

  • use computer vision to get input from some videos (filmed on the day itself) and use that to generate a model (using transfer learning from existing state of the art models) and make something cute in VR.

Misc

This is what the schedule looks like. I’ll probably head home to sleep, since it is a pretty long hackathon and past a certain level of sleep deprivation I can’t code effectively.

References

Curv

Curv is a programming language for making art using mathematics.

Installation is pretty easy; you just have to follow the instructions, which I took a screenshot of below:

Installation Instructions

Next, test whether it works:

Learning Curv

I played around with the example code a bit to learn how the functions work.

You can set sliders to control parameters and rotate the shape in the 3D window.