I want to gauge the similarity between the songs for data cleaning and also maybe use this as a way to check if the music generated is more similar to the truth (the switching vocals version) than the baseline input (the original song).
Using the extractor, I plotted out the Taggram and got the tag likelihoods for a song (Justin Bieber – Love Yourself) and the switching vocals version of that song, to try out their model.
Comparison within a Song
Taggram Comparison
Some differences:
there is no “opera” tag
the “women” tag was detected
the tag likelihoods are more concentrated at certain times
Tags Likelihood Comparison
This is like the taggram averaged over time.
Differences:
Decrease in “male” and “male vocals” tags likelihood
“Opera” and “Quiet” tag likelihoods are eliminated.
“female vocals”, “female” and “pop” are increased.
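One way to turn these comparisons into a single number (useful later for data cleaning) is to average each taggram over time and take the cosine similarity of the resulting tag-likelihood vectors. A minimal sketch with numpy, where the taggram shapes and values are made up purely for illustration:

```python
import numpy as np

def tag_vector(taggram):
    """Average a taggram (timesteps x tags likelihood matrix) over time."""
    return np.asarray(taggram).mean(axis=0)

def cosine_similarity(a, b):
    """Cosine similarity between two tag-likelihood vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Fake taggrams: 10 timesteps x 5 tags, purely illustrative
rng = np.random.default_rng(0)
original = rng.random((10, 5))
switching_vocals = original + 0.1 * rng.random((10, 5))  # slightly perturbed copy

sim = cosine_similarity(tag_vector(original), tag_vector(switching_vocals))
print(sim)  # close to 1.0 for similar songs
```

A similarity near 1 would suggest the downloaded original really does correspond to the switching-vocals version; a low value flags a mismatch worth checking.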
Comparison between songs
I’ll compare another original song against the switching vocals of a different song.
Taggram Comparison
This is pretty different from both taggrams of Justin Bieber’s Love Yourself
Tags Likelihood Comparison
Also pretty different, e.g. the tag likelihoods for “techno”, “drums” and “electronic” are higher for Maroon 5’s Maps than for Justin Bieber’s Love Yourself.
Songs Mashup Comparison
Mashups are songs that are a mix of 2 or more songs. I want to see if there is a significant similarity in tag likelihood between songs contributing to the mashup and the mashup song.
Input Songs
Tag of Input Songs (Whatever It Takes, Believer and Thunder by Imagine Dragons)
Output Song
The mashup seems rather different from the songs making it up.
Actual Data Science
I’ll build a vector of tag likelihoods for each song and run t-SNE on them. The colour of each point will correspond to the group of songs it belongs to.
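A minimal sketch of that with scikit-learn, using random stand-in tag vectors (the real input would be one averaged tag-likelihood vector per song, and the group sizes here are made up):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in data: 12 songs x 50 tag likelihoods, in 4 groups of 3 songs each
tag_vectors = rng.random((12, 50))
groups = np.repeat(np.arange(4), 3)  # colour label per song

# Perplexity must be below the number of samples; small data needs a small value
embedded = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(tag_vectors)
print(embedded.shape)  # (12, 2)
```

Plotting is then just a scatter of `embedded` with `c=groups`; songs that group together should land near each other if the tag vectors really capture similarity.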
So far I’ve downloaded around 500 switching vocal vids and the original songs that make them up. However I haven’t checked that the original songs downloaded are the correct songs, so I will be using a music similarity measure to verify this.
This project is related to my youtube mashups project but should be easier to train as it essentially is a pitch shift of only one song.
Essentially I want to train a NN to generate songs like:
From the base song.
Later maybe it can generate videos?
Getting Data
Format for Audio
I’m downloading the audio in WAV format and keeping the video, using the following code.
WAV can cover the full frequency range that the human ear is able to hear! An MP3 file is compressed and lossy, whereas a WAV file is lossless and uncompressed.
Artisound.io
Download Script
I am checking to ensure I reject megamixes (mashups with 20+ songs) and only pick switching-vocals videos, using some regex that youtube_dl supports.
from __future__ import unicode_literals
import youtube_dl
import os
from pathlib import Path

rootdir = str(Path().absolute())

def QueryYoutube(QueryList, toSkip = True):
    """Get the list of results from queries and put it in a json file"""
    ydl_opts = {
        # "outtmpl": "%(title)s.%(ext)s",  # file name is song name
        "outtmpl": os.path.join(rootdir, "%(title)s/SV.%(ext)s"),  # folder name is song name, file is SV
        "ignoreerrors": True,  # do not stop on download errors
        "nooverwrites": True,  # prevent overwriting files
        "matchtitle": "switching vocals",  # not sure if this works (download only matching titles)
        "writedescription": True,  # write the video description to a .description file
        "skip_download": toSkip,  # don't actually download the video
        "min_views": 100,  # only get videos with at least 100 views
        "download_archive": "alreadyListedFiles.txt",  # videos already recorded in this file are not downloaded again
        "default_search": "auto",  # prepend this string if an input url is not valid; 'auto' for elaborate guessing
        "format": "bestaudio/best",
        "postprocessors": [{
            "key": "FFmpegExtractAudio",
            "preferredcodec": "wav",
            "preferredquality": "192",
        }],
        "postprocessor_args": ["-ar", "16000"],
        "prefer_ffmpeg": True,
        "keepvideo": True,
    }
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download(QueryList)

def test():
    """Test by downloading two sets of two SV"""
    # queriesL = ["nightcore mashups", "bts mashups", "ytuser:https://www.youtube.com/channel/UC5XWNylwy4efFufjMYqcglw"]
    # queriesL = ["ytuser:https://www.youtube.com/channel/UC5XWNylwy4efFufjMYqcglw", "ytuser:"]
    # nightcore switching vocals
    queriesL = ["https://www.youtube.com/channel/UCPtWGnX3cr6fLLB1AAohynw",
                "https://www.youtube.com/channel/UCPMhsGX1A6aPmpFPRWJUkag"]
    # QueryYoutube(queriesL, True)  # list only, without downloading
    QueryYoutube(queriesL, False)  # should download those channels

def run():
    ##### DOWNLOADING
    # nightcore switching vocals channels
    queriesL = ["https://www.youtube.com/channel/UCPtWGnX3cr6fLLB1AAohynw",
                "https://www.youtube.com/channel/UCPMhsGX1A6aPmpFPRWJUkag",
                "https://www.youtube.com/channel/UCl2fdq_CzdrDhauV85aXQDQ",
                "https://www.youtube.com/channel/UC8Y2KrSAhAl1-1hqBGLBdzA",
                "https://www.youtube.com/channel/UCJsX7vcaCUdPOcooysql1Uw",
                "https://www.youtube.com/channel/UCtY3IhWM6UOlMBoUG-cNQyQ",
                "https://www.youtube.com/channel/UCNOymlVIxfFW0mVmZiNq6DA"]
    QueryYoutube(queriesL, False)

if __name__ == "__main__":
    run()
Data Cleaning & Automating Download of Original Songs
I’m taking the title of the youtube switching vocals video and using regex to find the names of the original songs.
regex expression crafting
After converting the unicode to regular punctuation, I used a regex expression tester to zero in on the key words. I still have to remove some strings that were accidentally included, because I wanted to make sure I kept the artist names in the matched string groups.
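As an illustration of the kind of pattern I mean (the title formats and the regex below are simplified stand-ins; the real titles vary a lot more):

```python
import re

# Illustrative pattern only -- real switching-vocals titles vary a lot more
TITLE_RE = re.compile(
    r"""^(?:nightcore\s*[-–]\s*)?      # optional "Nightcore -" prefix
        (?P<songs>.+?)                 # the song name(s), matched lazily
        \s*[(\[]\s*switching\s+vocals  # up to the "(Switching Vocals)" marker
    """,
    re.IGNORECASE | re.VERBOSE,
)

def song_names(title):
    """Extract song names from a switching-vocals video title."""
    m = TITLE_RE.match(title)
    if not m:
        return []
    # Multiple songs are usually separated by x / ✗ / & with spaces around them
    return [s.strip() for s in re.split(r"\s+[x✗&+]\s+", m.group("songs"))]

print(song_names("Nightcore - Love Yourself ✗ Maps (Switching Vocals)"))
# ['Love Yourself', 'Maps']
```

Requiring whitespace around the separator characters avoids splitting inside song names that happen to contain an "x".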
Get the proper song names
I queried youtube according to the youtube_dl documentation (as the APIs for songs might not have the song I’m looking for, and I’m searching on youtube for the downloads).
To clean up the data and prevent duplicates of the original songs, I’m removing artist names and tags related to a song already noted down.
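A sketch of that normalisation step; the artist and tag lists here are made-up stand-ins for the ones collected from the scraped metadata:

```python
import re

# Illustrative only: the real artist/tag lists come from the scraped metadata
KNOWN_ARTISTS = {"justin bieber", "maroon 5", "imagine dragons"}
NOISE_TAGS = {"lyrics", "official video", "audio", "hd"}

def normalize_song_name(raw):
    """Strip artist names and noise tags so duplicate songs collapse to one key."""
    name = raw.lower()
    name = re.sub(r"[\(\[].*?[\)\]]", " ", name)  # drop bracketed tags like (Lyrics)
    for token in KNOWN_ARTISTS | NOISE_TAGS:
        name = name.replace(token, " ")
    return re.sub(r"\s+", " ", name).strip(" -–")

print(normalize_song_name("Justin Bieber - Love Yourself (Lyrics)"))
# love yourself
```

Two titles that normalise to the same key are then treated as the same song, so the original is only downloaded once.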
To look through the json for the metadata related to the youtube video, I used an online JSON Viewer:
Looking through Json Data
I made some test cases to check whether it works. I’m not using assert here because I don’t want the program to stop whenever a unit test fails.
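A minimal version of what I mean by assert-free test cases; `check` is a hypothetical helper, not something from a testing library:

```python
def check(name, actual, expected):
    """Report a test result without raising, so later cases still run."""
    if actual == expected:
        print(f"PASS {name}")
        return True
    print(f"FAIL {name}: got {actual!r}, expected {expected!r}")
    return False

# Example usage with a toy function under test
def double(x):
    return x * 2

results = [
    check("doubles an int", double(2), 4),
    check("doubles a string", double("ab"), "abab"),
]
print(f"{sum(results)}/{len(results)} passed")  # 2/2 passed
```

Unlike `assert`, a failing case just prints a FAIL line and the run continues, which is what I want when eyeballing lots of messy titles.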
Solutions (tackled in script C_FixMissedDownloads.py):
Check the number of original videos downloaded (should correspond to the number of original songs)
Filter out folders with an incorrect number of original songs (e.g. remove folders not containing an “Original_*” video)
Filter out videos that are too long (possibly not a song)
TODO: Check similarity of song videos against switching vocals (should have some similar parts)
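The too-long filter can be a one-liner over the video metadata. A sketch, assuming a `duration` field in seconds is available in each video's metadata json (youtube_dl writes one when `writeinfojson` is enabled) and a hypothetical 10-minute cutoff:

```python
import json

MAX_DURATION_S = 10 * 60  # assumption: over ~10 minutes is probably not a single song

def duration_ok(info, max_duration=MAX_DURATION_S):
    """True if the metadata dict describes a plausibly song-length video."""
    duration = info.get("duration")  # seconds; may be missing entirely
    return duration is not None and duration <= max_duration

def is_probably_a_song(info_json_path, max_duration=MAX_DURATION_S):
    """Read a youtube_dl metadata json file and apply the duration filter."""
    with open(info_json_path) as f:
        return duration_ok(json.load(f), max_duration)
```

Folders whose "Original_*" video fails this check get set aside for manual review rather than deleted outright.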
TODO:
Sort Folders based on whether there are multiple songs contributing to the final video
Make model to learn audio switchingVocals transformation for the original song
Parts left:
Exploration / Transformation : Figure out how I want to represent the songs as input to the neural network; the score for the network’s output should represent similarity against the original video. Learn-to-hash?
Training : I currently want to test out self-learning (GAN style). So I’ll train a discriminator on the NN’s previous-generation samples and the actual video, scoring whether each is actually a good mashup, and train the whole thing like a generative adversarial network.
Testing : Once the GAN is pretty good, I’ll test against mashups it has never heard before.
Try out video NNs
Implementation + new avenue to explore : I’ll post some mashups to youtube~ and see the number of likes and dislikes a video gets per view -> train the network to produce mashups that are more liked per view?
Things to improve efficiency:
Memory storage
prevent duplicate video files by storing all video files in common folder and just using the file path as a reference to the video.
I’m going to start by writing a script to make a list of mashups and find the song names from the title.
Once I’ve verified it works, I’ll leave it to download the videos in a folder structure that looks like:
Mashup Folder
Songs Folder
Mashup Folder
json of links
I think I can find mashups of the same few songs so there will be more possible mashups for a specific song selection.
Steps
Update dependencies
Make the Script
Find a list of mashups
Find song titles from mashup titles
Download Mashup and related songs in a correct folder structure
Download more mashups of related songs
Update Dependencies
Make sure youtube_dl is installed / up to date.
brew install youtube-dl
Writing the Code
So the plan is to make modular functions I can test out, because you should always TEST YOUR CODE 🙂
Functions I need
QueryYoutube: Get the list of results from queries and put it in a text file
Input = count of songs for each query & queries to use
GetSongNamesFromMashup: Takes a text file, goes through each line (each a mashup name) and generates a json file whose dictionary maps mashup name: (songName1, songName2, …)
Possible issue where a mashup has a ton of song names, like those 50-song mashups
will probably use a regex
DownloadAll: Takes a json file and downloads all the songs & videos in the folder structure as mentioned above. Put the links in the folder structure in a json file as well.
This is a small project to test some self learning concepts I’ve read about and for fun 😀
What I’m going to do
I’m going to train a neural network to take in 2 songs and generate an audio mashup from that.
I’m going to compare that to training a neural network to take in 2 videos and generate a video mashup from them. The problem is that the data would be a bit of a mess, because mashup videos on youtube seem to take footage from other sources instead of the song video.
If I do transfer learning of the audio mashup NN for the video mashup NN, that should be more effective right? But the audio and video should be correlated…
Steps
Data Collection & Storage : I’ll use youtube_dl to make a script to download mashups and then from the title of the mashup, get the name of the 2 songs and download them too. -> will use a folder structure (mashupName > mashup folder + songs folder)
Data Cleaning : Going over the data to make sure the mashups are actually mashups of the songs I’ve collected
Exploration / Transformation : Figure out how I want to represent the songs as input to the neural network; the score for the network’s output should represent similarity against the original video. Learn-to-hash?
Training : I currently want to test out self-learning (GAN style). So I’ll train a discriminator on the NN’s previous-generation samples and the actual video, scoring whether each is actually a good mashup, and train the whole thing like a generative adversarial network.
Testing : Once the GAN is pretty good, I’ll test against mashups it has never heard before.
Try out video NNs
Implementation + new avenue to explore : I’ll post some mashups to youtube~ and see the number of likes and dislikes a video gets per view -> train the network to produce mashups that are more liked per view?
I’m exploring OpenDrift which is an open-source framework for ocean trajectory modelling.
Goal: What I want to do
Train a neural network to predict the trajectory of the entire swarm by giving it the inputs of the readers and verifying against the bounds of the swarm.
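For the "bounds of the swarm" target, the simplest definition to start with is the bounding box of the particle positions at each timestep. A sketch, where `lons`/`lats` stand in for the per-element position arrays OpenDrift tracks:

```python
import numpy as np

def swarm_bounds(lons, lats):
    """Bounding box (lon_min, lat_min, lon_max, lat_max) of the particle swarm."""
    lons, lats = np.asarray(lons), np.asarray(lats)
    return (float(lons.min()), float(lats.min()),
            float(lons.max()), float(lats.max()))

# Toy example: 5 drifters at one timestep
print(swarm_bounds([4.1, 4.3, 4.2, 4.0, 4.5], [60.0, 60.2, 59.9, 60.1, 60.3]))
# (4.0, 59.9, 4.5, 60.3)
```

The network would then be trained to predict this box (or a richer hull) from the reader fields, and verified against the box computed from the actual simulated swarm.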
I will be modifying an example to learn more about OpenDrift before I make my own model / classes.
Starting: First Attempt
I’m working off example_drifter.py and modifying it to work with different parameters so I don’t train the model to only predict one result (over-fitting).
To get a better understanding of the models and readers used by example_drifter.py I changed the script as follows:
Changed the loglevel so I only get important info
Check variables required by the OceanDrift Model and default (fallback) values
o = OceanDrift(loglevel=20) # 0 is debug mode which shows a lot of info, 20 is important info, 50 is no info
print(OceanDrift.required_variables)
print(OceanDrift.fallback_values)
Inspect the readers
print(reader_current)
print(reader_wind)
Here is the output:
Reader Data
===========================
Reader: /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/norkyst800_subset_16Nov2015.nc
Projection:
 +proj=stere +lat_0=90 +lon_0=70 +lat_ts=60 +units=m +a=6.371e+06 +e=0 +no_defs
Coverage: [m]
 xmin: -2952800.000000  xmax: -2712800.000000  step: 800  numx: 301
 ymin: -1384000.000000  ymax: -1224000.000000  step: 800  numy: 201
 Corners (lon, lat):
  ( 2.52, 59.90) ( 4.28, 61.89)
  ( 5.11, 59.32) ( 7.03, 61.26)
Vertical levels [m]:
 [-0.0]
Available time range:
 start: 2015-11-16 00:00:00  end: 2015-11-18 18:00:00  step: 1:00:00
  67 times (0 missing)
Variables:
 time
 x_sea_water_velocity
 y_sea_water_velocity
Reader performance:
--------------------
/Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/norkyst800_subset_16Nov2015.nc
 0:00:09.5 total
 0:00:00.1 preparing
 0:00:00.4 reading
 0:00:00.2 interpolation
 0:00:00.1 interpolation_time
 0:00:08.6 rotating vectors
 0:00:00.0 masking
--------------------
/Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/arome_subset_16Nov2015.nc
 0:00:09.3 total
 0:00:00.1 preparing
 0:00:00.3 reading
 0:00:00.1 interpolation
 0:00:00.1 interpolation_time
 0:00:08.6 rotating vectors
 0:00:00.0 masking
--------------------
global_landmask
 0:00:00.9 total
 0:00:00.0 preparing
 0:00:00.8 reading
 0:00:00.0 interpolation_time
 0:00:00.0 masking
--------------------
Performance:
  25.8 total time
  0.1 configuration
  3.2 preparing main loop
   3.1 making dynamical landmask
   0.0 moving elements to ocean
   20.7 readers
    0.9 global_landmask
    0.3 postprocessing
  21.9 main loop
    9.6 /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/norkyst800_subset_16Nov2015.nc
    9.4 /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/arome_subset_16Nov2015.nc
   0.8 updating elements
  0.4 cleaning up
--------------------
Display properties of seeded elements
print(o.elements_scheduled)
Check which properties of the model can be configured
print(o.list_configspec())
I also printed the model instance
print(o)
Model: OceanDrift   (OpenDrift version 1.1.0rc2)
43 active PassiveTracer particles (1957 deactivated, 0 scheduled)
Projection: +proj=stere +lat_0=90 +lon_0=70 +lat_ts=60 +units=m +a=6.371e+06 +e=0 +no_defs
-------------------
Environment variables:
 -----
 x_sea_water_velocity
 y_sea_water_velocity
   1) /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/norkyst800_subset_16Nov2015.nc
 -----
 x_wind
 y_wind
   1) /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/arome_subset_16Nov2015.nc
 -----
 land_binary_mask
   1) global_landmask
Time:
Start: 2015-11-16 00:00:00
Present: 2015-11-18 18:00:00
Calculation steps: 264 * 0:15:00 - total time: 2 days, 18:00:00
Output steps: 67 * 1:00:00
===========================
There are existing github libraries that already implement the 3D Human Pose Estimation I need: I’m mainly looking at DensePose Github, V-Nect Github and VideoPose3D.
Can it be a person’s face and figure from a video then a 3d model of them doing it in VR (like watching kpop dance videos)
Which is a pretty cute idea. To elaborate, I think what I want is to train a model to take in a video as input and output an animated 3D model that can be displayed in VR.
The model should move the way the person is moving in the video which would involve 3D Pose Estimation.
Project Specifications
The basic skeleton:
Input: Video of a person moving
Output: 3D Animated Model moving the same way
Additional Features (good to have in the future):
Implement a speech feature that sounds like that person
Change the movement of the characters (so you rig the character differently) but with the same model (it looks like the person in the video)
Compare character models for 2 videos and highlight parts which differ (useful for learning proper form)
What this could be used for
Honestly, I’m doing this project because I think it will be cool, fun and I’ll learn a lot about 3D Human Pose Estimation (and related topics) but I’m sure there are some usages for this:
Learn how to do some actions properly
Weight-lifting forms (which areas to pay attention to)
K-Pop Dance Moves
Have fun watching people in 3D
Concerts (It’ll be like you were actually there)
Research
For each topic covered, I will have some mini-projects within the blog post related to it to familiarise myself with the topic/library.
Main Topics
3D Human Pose Estimation
Character Animation in VR
Potential Areas to look into?:
Generating video based on character rigging (rigging the character differently so it looks like the person in the video is moving differently) – I know this has been done with speech & videos of the face.
Apply a face onto the model – Should be similar to DeepFakes
<TODO> About 3D Human Pose Estimation
<TODO> Will link summaries of relevant papers soon
I’ll be going to a hackathon later this month and decided to dedicate some time to figure out what kind of hacks are most likely to win.
About Hackathons in General
Generally you should present something that has some practical purpose at the end of the hackathon, or a prototype of such a thing.
It would be a bonus if the project is using cutting-edge technology / does something that seems really hard to do.
You cannot code before the hackathon.
About JunctionX
I’ll be attending JunctionX with my best friend 😀 to learn some new stuff, have some fun and hopefully win a prize!
Credit: JunctionX Singapore
Junction X defines a hackathon as:
A hackathon is an event in which small teams (2-5 person) of developers, designers, entrepreneurs and other specialists collaborate intensively on software projects aimed at solving particular problems. Within 48 hours, teams should come up with a working prototype and a presentation.
First you will be judged by other participants; then, if you are in the top 3, you will be judged by the partners.
Partners will be setting the criteria… so probably something useful from a corporate sense?
Presentation will be 3 minute demo and 2 minute Q&A – so present fast, clearly with a main purpose of your product, probably how it fits the criteria then make sure you know your creation well 😀
As mentioned before, no code written before the event. You can however use open source libraries, so I can look up relevant libraries for my ideas and figure out how to do stuff, e.g. how to code transfer learning if it’s a language/library I’m unfamiliar with.
Past Junction Hackathons
We can look at the general type of project:
Cutting Edge Tech / Relative New Tech
Social Good / Corporate Useful Thing
Cute Useful Thing
We can also look at the judges comments:
Glados’ hack Signvision was praised by the main judges for its awesome technical implementation (said by Jari Jaanto, who read through all the finalists’ code) and impressive user experience. The hack was said to have real potential and to be practically ready to be deployed.
Junction 2017 Winner: A Mix of Machine Learning and Cool UX
Usually the judges will tell us their criteria on the day of the hackathon along with the themes so I should just come up with a few ideas that can be implemented. I’m pretty much going to work along my current interests because even if I don’t win, I do want to have fun.
So one idea could be to
use computer vision to get input from some videos (filmed on the day itself) and use that to generate a model (using transfer learning from existing state of the art models) and make something cute in VR.
Misc
This is what the schedule looks like, I’ll probably head home to sleep since it is a pretty long hackathon and past a certain level of sleep deprivation, I can’t code effectively.