I want to gauge the similarity between songs for data cleaning, and maybe also use it to check whether the generated music is more similar to the ground truth (the switching vocals version) than to the baseline input (the original song).
Using the extractor, I plotted out the Taggram and got the tag likelihoods for a song (Justin Bieber – Love Yourself) and the switching vocals version of that song, to try out their model.
Comparison within a Song
Taggram Comparison
Some differences:
the “opera” tag is no longer detected
the “women” tag is now detected
the tag likelihoods are more concentrated at certain times
Tags Likelihood Comparison
This is like the taggram averaged over time.
Differences:
Decrease in the “male” and “male vocals” tag likelihoods
The “opera” and “quiet” tag likelihoods are eliminated
The “female vocals”, “female” and “pop” tag likelihoods increase
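One way I could turn these comparisons into a single number (a sketch of my own, not something from the extractor’s docs) is to average each taggram over time into a tag-likelihood vector and compare vectors with cosine similarity:

```python
import numpy as np

def tag_vector(taggram):
    """Average a (time, tags) taggram over time into one tag-likelihood vector."""
    return np.asarray(taggram).mean(axis=0)

def cosine_similarity(a, b):
    """Cosine similarity of two tag-likelihood vectors; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy taggrams standing in for real extractor output: 10 time frames x 4 tags.
original = np.array([[0.9, 0.1, 0.0, 0.3]] * 10)  # strong first ("male"-like) tag
switched = np.array([[0.2, 0.8, 0.0, 0.4]] * 10)  # strong second ("female"-like) tag

sim = cosine_similarity(tag_vector(original), tag_vector(switched))
print(round(sim, 3))
```

A similarity near 1 would suggest the downloaded original is probably the right song; comparing against an unrelated song should give a noticeably lower score.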
Comparison between songs
I’ll compare another original song against the switching vocals of a different song.
Taggram Comparison
This is pretty different from both taggrams of Justin Bieber’s Love Yourself.
Tags Likelihood Comparison
Also pretty different, e.g. the tag likelihoods for “techno”, “drums” and “electronic” are higher for Maroon 5’s Maps than for Justin Bieber’s Love Yourself.
Songs Mashup Comparison
Mashups are songs that are a mix of 2 or more songs. I want to see if there is a significant similarity in tag likelihood between songs contributing to the mashup and the mashup song.
Input Songs
Tag of Input Songs (Whatever It Takes, Believer and Thunder by Imagine Dragons)
Output Song
The mashup seems rather different from the songs making it up.
Actual Data Science
I’ll build vectors of the tag likelihoods and run t-SNE on them. The colour of each point will correspond to the songs grouped together.
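As a sketch of that plot (with random vectors standing in for the real tag likelihoods, and scikit-learn’s TSNE as the implementation):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in data: 3 song groups x 10 clips each, 50 tag likelihoods per clip.
# Each group shares a base vector plus noise, so its points should cluster.
groups, clips, tags = 3, 10, 50
bases = rng.random((groups, tags))
vectors = np.vstack([b + 0.05 * rng.standard_normal((clips, tags)) for b in bases])
labels = np.repeat(np.arange(groups), clips)  # one colour per song group

embedded = TSNE(n_components=2, perplexity=5, init="random",
                random_state=0).fit_transform(vectors)
print(embedded.shape)  # 30 points in 2-D, ready to scatter-plot coloured by labels
```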
So far I’ve downloaded around 500 switching vocal vids and the original songs that make them up. However I haven’t checked that the original songs downloaded are the correct songs, so I will be using a music similarity measure to verify this.
This project is related to my youtube mashups project but should be easier to train, as it is essentially a pitch shift of only one song.
Essentially I want to train a NN to generate songs like:
From the base song.
Later maybe it can generate videos?
Getting Data
Format for Audio
I’m downloading the audio in WAV format and keeping the video, using the code below.
WAV can cover the full frequency range the human ear is able to hear! MP3 is compressed and lossy, whereas WAV is lossless and uncompressed.
Artisound.io
Download Script
I am making sure to reject the megamixes (mashups with 20+ songs) and only pick switching vocals videos, using some regex that is supported by youtube_dl.
from __future__ import unicode_literals
import youtube_dl
import os
from pathlib import Path

rootdir = str(Path().absolute())

def QueryYoutube(QueryList, toSkip=True):
    """Get the list of results from queries and put it in a json file."""
    ydl_opts = {
        # "outtmpl": "%(title)s.%(ext)s",  # file name is song name
        "outtmpl": os.path.join(rootdir, "%(title)s/SV.%(ext)s"),  # folder name is song name, file is SV
        "ignoreerrors": True,   # do not stop on download errors
        "nooverwrites": True,   # prevent overwriting files
        "matchtitle": "switching vocals",  # download only matching titles (not sure if this works)
        "writedescription": True,  # write the video description to a .description file
        "skip_download": toSkip,   # don't actually download the video
        "min_views": 100,          # only get videos with at least 100 views
        "download_archive": "alreadyListedFiles.txt",  # videos already recorded in this file are not downloaded again
        "default_search": "auto",  # prepend this string if an input url is not valid; 'auto' for elaborate guessing
        "format": "bestaudio/best",
        "postprocessors": [{
            "key": "FFmpegExtractAudio",
            "preferredcodec": "wav",
            "preferredquality": "192"
        }],
        "postprocessor_args": ["-ar", "16000"],
        "prefer_ffmpeg": True,
        "keepvideo": True
    }
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download(QueryList)

def test():
    """Test by downloading two sets of two SV."""
    # queriesL = ["nightcore mashups", "bts mashups", "ytuser:https://www.youtube.com/channel/UC5XWNylwy4efFufjMYqcglw"]
    # queriesL = ["ytuser:https://www.youtube.com/channel/UC5XWNylwy4efFufjMYqcglw", "ytuser:"]
    # nightcore switching vocals
    queriesL = ["https://www.youtube.com/channel/UCPtWGnX3cr6fLLB1AAohynw",
                "https://www.youtube.com/channel/UCPMhsGX1A6aPmpFPRWJUkag"]
    # QueryYoutube(queriesL, True)   # list only, no download
    QueryYoutube(queriesL, False)    # should download those channels

def run():
    ##### DOWNLOADING
    # nightcore switching vocals channels
    queriesL = ["https://www.youtube.com/channel/UCPtWGnX3cr6fLLB1AAohynw",
                "https://www.youtube.com/channel/UCPMhsGX1A6aPmpFPRWJUkag",
                "https://www.youtube.com/channel/UCl2fdq_CzdrDhauV85aXQDQ",
                "https://www.youtube.com/channel/UC8Y2KrSAhAl1-1hqBGLBdzA",
                "https://www.youtube.com/channel/UCJsX7vcaCUdPOcooysql1Uw",
                "https://www.youtube.com/channel/UCtY3IhWM6UOlMBoUG-cNQyQ",
                "https://www.youtube.com/channel/UCNOymlVIxfFW0mVmZiNq6DA"]
    QueryYoutube(queriesL, False)

if __name__ == "__main__":
    run()
Data Cleaning & Automating Download of Original Songs
I’m taking the title of the youtube switching vocals video and using a regex to find the names of the original songs.
regex expression crafting
After converting the unicode punctuation to regular punctuation, I used a regex tester to zero in on the key words. I still have to remove some strings that got included accidentally, because I wanted to make sure the matched groups kept the artist names.
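As an illustration of the kind of expression I mean (the title format and separators here are assumptions for the example, not the exact regex from my script):

```python
import re

# Hypothetical title shape: "Nightcore - Song A ✗ Song B (Switching Vocals) [Lyrics]"
TITLE_RE = re.compile(
    r"(?:nightcore\s*[-–]\s*)?"         # optional channel-style prefix
    r"(?P<songs>.+?)"                   # the song names, artist names kept in the group
    r"\s*\((?:switching vocals|sv)\)",  # the switching-vocals marker
    re.IGNORECASE,
)

def song_names(title):
    """Pull the song-name part out of a title, then split on separators like ✗ / x / &."""
    m = TITLE_RE.search(title)
    if not m:
        return []
    return [s.strip() for s in re.split(r"\s+[✗x&+]\s+", m.group("songs")) if s.strip()]

print(song_names("Nightcore - Love Yourself ✗ Maps (Switching Vocals) [Lyrics]"))
```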
Get the proper song names
I queried youtube as described in the youtube_dl documentation (the song APIs might not have the song I’m looking for, and I’m searching on youtube for the downloads anyway).
To clean up the data and prevent duplicates of the original songs, I’m removing artist names and tags related to a song already noted down.
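A sketch of that normalisation (the noise list and artist handling here are made up for the example; the real cleaning script keys off the tags it actually encounters):

```python
import re

# Decorations stripped before comparing titles; this list is a guess for the demo.
NOISE = re.compile(r"\s*(\(lyrics\)|\[lyrics\]|\(official video\)|nightcore)\s*",
                   re.IGNORECASE)

def normalize(title, artists=()):
    """Lower-case a title and strip artist names and decorations, for duplicate checks."""
    t = NOISE.sub(" ", title.lower())
    for artist in artists:
        t = t.replace(artist.lower(), "")
    return re.sub(r"\s+", " ", t).strip(" -–")

seen = set()
for title in ["Justin Bieber - Love Yourself (Lyrics)",
              "Love Yourself (Official Video)",
              "Maroon 5 - Maps"]:
    key = normalize(title, artists=["Justin Bieber", "Maroon 5"])
    if key in seen:
        print("duplicate:", title)
    else:
        seen.add(key)
        print("new:", key)
```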
To look through the json for the metadata related to the youtube video, I used an online JSON Viewer:
Looking through Json Data
I made some test cases to check whether it works. I’m not using assert here because I don’t want the program to stop whenever a unit test fails.
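A minimal version of that idea: a soft check that reports a failing case and keeps going instead of raising (the cases below are just placeholders, with upper() standing in for the function under test):

```python
def check(name, got, expected):
    """A soft assert: report a failed case instead of stopping the whole run."""
    if got == expected:
        print(f"PASS {name}")
        return True
    print(f"FAIL {name}: got {got!r}, expected {expected!r}")
    return False

results = [
    check("keeps text", "ABC".upper(), "ABC"),
    check("uppercases", "abc".upper(), "ABC"),
    check("deliberate failure", "abc".upper(), "abc"),  # reported, not fatal
]
print(f"{sum(results)}/{len(results)} passed")
```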
Solutions (tackled in the script C_FixMissedDownloads.py):
Check number of original videos downloaded (should correspond to number of original songs)
Filter out incorrect number of original songs (e.g. Remove Folders not containing an “Original_*” video)
Filter out if videos are too long (possibly not a song)
#TODO Check similarity of song videos against switching vocals (should have some similar parts)
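A sketch of the first two checks (the folder layout mirrors what my download script creates, with an Original_* file per folder; the demo folders are throwaways):

```python
from pathlib import Path
import tempfile

def folders_missing_originals(rootdir):
    """List song folders that contain no downloaded 'Original_*' video."""
    return [d for d in Path(rootdir).iterdir()
            if d.is_dir() and not list(d.glob("Original_*"))]

# Throwaway demo structure: one folder with an original, one without.
root = Path(tempfile.mkdtemp())
(root / "GoodSong").mkdir()
(root / "GoodSong" / "Original_1.wav").touch()
(root / "BadSong").mkdir()  # switching-vocals only; original missing

missing = folders_missing_originals(root)
print([d.name for d in missing])
```

The too-long filter would work the same way, reading the duration field out of each video’s info json and flagging anything over, say, ten minutes.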
TODO:
Sort Folders based on whether there are multiple songs contributing to the final video
Make model to learn audio switchingVocals transformation for the original song
Parts left:
Exploration / Transformation : Figure out how I want to represent the songs as input to the neural network; the score for the neural network’s output should represent the similarity against the original video. Learn-to-hash?
Training : I currently want to test out self-learning (GAN style). So I’ll train a discriminator on previous-generation samples from the NN and on the actual videos, scoring whether something is actually a good mashup, and train it like a generative adversarial network
Testing : Once the GAN is pretty good, I’ll test against mashups it has never heard before.
Try out video NNs
Implementation + new avenue to explore : I’ll post some mashups to youtube~ and see the number of likes and dislikes a video gets per view -> train the network to produce mashups that are more liked per view?
Things to improve efficiency:
Memory storage
prevent duplicate video files by storing all video files in a common folder and keeping just the file path as a reference to the video
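A sketch of that idea using a content hash as the shared file name (store_once and the layout are hypothetical, not from the current scripts):

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def store_once(src, store):
    """Copy a video into the common store named by its content hash; return the shared path."""
    digest = hashlib.sha256(Path(src).read_bytes()).hexdigest()
    dest = Path(store) / f"{digest}{Path(src).suffix}"
    if not dest.exists():
        shutil.copy(src, dest)
    return dest  # mashup folders keep this path instead of their own copy

# Demo: two folders "download" the same song; only one copy lands in the store.
store = Path(tempfile.mkdtemp())
song = Path(tempfile.mkdtemp()) / "song.wav"
song.write_bytes(b"fake audio bytes")

p1 = store_once(song, store)
p2 = store_once(song, store)
print(p1 == p2, len(list(store.iterdir())))
```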
I’m going to start by writing a script to make a list of mashups and find the song names from the title.
Once I’ve verified it works, I’ll leave it to download the videos in a folder structure that looks like:
Mashup Folder
    Songs Folder
    Mashup Folder
    json of links
I think I can find mashups of the same few songs so there will be more possible mashups for a specific song selection.
Steps
Update dependencies
Make the Script
Find list of mashups,
Find song titles from mashup titles
Download Mashup and related songs in a correct folder structure
Download more mashups of related songs
Update Dependencies
Make sure youtube_dl is installed / up to date.
brew install youtube-dl
Writing the Code
So the plan is to make modular functions I can test, because you should always TEST YOUR CODE 🙂
Functions I need
QueryYoutube: Get the list of results from queries and put it in a text file
Input = count of songs for each query & queries to use
GetSongNamesFromMashup: Takes a text file, goes through each line (which is a mashup name) and generates a json file where the dictionary maps mashup name to (songName1, songName2…)
Possible issue where a song has a ton of songNames like those 50 songs mashups
will probably use a regex
DownloadAll: Takes a json file and downloads all the songs & videos in the folder structure as mentioned above. Put the links in the folder structure in a json file as well.
This is a small project to test some self learning concepts I’ve read about and for fun 😀
What I’m going to do
I’m going to train a neural network to take in 2 songs and generate an audio mashup from that.
I’m going to compare that to training a neural network that takes in 2 videos and generates a video mashup of them. The problem here is that the data would be a bit of a mess, because mashup videos on youtube seem to take footage from other sources instead of the song videos.
If I do transfer learning from the audio mashup NN to the video mashup NN, that should be more effective, right? After all, the audio and video should be correlated…
Steps
Data Collection & Storage : I’ll use youtube_dl to make a script to download mashups and then from the title of the mashup, get the name of the 2 songs and download them too. -> will use a folder structure (mashupName > mashup folder + songs folder)
Data Cleaning : Going over the data to make sure the mashups are actually mashups of the songs I’ve collected
Exploration / Transformation : Figure out how I want to represent the songs as input to the neural network; the score for the neural network’s output should represent the similarity against the original video. Learn-to-hash?
Training : I currently want to test out self-learning (GAN style). So I’ll train a discriminator on previous-generation samples from the NN and on the actual videos, scoring whether something is actually a good mashup, and train it like a generative adversarial network
Testing : Once the GAN is pretty good, I’ll test against mashups it has never heard before.
Try out video NNs
Implementation + new avenue to explore : I’ll post some mashups to youtube~ and see the number of likes and dislikes a video gets per view -> train the network to produce mashups that are more liked per view?
I’m exploring OpenDrift which is an open-source framework for ocean trajectory modelling.
Goal: What I want to do
Train a neural network to predict the trajectory of the entire swarm by giving it the inputs of the readers and verifying against the bounds of the swarm.
I will be modifying an example to learn more about OpenDrift before I make my own model / classes.
Starting: First Attempt
I’m working off example_drifter.py, modifying it to work with different parameters so I don’t train the model to predict only one result (over-fitting).
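A sketch of how I might randomise the seeding (bounds roughly inside the NorKyst test reader’s coverage; the seed_elements call is only indicated in a comment since it needs OpenDrift installed):

```python
import random
from datetime import datetime, timedelta

random.seed(0)

# Rough coverage of the NorKyst test reader (lon/lat corners and time range).
LON, LAT = (2.6, 7.0), (59.4, 61.8)
START = datetime(2015, 11, 16)
HOURS = 48  # leave room before the reader's end time

def random_seed_params():
    """Draw one randomized seeding configuration for an OceanDrift run."""
    return {
        "lon": random.uniform(*LON),
        "lat": random.uniform(*LAT),
        "radius": random.choice([500, 1000, 2000]),  # metres
        "number": random.choice([100, 500, 1000]),   # particles
        "time": START + timedelta(hours=random.randrange(HOURS)),
    }

params = random_seed_params()
print(params)
# With OpenDrift installed, this would feed straight into:
#   o.seed_elements(**params)
#   o.run(duration=timedelta(hours=24))
```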
To get a better understanding of the models and readers used by example_drifter.py I changed the script as follows:
Changed the loglevel so I only get important info
Check variables required by the OceanDrift Model and default (fallback) values
o = OceanDrift(loglevel=20) # 0 is debug mode which shows a lot of info, 20 is important info, 50 is no info
print(OceanDrift.required_variables)
print(OceanDrift.fallback_values)
Inspect the readers
print(reader_current)
print(reader_wind)
Here is the output:
Reader Data
===========================
Reader: /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/norkyst800_subset_16Nov2015.nc
Projection:
 +proj=stere +lat_0=90 +lon_0=70 +lat_ts=60 +units=m +a=6.371e+06 +e=0 +no_defs
Coverage: [m]
 xmin: -2952800.000000  xmax: -2712800.000000  step: 800  numx: 301
 ymin: -1384000.000000  ymax: -1224000.000000  step: 800  numy: 201
 Corners (lon, lat):
  ( 2.52, 59.90) ( 4.28, 61.89)
  ( 5.11, 59.32) ( 7.03, 61.26)
Vertical levels [m]:
 [-0.0]
Available time range:
 start: 2015-11-16 00:00:00  end: 2015-11-18 18:00:00  step: 1:00:00
  67 times (0 missing)
Variables:
 time
 x_sea_water_velocity
 y_sea_water_velocity
Reader performance:
--------------------
/Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/norkyst800_subset_16Nov2015.nc
 0:00:09.5 total
 0:00:00.1 preparing
 0:00:00.4 reading
 0:00:00.2 interpolation
 0:00:00.1 interpolation_time
 0:00:08.6 rotating vectors
 0:00:00.0 masking
--------------------
/Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/arome_subset_16Nov2015.nc
 0:00:09.3 total
 0:00:00.1 preparing
 0:00:00.3 reading
 0:00:00.1 interpolation
 0:00:00.1 interpolation_time
 0:00:08.6 rotating vectors
 0:00:00.0 masking
--------------------
global_landmask
 0:00:00.9 total
 0:00:00.0 preparing
 0:00:00.8 reading
 0:00:00.0 interpolation_time
 0:00:00.0 masking
--------------------
Performance:
  25.8 total time
  0.1 configuration
  3.2 preparing main loop
   3.1 making dynamical landmask
   0.0 moving elements to ocean
   20.7 readers
    0.9 global_landmask
    0.3 postprocessing
  21.9 main loop
    9.6 /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/norkyst800_subset_16Nov2015.nc
    9.4 /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/arome_subset_16Nov2015.nc
   0.8 updating elements
  0.4 cleaning up
--------------------
Display properties of seeded elements
print(o.elements_scheduled)
Check which properties of the model can be configured
print(o.list_configspec())
I also printed the model instance
print(o)
Model: OceanDrift   (OpenDrift version 1.1.0rc2)
43 active PassiveTracer particles (1957 deactivated, 0 scheduled)
Projection: +proj=stere +lat_0=90 +lon_0=70 +lat_ts=60 +units=m +a=6.371e+06 +e=0 +no_defs
-------------------
Environment variables:
 -----
 x_sea_water_velocity
 y_sea_water_velocity
   1) /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/norkyst800_subset_16Nov2015.nc
 -----
 x_wind
 y_wind
   1) /Users/Caffae/miniconda3/envs/opendrift_p3/lib/python3.7/site-packages/tests/test_data/16Nov2015_NorKyst_z_surface/arome_subset_16Nov2015.nc
 -----
 land_binary_mask
   1) global_landmask
Time:
Start: 2015-11-16 00:00:00
Present: 2015-11-18 18:00:00
Calculation steps: 264 * 0:15:00 - total time: 2 days, 18:00:00
Output steps: 67 * 1:00:00
===========================