Final post

I ended up generating something not bad…

Of course it faded away but still 🙂

Again, the learning curve and the spectrogram:

Final learning curve

The step in the training curve is due to a change of training subset when I restarted the learning (I don’t compute the cost on the whole training data).

Final

The seed ends at one second, and we can see that the model generates something «good» for two seconds (from 1 to 3) before getting stuck in a fixed point. It’s interesting since I propagated the gradient for two seconds during training.


The last day

Using lasagne I was able to produce something more interesting with a small (200 to 100 to 100 to 200) LSTM.  Thanks to Christopher Beckham’s blog and repo for helping me understand the RNN part of lasagne in about 2h. Everything that I do in lasagne can be found in the «Last try» folder in my repo.

It’s just one long note, but it’s the best that I have generated so far. Let’s take a look at the learning curves and the spectrogram:

One long note learning curve

One long note

Here are the seed and the generated sequence. We can see that the model picks up the highest frequency and produces it. The problem is that it gets stuck in a fixed point and produces the same thing over and over.

I now have an idea of why my previous models didn’t work. I think it has to do with the way I generate my batches. I used to take my batches from the data set completely at random; now (following what Christopher did), to generate a batch I take one element at random and complete the batch by shifting it. I think this matters because it makes it easier for the model to learn to be equivariant to shifts (if it receives a shifted input, it produces a shifted output), and that is a property we want.
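The batch trick above can be sketched like this (a minimal numpy sketch with made-up names; the actual shift step in my code may differ):

```python
import numpy as np

def make_shifted_batch(data, batch_size, seq_len, rng=np.random):
    """Build a batch from ONE random window, completed by shifting it.

    `data` is a 1-D array of audio samples. Function and argument names
    are illustrative, not taken from Christopher's code.
    """
    start = rng.randint(0, len(data) - seq_len - batch_size)
    # Row i is the same window shifted i samples to the right.
    return np.stack([data[start + i : start + i + seq_len]
                     for i in range(batch_size)])

batch = make_shifted_batch(np.arange(100, dtype=float),
                           batch_size=4, seq_len=8)
# Each row is the previous row shifted by one sample.
```

Every example in the batch is then a shifted copy of the same signal, which is exactly the input/output relationship we want the model to respect.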

I’m training a larger model right now, and at this point I don’t think it will reach a small enough mean squared error. My last post will probably be about this model.


Last week

I’m still not able to produce a long sequence of sound…

You can find here the best sound I have generated so far.

I had no time to test in the frequency domain; I spent all my time training a lot of different models in the time domain, and nothing worked out… I tried the following to see where the problem is:

  • Check if the non-linearities were saturated
  • Check if the way I generated the batches was fine
  • Check if my implementation of RMSprop was doing the same thing as other implementations
  • Check and double-check if everything is well linked in my implementations of LSTM and GRU

And everything seems fine.

I tried a lot of things:

  • Used both GRU and LSTM
  • Varied the number of layers from 1 up to 5
  • Varied the number of hidden units from 200 up to 2000
  • Varied the input length from 200 up to 32000
  • Varied the sequence length from 6 up to 60
  • etc…

which ended up producing three different things:

  1. Nothing (no sound)
  2. Noise (like the end of what is shown here but louder)
  3. One note and then noise… (which is the best I got)

As presented in a previous post, the strategy was always the same: train the model to predict what’s next and then, using a seed, use the predictor as a generator.
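That seed-then-generate loop can be sketched like this (`predict_next` is a hypothetical stand-in for the trained network, not my actual model code):

```python
import numpy as np

def generate(predict_next, seed, n_steps):
    """Autoregressive generation: feed the model its own output.

    `predict_next` maps a window of samples to the next sample; it
    stands in for the trained predictor.
    """
    window = list(seed)
    out = []
    for _ in range(n_steps):
        nxt = predict_next(np.asarray(window))
        out.append(nxt)
        window = window[1:] + [nxt]   # slide the input window by one
    return np.asarray(out)

# Toy stand-in model: repeat the last sample, which is a fixed point
# by construction -- the same failure mode described in this post.
samples = generate(lambda w: w[-1], seed=[0.0, 0.5, 1.0], n_steps=5)
```

With a real model the hope is that the generated samples keep varying; here the toy predictor collapses immediately, just like my trained models eventually did.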

Here is the spectrogram of the seed concatenated to what I generated. You can easily guess where the generation started…

One note

Even though it’s really bad and unimpressive, it still manages to get some frequencies right before fading away. The model that does it is a three-layer GRU with no output layer (the top hidden state is used directly in the cost). Maybe the model is just not trained enough, as shown by the learning curve (on the validation set) here:

One note learning curve

which seems to still go down, just really, really slowly. It took one whole day to train this model on a GPU.

I really don’t know where it goes wrong, and if I had to start over I’d definitely use a library like lasagne or blocks. It was probably not a great idea to reinvent the wheel to solve this task directly, but I will for sure make it work this summer by first testing my code on a standard data set like TIMIT. I should have realized earlier that I wouldn’t have the time to make it work and started using a library, but I didn’t.

Tomorrow I will try lasagne and hopefully produce something.
