Generating JSON from alignment.

· 09 Jun 2018

This is the end of the third week of GSoC 2018, and the evaluation is next week. These three weeks have been great; I have learned a lot from my mentor, Robert Ochshorn.

In this post we will learn how to create a JSON file from the alignment process. We will use the repository that I made; you can clone it here

If you are wondering why we create the files in JSON format, please see my other post here

This repository will be named Newrepo-master. In this post I use the name Newrepo in place of Newrepo-master; both mean the same thing.
This repository includes:

MFATestWav - Empty directory used in the alignment process.
Models - Directory the acoustic model is downloaded into.
output - Folder used to store the output of the alignment process.
outputcsv - Folder used to store the CSV output derived from the TextGrid.
outputjson - Folder used to store the final JSON output of the alignment process.
testfiles - Folder containing a sample audio file and transcript file for alignment.
align.py - Alignment script.
textgrid2csv.py - Python script used to derive CSV from a TextGrid.
csv2json.py - Python script used to make JSON from CSV.
spanishdict.txt - Spanish dictionary used in alignment.
install_models.sh - Shell script that installs the models.

Please make sure the directories are named exactly as given above, and that MFATestWav, Models, output, outputcsv and outputjson are empty before aligning. Each of these directories contains a .gitignore file; delete these .gitignore files. I only added them because an empty directory can't be uploaded to a git repository, as it won’t be tracked.
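The cleanup described above can be sketched in Python. This is only a minimal sketch, not something shipped in the repository: the function name clean_work_dirs is my own, and it assumes you run it before step 5, since the model is later placed in Models.

```python
from pathlib import Path

# Directories the post says must be empty before aligning.
WORK_DIRS = ["MFATestWav", "Models", "output", "outputcsv", "outputjson"]


def clean_work_dirs(repo_root):
    """Delete the placeholder .gitignore files and return the names of
    working directories that are missing or still contain other files."""
    problems = []
    for name in WORK_DIRS:
        directory = Path(repo_root) / name
        if not directory.is_dir():
            problems.append(name)  # directory is missing or misnamed
            continue
        placeholder = directory / ".gitignore"
        if placeholder.is_file():
            placeholder.unlink()  # remove the git placeholder
        if any(directory.iterdir()):
            problems.append(name)  # something else is still inside
    return problems
```

Calling clean_work_dirs("Newrepo") should return an empty list when all five directories exist and hold nothing but the placeholder files.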

So let’s start with the process. After you have cloned (or forked) the repository, you will see the folders mentioned above; please make sure all of them are present in the repository.
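If you prefer to check the layout with a script rather than by eye, something like the following would do. It is a sketch under the assumption that the repository contains exactly the entries listed in this post; missing_entries is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Names taken from the repository listing in this post.
EXPECTED_DIRS = ["MFATestWav", "Models", "output",
                 "outputcsv", "outputjson", "testfiles"]
EXPECTED_FILES = ["align.py", "textgrid2csv.py", "csv2json.py",
                  "spanishdict.txt", "install_models.sh"]


def missing_entries(repo_root):
    """Return the expected directories and files that are not present."""
    root = Path(repo_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

An empty list from missing_entries("Newrepo") means every folder and script named above was found.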

Your system should also have Python 3 installed. Some of the scripts mentioned here do not support versions below 3.
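A quick way to confirm the interpreter version from Python itself is sketched below. The helper name is_supported is my own, and the threshold of 3.0 simply mirrors the requirement stated above.

```python
import sys


def is_supported(version_info=sys.version_info):
    """Return True when the given version tuple is Python 3.0 or newer."""
    return tuple(version_info[:2]) >= (3, 0)
```

Running python3 -c "import sys; print(sys.version_info)" shows the same information on the command line.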

The steps below check the requirements and fetch the models. There are two ways to install the aligner:

1) Download the Linux release, described from step 2 onwards (recommended).

2) Build from source, described from step 6 onwards.

Using Linux Release.

1) Montreal-Forced-Aligner is built on top of Kaldi, so Kaldi should be compiled first. Detailed steps are in my post here

2) Download the latest release of Montreal Forced Aligner from here. You will see the following options when you visit the page

      montreal-forced-aligner_linux.tar.gz  93 MB
      montreal-forced-aligner_macosx.zip    45 MB
      montreal-forced-aligner_win64.zip     43.5MB
      Source code (zip)
      Source code (tar.gz)

If you are using Linux, click montreal-forced-aligner_linux.tar.gz. When you extract this archive, a folder named montreal-forced-aligner will be created. This directory should look like the one below.

   kranti@kranti:~$ tree -d montreal-forced-aligner
   montreal-forced-aligner
   ├── bin
   ├── lib
   │   └── thirdparty
   │       └── bin
   └── pretrained_models

Rename montreal-forced-aligner to MFA. It should look like this.

    kranti@kranti:~$ tree -d MFA
    MFA
    ├── bin
    ├── lib
    │   └── thirdparty
    │       └── bin
    └── pretrained_models

3) After renaming, move the MFA folder into Newrepo (the repository cloned from GitHub).

It should look like this

If you want a detailed view of the files in the directory, see this

4) After the steps above, change directory to Newrepo/MFA and run the following command

      kranti@kranti:~/Desktop/Newrepo$ cd MFA     
      
      kranti@kranti:~/Desktop/Newrepo/MFA$ bin/mfa_align
        usage: mfa_align [-h] [-s SPEAKER_CHARACTERS] [-t TEMP_DIRECTORY]
                            [-j NUM_JOBS] [-v] [-n] [-c] [-d] [-e] [-i]
                           corpus_directory dictionary_path acoustic_model_path
                           output_directory
        mfa_align: error: the following arguments are required: corpus_directory
        , dictionary_path, acoustic_model_path, output_directory

 If this usage message is not printed, there is a problem with your
 installation. Continue from step 6.
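The same check can be done from Python. This is a rough sketch that assumes the binary prints the usage text shown above when run without arguments; mfa_align_responds is a hypothetical helper name.

```python
import subprocess


def mfa_align_responds(binary="bin/mfa_align"):
    """Return True if running the binary with no arguments prints the
    mfa_align usage message, and False if it cannot be started at all."""
    try:
        result = subprocess.run(
            [binary],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        # Covers a missing file or a file without execute permission.
        return False
    return "usage: mfa_align" in (result.stdout + result.stderr)
```

Run it from inside Newrepo/MFA so that the default relative path bin/mfa_align resolves.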

5) Now it is time to download the Spanish model using the shell script.

      kranti@kranti:~/Desktop/Newrepo$ ./install_models.sh
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 14.0M  100 14.0M    0     0  40463      0  0:06:02  0:06:02 --:--:-- 60683

If the above script doesn’t work, use this site to download the model. Put the file in the Models folder as spanish.zip.

If you complete these five steps successfully, there is no need to follow steps 6 to 8; go directly to step 9. If the five steps above do not work, the only remaining option is building from source.

Building from source (steps 6-8)

6) If downloading montreal-forced-aligner_linux.tar.gz does not work, download the source code of Montreal-Forced-Aligner from GitHub here

7) After completing steps 1 and 6, follow these steps.

a) Open a terminal and go to the unzipped folder

            cd /path/to/Montreal-Forced-Aligner

b) Run the thirdparty/kaldibinaries.py script, pointing it to where Kaldi
   was built

       python thirdparty/kaldibinaries.py /path/to/kaldi/root

c) Run

               pip install -r requirements.txt

   to install the requirements for the aligner.

d) Run the build script via

        freezing/freeze.sh

  There will now be a montreal-forced-aligner folder in the dist folder. This folder should
   contain a bin folder with the two executables mfa_align and
   mfa_train_and_align that should be used for alignment.  

  After this, do step 5 and then step 9.

8) More information on installation is available on this site

I have also written a blog post on this

NOTE

   The model should not be unzipped. It is downloaded as spanish.zip; keep
   it as it is in the Models folder. Names are case sensitive.
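   The note above can be checked with a few lines of Python. This is a sketch only; model_archive_ok is a hypothetical helper, and it compares the file name against the actual directory listing so that a name like Spanish.zip is rejected.

```python
import zipfile
from pathlib import Path


def model_archive_ok(repo_root, name="spanish.zip"):
    """Return True if Models/<name> exists with exactly that spelling
    and is still a zip archive (that is, it has not been unpacked)."""
    models = Path(repo_root) / "Models"
    if not models.is_dir():
        return False
    # Compare with the real listing so the check stays case sensitive.
    if name not in [entry.name for entry in models.iterdir()]:
        return False
    return zipfile.is_zipfile(str(models / name))
```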

9) Now it is time to do the alignment. Navigate to Newrepo (Newrepo-master) and type this in the terminal.

                  python3 align.py testfiles/64.wav testfiles/64.lab

   The result can be viewed in the gist above.

The JSON output can be viewed in outputjson and the CSV output in outputcsv. I recommend using the Linux release (this is the method I followed). With it we do not need to compile any binaries; we use montreal-forced-aligner_linux.tar.gz as it is given. Building from source, on the other hand, can lead to errors such as ‘fstcompile’ not found. That file is included in the Linux release, along with many other files that you may have to add to the source build yourself as you run into errors. Some of these errors are covered in the common errors post
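To give an idea of what the CSV to JSON step does, here is a minimal sketch. It is not the actual csv2json.py from the repository: the column names word, start and end and the top-level "words" key are assumptions made only for illustration, and the real script may use a different layout.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Turn CSV rows with the assumed columns word,start,end into a JSON
    string holding a list of word timings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    words = [
        {
            "word": row["word"],
            "start": float(row["start"]),  # seconds
            "end": float(row["end"]),      # seconds
        }
        for row in reader
    ]
    # ensure_ascii=False keeps Spanish characters such as ñ readable.
    return json.dumps({"words": words}, ensure_ascii=False, indent=2)
```

For example, csv_to_json("word,start,end\nhola,0.00,0.42\n") yields a JSON document with a single entry for "hola".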

The alignment process used here is for Spanish data. You can run it on your own Spanish audio files. However, keep a few things in mind before testing your files.

Your files should have a sample rate of at least 16000 Hz.
You can convert the sampling rate like this

      ffmpeg -i a.wav -ar 16000 b.wav

where a.wav is the sample file whose sampling rate you want to change and b.wav is the resulting file used for alignment.
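Before converting, you may want to see what sample rate a file already has. The sketch below uses Python's standard wave module, which reads uncompressed PCM .wav files only; the helper names sample_rate and meets_minimum are my own.

```python
import wave


def sample_rate(path):
    """Return the sample rate in Hz of a PCM .wav file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def meets_minimum(path, minimum_hz=16000):
    """Return True if the file's sample rate is at least minimum_hz."""
    return sample_rate(path) >= minimum_hz
```

If meets_minimum("a.wav") returns False, the file needs the ffmpeg conversion shown above.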

I assume you have installed ffmpeg.
The audio format should be .wav, not .mp3 or any other. To convert audio from .mp3 or any other format to .wav, use the following command

      ffmpeg -i c.mp3 -ar 16000 d.wav
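If you have many files, the same ffmpeg call can be driven from Python. This is a sketch that only uses the -i and -ar options shown above; the helper names and the naming of the output file (same name with .wav, or a suffix with the rate when the input is already a .wav) are my own choices.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(source, target_rate=16000):
    """Build the ffmpeg argument list that converts source to a .wav
    file resampled to target_rate."""
    source = Path(source)
    target = source.with_suffix(".wav")
    if target == source:
        # Input is already a .wav, so write to a new name instead.
        target = source.with_name(
            "{}_{}.wav".format(source.stem, target_rate))
    return ["ffmpeg", "-i", str(source), "-ar", str(target_rate), str(target)]


def convert(source, target_rate=16000):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_command(source, target_rate), check=True)
```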

After converting the desired files, make sure you have a transcript file for each test audio file.
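A small sketch for spotting audio files that have no transcript follows. It assumes transcripts share the audio file's name with a .lab extension, as 64.wav and 64.lab do in testfiles; wavs_without_transcript is a hypothetical helper.

```python
from pathlib import Path


def wavs_without_transcript(directory, transcript_suffix=".lab"):
    """Return the names of .wav files in directory that have no
    transcript file with the same base name."""
    folder = Path(directory)
    return sorted(
        wav.name
        for wav in folder.glob("*.wav")
        if not wav.with_suffix(transcript_suffix).is_file()
    )
```

An empty list from wavs_without_transcript("testfiles") means every audio file has its transcript.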
You can do alignment in other languages too; you just have to modify install_models.sh to download the desired models. You will also need a dictionary for that language, and it should be in GlobalPhone format. Change align.py accordingly to point it to the model you have downloaded.
A dictionary can be made using g2p models. Information on this is in this post
I also tried cloning my repository Newrepo from GitHub to check that it works. Here you can view the alignment results.