Here are the ways to calculate MFCC with HTK and SPTK.
HTK
To calculate MFCCs with HTK, you can use HCopy and HList, two of the tools included in HTK. In my case, HList alone was enough.
These tools can calculate 39-dimensional MFCC vectors. More precisely, each vector contains MFCC, power, ΔMFCC, Δpower, ΔΔMFCC and ΔΔpower.
config
When you calculate MFCCs from a WAV file with HTK, you can set the basic parameters in a config file. You can also run HCopy and HList without a config file.
The following is an example config file.
```
SOURCEFORMAT = NOHEAD
SOURCEKIND = WAVEFORM
SOURCERATE = 625
TARGETKIND = MFCC_0_D_A
TARGETRATE = 100000.0
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 24
NUMCEPS = 12
```
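HTK expresses SOURCERATE, TARGETRATE and WINDOWSIZE in units of 100 ns, so the values in the config are hard to read at a glance. The following is a minimal Python sketch that converts them to more familiar units and checks the 39-dimension count. The helper names are my own and are not part of HTK. The `_0` qualifier in TARGETKIND appends the 0th cepstral coefficient, which is what this post calls power.

```python
# HTK time parameters are given in units of 100 ns.
HTK_UNIT_SECONDS = 100e-9


def htk_units_to_ms(value):
    """Convert an HTK time value (100 ns units) to milliseconds."""
    return value * HTK_UNIT_SECONDS * 1000.0


def source_rate_to_hz(source_rate):
    """Convert SOURCERATE (sample period in 100 ns units) to a sampling frequency in Hz."""
    return 1.0 / (source_rate * HTK_UNIT_SECONDS)


def feature_dimension(num_ceps, with_c0=True, with_delta=True, with_accel=True):
    """Vector size for a TARGETKIND such as MFCC_0_D_A (hypothetical helper)."""
    static = num_ceps + (1 if with_c0 else 0)              # MFCCs plus optional C0
    blocks = 1 + (1 if with_delta else 0) + (1 if with_accel else 0)
    return static * blocks


print(source_rate_to_hz(625))      # sampling frequency in Hz
print(htk_units_to_ms(100000.0))   # frame shift in ms
print(htk_units_to_ms(250000.0))   # window length in ms
print(feature_dimension(12))       # total dimension for MFCC_0_D_A
```

With the config above, this corresponds to roughly 16 kHz audio, a 10 ms frame shift, a 25 ms window and 39 values per frame.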
HCopy
HCopy calculates MFCCs from a WAV (.wav) file and writes the calculated values to a file.
```
HCopy -C config.txt sample.wav sample.mfc
```
The above command creates sample.mfc, a file containing the MFCC data. It is a binary file in HTK's own format.
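If you want to read the HTK file directly instead of going through HList, the following Python sketch may help. It assumes the standard HTK parameter file layout: a 12-byte header (number of samples, sample period in 100 ns units, bytes per sample, parameter kind) followed by 4-byte floats, all big-endian. It also assumes the file is not compressed and has no checksum. `read_htk` is a hypothetical helper name, not an HTK API.

```python
import struct


def read_htk(path):
    """Read an uncompressed HTK parameter file into (header, frames)."""
    with open(path, "rb") as f:
        # Header: nSamples (int32), sampPeriod (int32), sampSize (int16), parmKind (int16).
        # Assumption: big-endian byte order, which is HTK's default.
        n_samples, samp_period, samp_size, parm_kind = struct.unpack(">iihh", f.read(12))
        dim = samp_size // 4  # each value is a 4-byte float
        frames = []
        for _ in range(n_samples):
            chunk = f.read(samp_size)
            if len(chunk) < samp_size:
                break  # file ended early; keep only the complete frames
            frames.append(list(struct.unpack(">%df" % dim, chunk)))
    header = {
        "n_samples": n_samples,
        "samp_period": samp_period,
        "samp_size": samp_size,
        "parm_kind": parm_kind,
    }
    return header, frames
```

Calling `read_htk("sample.mfc")` on the HCopy output should give one list of floats per frame. If the numbers look wrong, check the byte order and whether the file was written with compression.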
HList
HList can extract and display the MFCC data in a file created by HCopy. HList can also calculate MFCCs directly from a WAV (.wav) file on its own.
- Display MFCC File
  To display the MFCC values in a file created by HCopy:
  ```
  HList -r sample.mfc
  ```
  The values are written to standard output. To save them to a file, use redirection.
- Calculate and Display MFCC
  ```
  HList -C config.txt -r sample.wav
  ```
  config.txt is the same file as the one used with HCopy.
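Once the HList -r output has been redirected to a text file, it can be loaded into a program. The following Python sketch assumes the redirected text contains only whitespace-separated numbers and that you know the frame dimension (39 with the config above). `parse_hlist_raw` is a hypothetical helper, and the layout is worth checking against your own output first.

```python
def parse_hlist_raw(text, dim):
    """Group whitespace-separated numbers into frames of `dim` values."""
    values = [float(token) for token in text.split()]
    if len(values) % dim != 0:
        raise ValueError("value count is not a multiple of the frame dimension")
    # Slice the flat list into consecutive frames.
    return [values[i:i + dim] for i in range(0, len(values), dim)]
```

For example, after `HList -C config.txt -r sample.wav > sample.txt`, the frames could be loaded with `parse_hlist_raw(open("sample.txt").read(), 39)`.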
Parameters (often used)
- -s N
Start listing samples from sample index N. Default value is 0.
- -e N
End listing samples at sample index N. Default value is 0.
You can view all options by running HList without any arguments.
SPTK
SPTK can also calculate 12-dimensional MFCCs from a WAV (.wav) file. The mfcc command calculates MFCCs and power. The delta command calculates deltas, but I have not tried it.
The sample commands, with no option values set, are as follows.
```
wav2raw sample.wav
x2x +sf < sample.wav.raw | frame | mfcc
```
This produces the MFCC data. The format is binary: a sequence of float values. As written, the pipeline writes to standard output, so redirect it if you want to save the data to a file.
To read the MFCC values in C++, the following code may help. (It uses Qt, but Qt is not required. The key line is the fread call.)
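```cpp
#include <cstdio>   // fread; popen and pclose are POSIX
#include <QVector>

// Runs the SPTK pipeline and reads its binary float output, 12 values per frame.
QVector<QVector<float>> readMfcc()
{
    QVector<QVector<float>> lines;
    FILE *pfp = popen("wav2raw sample.wav; x2x +sf < sample.wav.raw | frame | mfcc", "r");
    if (pfp == nullptr)
    {
        return lines;
    }
    float d[12];
    // fread returns the number of floats read; stop at the first incomplete frame.
    while (fread(d, sizeof(float), 12, pfp) == 12)
    {
        QVector<float> values;
        for (int i = 0; i < 12; ++i)
        {
            values.append(d[i]);
        }
        lines.append(values);
    }
    pclose(pfp);
    return lines;
}
```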
If you use Python, struct.unpack can extract the MFCC values.
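A minimal Python sketch of the struct.unpack approach follows. It assumes the SPTK output has already been saved or captured as bytes, that each frame holds 12 values, and that the floats are 4-byte little-endian (the usual case on x86 machines, since SPTK writes floats in the machine's native order). `read_sptk_floats` is a hypothetical helper name.

```python
import struct


def read_sptk_floats(data, dim=12):
    """Split raw bytes of 4-byte floats into frames of `dim` values."""
    frame_size = 4 * dim
    frames = []
    # Step through the buffer one complete frame at a time; trailing bytes are ignored.
    for offset in range(0, len(data) - frame_size + 1, frame_size):
        chunk = data[offset:offset + frame_size]
        # Assumption: little-endian floats; use ">" instead of "<" for big-endian data.
        frames.append(list(struct.unpack("<%df" % dim, chunk)))
    return frames
```

For example, if the pipeline output was redirected to a file, `read_sptk_floats(open("sample.mfcc", "rb").read())` would return one list of 12 floats per frame. The file name here is only an illustration.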