32leaves.net

Generating syntax diagrams from Scala parser combinators

Scala parser combinators are a powerful concept which allows you to write parsers with nothing but the standard Scala distribution. They’re implemented as an internal DSL, which makes them feel natural in the language. As mentioned before, I’m currently writing some sort of state machine compiler. As I plan to write some documentation for that software (which actually is kind of rare :-P) I need some way to describe the syntax. Besides EBNF, syntax diagrams are a neat way to do that. The external DSL which is used to describe the state machines is written directly using the parser combinators, so I had to figure out a way to generate syntax diagrams as well as EBNF out of them.
My first approach was to pattern match the resulting Parser structure. But unfortunately the combinators themselves are mostly modeled as functions and thus do not appear in the resulting object graph. So I decided to write a Scala compiler plugin and walk the AST. Basically all this software does is transform trees from the compiler AST into an intermediate expression structure, and from there into a graphing tree. This approach suffers from some limitations, and I made some assumptions as well:

  • The name of the class which contains the production rules must equal the filename without “.scala”
  • Production rules must not be named: “stringWrapper”, “literal”, “regex”, “Predef”, “r” or “accept”
  • def production rules must not have parameters, otherwise they’re not considered to be production rules
  • production rules defined via val will appear twice (this is most likely a bug)

All in all this implementation works in my case, but might not work for others. It is more of a quick hack than carefully engineered software, but it proves the concept and gets the job done.

With the ability to draw syntax diagrams and having the expression tree structure at hand (which can be seen as an AST for EBNF), writing an EBNF parser (which is semi ISO/IEC 14977:1996(E) compatible) and feeding the diagram generator with the result was a snap.
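Both the EBNF generator and the diagram generator work on that intermediate expression structure. Its actual definition is not shown in this post; a minimal sketch of what it could look like – only the names are taken from the parser code below, everything else is assumed – is:

// sketch of the intermediate expression structure (names taken from the parser below)
sealed trait Expression
case class LiteralExpression(literal: String) extends Expression
case class RuleRefExpression(name: String) extends Expression
case class SeqExpression(expressions: List[Expression]) extends Expression
case class OrExpression(expressions: List[Expression]) extends Expression
case class OptionExpression(expression: Expression) extends Expression
case class ZeroToManyExpression(expression: Expression) extends Expression

// a production rule ties a rule name to its expression tree
case class ProductionRule(name: String, expression: Expression)
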
So let’s consider the following example (which actually is the EBNF parser I wrote):

def start: Parser[List[ProductionRule]] = syntax_rule*
def syntax_rule = meta_identifier ~ '=' ~ definitions_list ~ ';' ^^ {
  case name ~ _ ~ expr ~ _ => ProductionRule(name, expr)
}
def meta_identifier = """[a-zA-Z0-9]+"""r
def definitions_list:Parser[Expression] = (single_definition ~ (('|' ~ single_definition)*)) ^^ {
  case t ~ o => OrExpression(t :: o.map(_ match { case _ ~ x => x }))
}
def single_definition = (primary ~ (((","?) ~ primary)*)) ^^ {
  case t ~ o => SeqExpression(t :: o.map(_ match { case _ ~ x => x }))
}
def primary: Parser[Expression] = optional_sequence | repeated_sequence | grouped_sequence | terminal_string | meta_identifier ^^ {
  case s => RuleRefExpression(s)
}
def optional_sequence = "[" ~ definitions_list ~ "]" ^^ {
  case _ ~ os ~ _ => OptionExpression(os)
}
def repeated_sequence = "{" ~ definitions_list ~ "}" ^^ {
  case _ ~ rs ~ _ => ZeroToManyExpression(rs)
}
def grouped_sequence = "(" ~ definitions_list ~ ")" ^^ {
  case _ ~ gs ~ _ => gs
}
def terminal_string = (""""[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*" """r) ^^ {
  case s => LiteralExpression(s)
}

The corresponding EBNF generated by this software is:

start = { syntax_rule };
syntax_rule = ( '[a-zA-Z0-9]+' '=' ( single_definition { ( '|' single_definition ) } ) ';' );
meta_identifier = '[a-zA-Z0-9]+';
definitions_list = ( ( primary { ( [ ',' ] primary ) } ) { ( '|' single_definition ) } );
single_definition = ( ( optional_sequence | repeated_sequence | grouped_sequence | terminal_string | meta_identifier ) { ( [ ',' ] primary ) } );
primary = ( ( '[' definitions_list ']' ) | ( '{' definitions_list '}' ) | ( '(' definitions_list ')' ) | '"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*" ' | '[a-zA-Z0-9]+' );
optional_sequence = ( '[' ( single_definition { ( '|' single_definition ) } ) ']' );
repeated_sequence = ( '{' ( single_definition { ( '|' single_definition ) } ) '}' );
grouped_sequence = ( '(' ( single_definition { ( '|' single_definition ) } ) ')' );
terminal_string = '"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*" ';

And of course the syntax diagrams we wanted:

So far the syntax diagram generator understands two options:

  • -P:virtual:rule01,...,ruleN: declares rule01,…,ruleN to be virtual rules which causes them to be resolved wherever they’re referenced. Be aware that recursive virtual rules are only resolved to one level
  • -P:depth:N: resolve rule references until a depth of N (recursive rules are only resolved for one level)
If you’re interested in the code you can grab it from this SVN repository (username: guest, password: guest). You’ll need Scala as well as Batik to run it.
Creative Commons License
SyntaxDiagramGenerator is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License

Controlling a robot arm using an ez430 Chronos

A few days ago the ez430 Chronos I ordered from Farnell was delivered. This particular piece of hardware is an evaluation platform for the MSP430 (or, to be more precise, the MSP430 as a CC430F6137 SoC) which comes as a sports watch. It’s got some pretty neat features such as a temperature sensor, an accelerometer covering all three axes (which is why I’m using it here) and an RF transceiver built in. The first thing I did with it was to enable the watch to display the current time in binary.
After getting this to work (which was not that hard at all) I moved on to trying to get some data using the RF link. In the ez430 Chronos Wiki someone posted a Python program which polls the RF hub for accelerometer data. Having a Mac and not wanting to spend hours on getting pyserial to work, I wrote my own version in Ruby.

require 'rubygems'

# on my machine a simple require does not work for this library
Kernel.require 'serialport'


SERIAL = SerialPort.new ARGV[0], 115200
SERIAL.read_timeout = 1000
def serial(data); data.each {|c| SERIAL.putc(c) }; SERIAL.flush end

# start access point
serial([0xFF, 0x07, 0x03])

while(true)
  # request the data
  serial([0xFF, 0x08, 0x07, 0x00, 0x00, 0x00, 0x00])

  accel = SERIAL.read(7)
  unless accel.nil?
    accel = accel.scan(/./).map {|c| c[0] }

    if accel[3] == 1
      puts accel[4..6].join(" ")
      STDOUT.flush
    end
  end
end

SERIAL.close

That’s pretty nice, but I wanted some real-time graphs of the data, so half an hour later I had a short Processing sketch running which does exactly that.

Next semester I’ll participate in a project in which we’ll build a new hardware interface to some pretty old robot arm hardware (it says “Made in West Germany”, so that’s quite some time ago). In order to do that we need some sort of protocol to communicate with the new hardware interface. In preparation for that project I wrote a simulator for the robot (actually that’s supposed to be a part of the project, but I simply couldn’t resist) which uses an early draft of that particular protocol. Just to mention it: the protocol itself is described as a state machine in a DSL which can be compiled to virtually any programming language. I’ll probably write a post about that nice piece of Scala code at some later time.
So with a working robot arm simulator and a watch which has an accelerometer built in and is capable of wireless communication, what else could I have done but combine the two? So I got to work and enhanced the script above to communicate with the simulator. About an hour later I got the first movements controlled by the accelerometer data of the watch. You might have seen the video already; if not, go and watch it :-)
Update: We had some spare time to spend and did a really basic proof of concept with the real hardware. Here we go:

Readonly hardware IRC client

Right after we got the network running on my board (see my previous post) I got hold of the board of the talented hacker mentioned before and started to hack a small read-only IRC client for it (after having a look at RFC1459 and RFC2812). About an hour later I was able to connect to a server, join a channel and listen for messages, which are displayed on the OLED display. Unfortunately the display is broken on that particular board. But nonetheless, there is a lot of cool stuff ahead.
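The IRC part itself boils down to very little. A rough sketch of that conversation – written in Ruby here rather than in the firmware’s language, and with a made-up server, nick and channel – is essentially:

require 'socket'

# register with the server, join a channel and print incoming messages;
# server, nick and channel are made up for illustration
irc = TCPSocket.new('irc.example.org', 6667)
irc.write "NICK lm3sbot\r\n"
irc.write "USER lm3sbot 0 * :read-only client\r\n"
irc.write "JOIN #test\r\n"

while line = irc.gets
  line.chomp!
  if line =~ /^PING (.*)/
    irc.write "PONG #{$1}\r\n"        # answer the server's keep-alive
  elsif line =~ /PRIVMSG (\S+) :(.*)/
    puts "#{$1}: #{$2}"               # on the real client this ends up on the OLED
  end
end
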
Again we have some screenshots:

Getting the network of my LM3S6965 board to work

I’ve had this board for about 1.5 years. All that time I disliked the fact that the Ethernet was not working. No matter what I tried or which example I flashed, nothing helped. Today I learned why: the hardware was simply broken.
That’s not a far-fetched guess, but being a software guy, fixing hardware is just an unreachable world to me.
Thankfully that’s not the case for a friend of mine. He’s a talented hacker (from today on I know how talented :P) and he got this fixed. All the time I was like: “hey, if that were so easy someone else would have done it already instead of buying that super-expensive Pulse Jack Ethernet Transformer hardware”. Nonetheless he pushed forward and got it to work. As usual, pictures say more than a thousand words, so here we go:

Raytracing: second shot

Alright, this is probably the last shot I’m taking at raytracing. I implemented some pretty neat features. First of all: reflection:
rawreflection_0

The left picture shows the image without reflection, the right one with reflection enabled. Actually reflection is pretty easy to implement. All we need to do is recursively trace the reflected rays.

The second feature is also pretty neat: anti-aliasing. That also is not too difficult to implement. I implemented the simplest way of doing anti-aliasing, which is supersampling. The following picture shows the difference between rendering without and with anti-aliasing:
reflection_comparison

To get a better impression of what anti-aliasing does, the following pictures show the same image with no anti-aliasing and with 2x, 4x and 16x anti-aliasing:
reflection_0reflection_2reflection_4reflection_16

Of course the changes are in the SVN. The username, as well as the password, is guest.

Raytracing: my first shot

After revisiting the Google talk about splines and fractals I had the idea to do this in three-dimensional space by using quaternions instead of complex numbers. It turned out that it wasn’t utterly difficult to render a simple three-dimensional Sierpinski gasket (or Menger sponge) using PovRay. At least installing PovRay on my Mac took more time than writing the script.
mengerSierpinski
While playing around with PovRay a crazy idea came to my mind: why not write my own ray tracer? Of course it would only be the simplest one possible, as I only wanted to get a rough understanding of how this stuff works. After doing a bit of research it turned out that it shouldn’t be too difficult a task. Hey, this is one of the exercises in every “Computer graphics basics” lecture.
Three hours later I had a rough shot, but somehow my maths for calculating the rays was wrong. The day after, a friend of mine and I went to a maths professor at my university and asked him if he could have a quick look at the formulas. His explanations were (as always) very insightful and simply brilliant. With the ideas he gave us we went on to get the missing bits done, and four more hours later we had a working ray tracer. (The Sierpinski gasket was not rendered by my ray tracer, but I can’t figure out how to exclude images from the WordPress gallery.)
Of course, after 7 hours of work the ray tracer is far from perfect. There are a few neat features missing (which might follow) such as shadows, reflection and radiosity (which would really be a big task). Also the maths concerning the projection plane is not totally right. We should introduce a proper coordinate system transformation to get this done rather than hacking things together as we do right now. All in all it was a lot of fun to write that thing. If you want to have a look at the source code, check it out here (username: guest, password: guest).

Note to myself: tail recursion in Haskell does not allow me to stop thinking

We’ve probably all heard that tail recursion is the way to go. That’s true, but as I learned today, applying this principle does not free you from thinking about your code.
The task was to implement the Josephus game. That’s particularly straightforward in imperative languages such as Java:

private int josephus(int amountOfSoldiers, int interval) {
    List<Integer> soldiers = new LinkedList<Integer>();
    for(int i = 1; i <= amountOfSoldiers; i++) soldiers.add(i);
   
    int index = 1;
    while(soldiers.size() != 1) {
        for(Iterator<Integer> iter = soldiers.iterator(); iter.hasNext() && iter.next() != null; ) {
            if(index++ % interval == 0) iter.remove();
        }
    }
   
    return soldiers.get(0);
}
But Java is not much of a challenge, so let’s give this a shot in Haskell:

josephus :: [Int] -> Int
josephus []  = 0
josephus [x] = x
josephus l   = josephus_ l 1 where
    josephus_ [x] _ = x
    josephus_ xs o  = let
        len = length xs
        off = ((o + len) `mod` 3) in
        josephus_ (dropByInject o xs) off

dropByInject _ []     = []
dropByInject m (x:xs) | m `mod` 3 == 0 = dropByInject (m + 1) xs
                      | otherwise      = x:(dropByInject (m + 1) xs)

Obviously the dropByInject function is not tail recursive. Running the following program

main = putStrLn (show [josephus [1..l] | l <- [1..2000]])

– which computes the “Josephus number” for 1 to 2000 soldiers – takes roughly 0.792s.

So I did a second shot which is tail recursive:

dropByInject r _ []     = r
dropByInject r m (x:xs) | m `mod` 3 == 0 = dropByInject r (m + 1) xs
                        | otherwise      = dropByInject (r ++ [x]) (m + 1) xs

And here is where I stopped thinking – which became clear to me immediately, as the same task as above now takes 23.791s. That’s a lot worse than the 0.792s of the non-tail-recursive version. So why is that?
If we take a look at how the list is constructed in the otherwise branch of the dropByInject function, we see that x is appended to the end of the list via (r ++ [x]). And that was the mistake – lists in Haskell are linked lists, and appending to a linked list takes \(\mathcal{O}(n)\), as we have to find the end of the list first. Oops.

Alright, let’s go for a third shot:

dropByInject r _ []     = reverse r
dropByInject r m (x:xs) | m `mod` 3 == 0 = dropByInject r (m + 1) xs
                        | otherwise      = dropByInject (x:r) (m + 1) xs

This version takes only 0.704s. That’s not light-years of speed gain compared to the first version, but hey, at least it’s tail recursive.

How GCC optimizes recursion

As mentioned in my previous post, I have been trying to implement the ESOM algorithm. As all my attempts to do so in C have utterly failed, I decided to give pure functional programming another try and learn Haskell in the process (I highly recommend the awesome Real World Haskell book). One central concept of functional programming is recursion. In pure languages (such as Haskell) loops don’t even exist (and don’t have to).
We’ve all heard the rumors of recursion being slow, and if it’s used poorly that can be true. In FP there is a concept called tail recursion, which the compiler recognizes and optimizes accordingly. Now the question arose: how do rather traditional languages compete in this specific field? To gain insight into this topic, a highly skilled friend of mine proposed the following procedure:
  1. Write some C code containing a simple recursive algorithm
  2. Compile the code using GCC (version 4.1.3) and different optimization levels
  3. Dump the compiled assembly to analyze the optimizations done

We ended up decompiling the code using Boomerang to get a more readable way of inspecting GCC’s work. We’ll mark the code samples in this post accordingly – a black background denotes decompiled code, a white background denotes the handcrafted implementations.

Now we only have to choose a set of examples to examine. The ones we chose were:

Tail recursive factorial

The tail recursive implementation of the factorial is not as intuitive as the naive one, but still pretty easy to read (even in C):

int fac(int n, int res) {
    return n <= 0 ? res : fac(n - 1, res * n);
}

GCC does not optimize the recursion until O2, at which point the code is transformed into an iterative version. Frankly, that’s pretty much the behaviour one would expect from a fairly advanced compiler such as GCC. With respect to what’s ahead, it is worth noting that the O3-optimized code still contains no recursion.
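Conceptually, the transformation turns the accumulating recursive call into a loop; a hand-written C equivalent of that idea (just an illustration, not GCC’s actual output) would be:

/* iterative equivalent of the tail-recursive fac above --
 * an illustration of the transformation, not GCC's actual output */
int fac_iter(int n, int res) {
    while (n > 0) {
        res = res * n;
        n = n - 1;
    }
    return res;
}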

Factorial

The factorial can be implemented naively in a pretty straightforward manner:

int fac(int n) {
    return n <= 0 ? 1 : n * fac(n - 1);
}

What does GCC make of it regarding recursion? At the O0 and O1 optimization levels we see no direct change to the recursion. But the real hammer comes at O2: the recursion is gone. And that is without our implementation being tail recursive.

// address: 0x8048380
fac(int param1) {
    int local1;         // r26{21}
    int local2;         // r24
    int local3;         // r26
    int local4;         // r28

    local2 = 1;
    local3 = param1;
    if (param1 > 0) {
        do {
            local1 = local3;
            local2 = local1 * local2;
            local3 = local1 - 1;
        } while (local1 != 1);
    }
    return local2; /* WARNING: Also returning: local3 */
}

So GCC is (to some extent) able to detect and optimize non-tail-recursive functions. That really blew us away and is the main thing we’ve learned from these little experiments. As a matter of fact, the little snippet above looks exactly like the code generated at O2 for the tail-recursive factorial implementation.

Fibonacci numbers

Besides the factorial, the Fibonacci numbers are a quite famous example of recursion. But they differ greatly from the factorial in that the effort to compute them (at least with the naive algorithm) grows exponentially.

int fib(int n) {
    if(n == 0) return 0;
    else if(n == 1) return 1;
    else return fib(n - 1) + fib(n - 2);
}

We can see in line 4 that there are two recursive calls to fib. That results in \(\mathcal{O}(2^n)\) time complexity (the proof – using induction – is left as an exercise to the reader :P).

Yet again we’re in for a surprise. At O0 and O1 we again see no change in the recursive pattern. But at O2 GCC has reduced the number of recursive calls to a single one (although it’s in a loop), thus reducing the complexity to PTIME. So the compiler has improved the algorithm itself and not just the way it’s implemented.

Ackermann function

The Ackermann function is the classic example of a computable function that is not primitive recursive – it grows faster than any primitive recursive function. Its behaviour is even worse than that of the Fibonacci numbers: while the Fibonacci numbers are tree recursive, the Ackermann function is nested tree recursive. Again GCC’s optimization (at O2) is pretty close to magic (“Any sufficiently advanced technology is indistinguishable from magic.” – Clarke’s third law). The code melts down to a single recursive call once again:

00000000 <ack>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   c7 44 24 04 ff ff ff    movl   $0xffffffff,0x4(%esp)
   d:   ff
   e:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
  15:   e8 fc ff ff ff          call   16 <ack+0x16>
  1a:   eb ea                   jmp    6 <ack+0x6>
  1c:   8d 74 26 00             lea    0x0(%esi),%esi

We only find one single call to ack in line 8.
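For reference, the textbook two-argument Ackermann–Péter definition – the variant such experiments usually compile, though the exact source isn’t shown above – makes the nesting obvious:

/* textbook Ackermann-Peter function (not necessarily the exact source used
 * above): note the recursive call nested inside a recursive argument,
 * which is what makes this "nested tree recursive" */
int ack(int m, int n) {
    if (m == 0) return n + 1;
    if (n == 0) return ack(m - 1, 1);
    return ack(m - 1, ack(m - 1, n - 1));
}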

Conclusion

As we’ve seen, GCC does a pretty good job of dealing with recursion (if we compile our code with the appropriate optimization flags). Though this was a very experimental post, I do not want to leave out the theoretical background. I’m not going to cover it here, but only point out some papers which might be of interest on this topic. First of all, SpringerLink is always a good place to look for information on topics like this one. Secondly, the ACM is a good source, too.

  • [1366143] G. Stitt and J. Villarreal, "Recursion flattening," in GLSVLSI ’08: Proceedings of the 18th ACM Great Lakes symposium on VLSI, New York, NY, USA, 2008, pp. 131-134.
  • [888267] T. Hirschowitz, X. Leroy, and J. B. Wells, "Compilation of extended recursion in call-by-value functional languages," in PPDP ’03: Proceedings of the 5th ACM SIGPLAN international conference on Principles and practice of declarative programming, New York, NY, USA, 2003, pp. 160-171.
  • [359344] M. A. Auslander and H. R. Strong, "Systematic recursion removal," Commun. ACM, vol. 21, iss. 2, pp. 127-134, 1978.
  • [359630] R. S. Bird, "Notes on recursion elimination," Commun. ACM, vol. 20, iss. 6, pp. 434-439, 1977.

MFCC based feature vectors of my music collection

After reading some lecture notes about neural networks I wanted to get my hands on them. So I wrote some rather simple implementations of Hopfield networks [1], multi-layer perceptrons [2] and, last but not least, (serial) Self-Organizing Maps [3] (SOM).
As the Hopfield nets and multi-layer perceptrons worked fine with some artificial test data (hand-crafted patterns for the Hopfield nets and a simple \(\sin\) function for the MLP), I needed some test data for the SOMs. The idea was to map my iTunes library. So I used categorical data stored in the library as input vectors for the SOM. It turned out that the data was not suited for this purpose, as it was too evenly distributed and not significant enough to identify meaningful clusters on the resulting U-Matrix [4].

After searching for a more useful way of extracting feature vectors from my music collection I eventually decided to go with so-called “Mel frequency cepstral coefficients” [5].
According to Wikipedia, “the mel scale is a perceptual scale of pitches judged by listeners to be equal in distance from one another” (Mel scale [6]). Converting “normal” frequency into Mel frequency is actually pretty easy:

    \[f_{mel} = 2595 \cdot \log_{10}\left(1 + \frac{f}{700}\right)\]

For example, 1000 Hz maps to roughly 1000 mel.
I did not find a suitable implementation for calculating those MFCCs, so I wrote my own:
/*
 * stdin: WAV, mono 16-bit 44100 Hz with a standard 48-byte WAV header
 * stdout: the MFCC feature vectors as raw doubles
 */

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>
#include <assert.h>
#include <fftw3.h>

/**
 * The range of the filter bank in Hz
 */

#define F_MIN   20.0
#define F_MAX   11025.0

/**
 * Basically N is the window size in samples. The WAV we're reading is supposed
 * to have a sample rate of 44.1kHz and we want a window size of 0.1sec, so we
 * set this to 4410 (samples).
 */

#define N       4410
#define N2_p1   2206
#define N2_m1   2204

#define SAMPLE_RATE         44100.0
#define SAMPLE_RATE_DIV_N   10.0

/**
 * The amount of filters in the MEL filter bank
 */

#define M_NUMCH 24
#define M_COEFF 20

double melScaleInverse(double f) {
    return 700 * (pow(10, f / 2595.0) - 1);
}

double melScale(double f) {
    return 2595 * log10(1 + (f / 700.0));
}

void buildFilterBank(double* filterBank) {
    int i = 0;
   
    double melDelta = (melScale(F_MAX) - melScale(F_MIN)) / (M_NUMCH + 1);
    for(i = 0; i < M_NUMCH; i++) {
        filterBank[i] = melScaleInverse(melDelta * i);
    }
}

double filter(double* f /*filterBank*/, int m /* filterIndex */, double f_k) {
   
    if(f_k < f[m - 1] || f_k >= f[m + 1]) {
        return 0;
    } else if(f[m - 1] <= f_k && f_k < f[m]) {
        return (f_k - f[m - 1]) / (f[m] - f[m - 1]);
    } else if(f[m] <= f_k && f_k < f[m + 1]) {
        return (f_k - f[m + 1]) / (f[m] - f[m + 1]);
    } else {
        return 0;
    }
}

void computeFFT() {
    int i, k, nread;
    uint16_t *iobuf;
    double *in;
    double ONE_DIV_N;
    double filterBank[M_NUMCH];
    fftw_complex *out;
    fftw_plan plan;
   

    ONE_DIV_N = 1.0 / N;
    iobuf = calloc(sizeof(uint16_t), N);
    in = (double*) fftw_malloc(sizeof(double) * N);
    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    plan = fftw_plan_dft_r2c_1d(N, in, out,  FFTW_MEASURE);

    buildFilterBank(filterBank);

    /* pass thru the 48 byte wav-header */
    fread (iobuf, sizeof(uint16_t), 24, stdin);
   
    double c_ls[N2_m1];
    double c_mf[M_NUMCH];
    double c_mc[M_COEFF];
    while( (nread = fread(iobuf, sizeof(uint16_t), N, stdin)) == N ) {
        for(i = 0; i < N; i++) {
            in[i] = iobuf[i];
        }
       
        fftw_execute(plan);
       
        double result_re = 0;
        double result_im = 0;
       
        for(k = 0; k < N2_m1; k++) {
            c_ls[k] = sqrt(((out[k][0] / N) * (out[k][0] / N)) + (out[k][1] * out[k][1]));
        }
       
        for(i = 0; i < M_NUMCH; i++) {
            c_mf[i] = 0;
           
            for(k = 0; k < N2_m1; k++) {
                c_mf[i] += c_ls[k] * filter(filterBank, i, k * SAMPLE_RATE_DIV_N);
            }
           
        }
       
        for(k = 0; k < M_COEFF; k++) {
            c_mc[k] = 0;
           
            for(i = 0; i < M_NUMCH; i++) {
                if(c_mf[i] != 0) c_mc[k] += log(c_mf[i]) * cos(k * (M_PI / M_NUMCH) * (i - 0.5));
            }
        }
       
        fwrite(c_mc, sizeof(double), M_COEFF, stdout);
    }

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    free(iobuf);
}


int main (int argc, char *argv[]) {
    computeFFT();
    return 0;
}
The implementation above reads raw WAV data from stdin and calculates the MFCCs from it. It’s called using

lame --decode -m m -t $filename | ./mfcc > $filename.features

and works as follows:

  1. Read the data using a rectangular 100ms window (usually a Hamming window [7] or similar is used). This results in 4410 samples (16 bit wide), as the WAV is supposed to be sampled at 44.1kHz.
  2. Run an FFT [8] on the acquired samples – here we use the FFTW [9] library. Then take the square of the transformation results: if \(Y_K\) is the \(K\)-th transformation result, we use \(\vert Y_K\vert^2\).
  3. Scale the result of step 2 using the Mel scale filter bank. The one I used looks like this (this presentation [10] helped me a lot in implementing the filter bank):
     filterbank
     We’ll call the Mel transformed values \(c^{mf}_i\).
  4. Finally compute the \(n\)-th coefficient using a discrete cosine transformation:

        \[c_n = \sum\limits_{i=0}^{N} \log_{10}(c^{mf}_i)\cdot\cos\left(\frac{n(2i-1)\pi}{2N}\right)\]

     where \(N\) is the dimensionality of the resulting feature vector.
  5. Write the feature vector \(\vec{c} = (c_1, \dots, c_N)^T\) to stdout as 8 byte wide little endian IEEE 754 double precision floats.

Running the program above on my music collection (roughly 28 gigabytes) took about 24 hours and resulted in 487747560 feature vectors. That leaves me with two questions:
  1. How good is the quality of those feature vectors?
  2. How do I process such a vast amount of data?
In order to get an idea of the quality (in the context of feeding them into a SOM) I used some simple statistical methods. As the SOM estimates the probability density function of the samples, I had a closer look at the distribution of those vectors in \(\mathbb{R}^{20}\).
The idea is to compute the mean vector

    \[\vec{m} = \frac{1}{N}\sum\limits_{i=1}^N\vec{c}_i\]

then compute the Euclidean distance (as this is the measure of distance a SOM implementation would use) of each vector \(\vec{c}_i\) to \(\vec{m}\):

    \[d_i = \|\vec{m} - \vec{c}_i\|\]

Finally we take the mean \(d_m\) of the \(d_i\) and calculate the deviations \(\sigma_i = d_m - d_i\).
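A minimal sketch of this check in C – assuming the file layout written by the MFCC program above (consecutive vectors of 20 doubles in native byte order), and not the code that actually produced the histogram below – could look like this:

/* quality check sketch: three passes over the feature file,
 * printing one sigma_i per line for later histogramming */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define DIM 20

static double distance(const double *a, const double *b) {
    double d = 0.0;
    int i;
    for (i = 0; i < DIM; i++) d += (a[i] - b[i]) * (a[i] - b[i]);
    return sqrt(d);
}

int main(int argc, char *argv[]) {
    double v[DIM], mean[DIM] = {0.0};
    double d_m = 0.0;
    long n = 0;
    int i;
    FILE *f;

    if (argc < 2) { fprintf(stderr, "usage: %s <feature file>\n", argv[0]); return 1; }
    if ((f = fopen(argv[1], "rb")) == NULL) { perror("fopen"); return 1; }

    /* pass 1: the mean vector m */
    while (fread(v, sizeof(double), DIM, f) == DIM) {
        for (i = 0; i < DIM; i++) mean[i] += v[i];
        n++;
    }
    if (n == 0) { fprintf(stderr, "no vectors found\n"); return 1; }
    for (i = 0; i < DIM; i++) mean[i] /= n;

    /* pass 2: the mean distance d_m */
    rewind(f);
    while (fread(v, sizeof(double), DIM, f) == DIM) d_m += distance(mean, v);
    d_m /= n;

    /* pass 3: print sigma_i = d_m - d_i */
    rewind(f);
    while (fread(v, sizeof(double), DIM, f) == DIM) printf("%f\n", d_m - distance(mean, v));

    fclose(f);
    return 0;
}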

Those \(\sigma_i\), plotted in a histogram, look like this: histogram

The image above leaves us with two possible conclusions:
  1. The music I listen to is pretty much all the same
  2. The feature vectors are not really of good quality

Personally I’d go with the latter, as the music in my library is not all that similar. One way to improve the quality of those feature vectors could be to use the first and second derivatives of the cepstral coefficients, as proposed here [11], or to use a Hamming window for reading the data.

Nonetheless, once I finally finish a working batch SOM [12] (and this paper [13]) implementation, I’ll do another post describing the results. Or I might simply abandon this project, as the semester starts in a few days and I might run out of time.

References

Advanced Java generics for method arguments

Currently we’re having the semester break. During this time I work for an automotive software/hardware tool vendor. While working there I stumbled upon a piece of code (which was part of the API) that basically looked like this:

public void doSomething(ArrayList<String> list) {
    ...
}

The author’s intention was not to require an ArrayList here; rather, he needed a List of String which was Serializable as well. For obvious reasons it’s not really a good idea to expect something as concrete as an ArrayList here, so there should be a better way – and there is:

public <SerializableStringList extends List<String> & Serializable>
void doSomething(SerializableStringList list) {
    ...
}
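Purely as an illustration, a hypothetical call site could look like this – ArrayList<String> happens to satisfy both bounds because ArrayList implements Serializable:

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializableListDemo {
    // the same signature as above, repeated here to keep the example self-contained
    public static <SerializableStringList extends List<String> & Serializable>
    void doSomething(SerializableStringList list) {
        // ...
    }

    public static void main(String[] args) {
        ArrayList<String> list = new ArrayList<String>();
        list.add("hello");
        doSomething(list); // compiles: ArrayList<String> is a List<String> and Serializable
        // a List implementation that is not Serializable would be rejected at compile time
    }
}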

This signature is far better at expressing the author’s intention, but there are still some drawbacks.
First of all, it does not work with Java 1.4 (which might be a requirement).
The worst disadvantage is the lack of support in Eclipse. If you want to call this method, the only thing you’ll see in that little light-yellow description box is that a SerializableStringList is expected; it won’t tell you that this is a type parameter and not a real type like a class or an interface.

Still, I’d prefer the latter solution over the first one any time, as the first one forces people to use classes they might not want to use, and the second one is far more expressive.
