Analyzing a MIDI file

Load the MIDI data into a dataframe

[1]:
import miditapyr

Let’s load the example MIDI file included in the package:

[2]:
midi_file = miditapyr.get_test_midi_file(as_string=True)

Create a MidiFrames object:

[3]:
mf = miditapyr.MidiFrames(midi_file)

Let’s extract the unnested MIDI frame (one row per MIDI message):

[4]:
df_midi = mf.midi_frame_unnested.df
[5]:
df_midi
[5]:
i_track meta type name time note velocity channel tempo numerator denominator clocks_per_click notated_32nd_notes_per_beat
0 0 True track_name drum-t1-1-t1 0 NaN NaN NaN NaN NaN NaN NaN NaN
1 0 False note_on NaN 0 43.0 72.0 9.0 NaN NaN NaN NaN NaN
2 0 False note_on NaN 0 39.0 64.0 9.0 NaN NaN NaN NaN NaN
3 0 False note_on NaN 0 36.0 101.0 9.0 NaN NaN NaN NaN NaN
4 0 True set_tempo NaN 0 NaN NaN NaN 666666.0 NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
263 2 False note_off NaN 31 59.0 57.0 15.0 NaN NaN NaN NaN NaN
264 2 False note_off NaN 9 67.0 57.0 15.0 NaN NaN NaN NaN NaN
265 2 False note_on NaN 266 62.0 82.0 15.0 NaN NaN NaN NaN NaN
266 2 False note_off NaN 5 62.0 82.0 15.0 NaN NaN NaN NaN NaN
267 2 True end_of_track NaN 1 NaN NaN NaN NaN NaN NaN NaN NaN

268 rows × 13 columns
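
The boolean meta column marks the meta messages (track name, tempo change, end of track, and so on). If we want a quick look at just those rows before starting the analysis, something like this should do (a small sketch, not part of the original run):

df_midi.loc[df_midi['meta'], ['i_track', 'type', 'name', 'time', 'tempo']]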

Analysis

Count notes per track:

[6]:
df_midi.groupby(['i_track', 'note']).agg('size')
[6]:
i_track  note
0        36.0    32
         38.0    16
         39.0    24
         42.0    32
         43.0    46
         45.0     4
         46.0     2
1        45.0     6
         48.0     2
         50.0     8
         52.0     2
         53.0     2
         55.0     2
         57.0     2
2        57.0    18
         59.0     8
         62.0    28
         65.0    18
         67.0     8
dtype: int64

In the unnested MIDI frame, every note that’s played appears in two rows: one note_on and one note_off event. We can pivot the data into one row per note. First we extract only the note_on and note_off events of the MIDI data into the dataframe df_notes:

[7]:
df_meta, df_not_notes, df_notes = miditapyr.split_midi_frame(df_midi)
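
As a quick check (assuming the type column of df_midi is carried over into df_notes), the new frame should only contain note_on and note_off events:

df_notes['type'].unique()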

Now we can pivot to the wide format:

[8]:
dfw = miditapyr.pivot_notes_wide(df_notes)
dfw
[8]:
i_track note i_note channel t_note_off t_note_on velocity_note_off velocity_note_on
0 0 36.0 0 9.0 240.0 0.0 101.0 101.0
1 0 36.0 1 9.0 1200.0 960.0 101.0 101.0
2 0 36.0 2 9.0 2160.0 1920.0 101.0 101.0
3 0 36.0 3 9.0 3120.0 2880.0 101.0 101.0
4 0 36.0 4 9.0 4080.0 3840.0 101.0 101.0
... ... ... ... ... ... ... ... ...
125 2 65.0 8 15.0 11859.0 11419.0 52.0 52.0
126 2 67.0 0 15.0 13122.0 12862.0 58.0 58.0
127 2 67.0 1 15.0 14025.0 13429.0 58.0 58.0
128 2 67.0 2 15.0 14474.0 14183.0 60.0 60.0
129 2 67.0 3 15.0 15089.0 14876.0 57.0 57.0

130 rows × 8 columns

When we use this dataframe to count the notes as above, we get half the previous counts, since each note is no longer counted twice:

[9]:
dfw.groupby(['i_track', 'note']).agg('size')
[9]:
i_track  note
0        36.0    16
         38.0     8
         39.0    12
         42.0    16
         43.0    23
         45.0     2
         46.0     1
1        45.0     3
         48.0     1
         50.0     4
         52.0     1
         53.0     1
         55.0     1
         57.0     1
2        57.0     9
         59.0     4
         62.0    14
         65.0     9
         67.0     4
dtype: int64
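
We can also check this directly: the per-note counts on the raw event frame should be exactly twice the counts on the wide frame. Something along these lines:

event_counts = df_midi.groupby(['i_track', 'note']).agg('size')
note_counts = dfw.groupby(['i_track', 'note']).agg('size')
(event_counts == 2 * note_counts).all()  # should be True for this file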

The durations of the notes (measured in MIDI ticks) can be added to the dataframe like so:

[10]:
dfw['dur'] = dfw['t_note_off'] - dfw['t_note_on']
dfw
[10]:
i_track note i_note channel t_note_off t_note_on velocity_note_off velocity_note_on dur
0 0 36.0 0 9.0 240.0 0.0 101.0 101.0 240.0
1 0 36.0 1 9.0 1200.0 960.0 101.0 101.0 240.0
2 0 36.0 2 9.0 2160.0 1920.0 101.0 101.0 240.0
3 0 36.0 3 9.0 3120.0 2880.0 101.0 101.0 240.0
4 0 36.0 4 9.0 4080.0 3840.0 101.0 101.0 240.0
... ... ... ... ... ... ... ... ... ...
125 2 65.0 8 15.0 11859.0 11419.0 52.0 52.0 440.0
126 2 67.0 0 15.0 13122.0 12862.0 58.0 58.0 260.0
127 2 67.0 1 15.0 14025.0 13429.0 58.0 58.0 596.0
128 2 67.0 2 15.0 14474.0 14183.0 60.0 60.0 291.0
129 2 67.0 3 15.0 15089.0 14876.0 57.0 57.0 213.0

130 rows × 9 columns
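
Since the durations are in MIDI ticks, converting them to seconds needs the file’s ticks-per-beat resolution and the tempo (the set_tempo event above says 666666 microseconds per quarter note). A rough sketch using mido directly, assuming midi_file is the file path and the tempo stays constant for the whole file:

import mido

ticks_per_beat = mido.MidiFile(midi_file).ticks_per_beat
tempo = 666666  # microseconds per quarter note, from the set_tempo event above
# seconds = ticks * microseconds_per_beat / 1e6 / ticks_per_beat
# (the same conversion that mido.tick2second() does per value)
dfw['dur_s'] = dfw['dur'] * tempo / 1e6 / ticks_per_beat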

Now we can summarize all notes that start and end at the same time:

[11]:
sim_notes = dfw.groupby(['i_track', 't_note_on', 't_note_off']).agg({'note': ['unique', 'size']})
sim_notes
[11]:
note
unique size
i_track t_note_on t_note_off
0 0.0 240.0 [36.0, 39.0, 43.0] 3
480.0 720.0 [38.0, 42.0] 2
720.0 960.0 [43.0] 1
960.0 1200.0 [36.0] 1
1440.0 1680.0 [42.0, 43.0] 2
... ... ... ... ...
2 14191.0 14442.0 [62.0] 1
14859.0 15080.0 [59.0] 1
14868.0 15049.0 [62.0] 1
14876.0 15089.0 [67.0] 1
15355.0 15360.0 [62.0] 1

93 rows × 2 columns

To show only the chords played in each track, we filter out the notes that don’t start and end together with other notes:

[12]:
not_single_note = sim_notes['note']['size'] > 1
sim_notes.loc[not_single_note]
[12]:
note
unique size
i_track t_note_on t_note_off
0 0.0 240.0 [36.0, 39.0, 43.0] 3
480.0 720.0 [38.0, 42.0] 2
1440.0 1680.0 [42.0, 43.0] 2
1920.0 2160.0 [36.0, 39.0, 43.0] 3
2400.0 2640.0 [38.0, 42.0] 2
3360.0 3600.0 [42.0, 43.0] 2
3840.0 4080.0 [36.0, 39.0, 43.0] 3
4320.0 4560.0 [38.0, 42.0] 2
5280.0 5520.0 [42.0, 43.0] 2
5760.0 6000.0 [36.0, 39.0, 43.0] 3
6240.0 6480.0 [38.0, 42.0] 2
7200.0 7440.0 [42.0, 43.0] 2
7680.0 7920.0 [36.0, 39.0, 43.0] 3
8160.0 8400.0 [38.0, 42.0] 2
9120.0 9360.0 [42.0, 43.0] 2
9600.0 9840.0 [36.0, 39.0, 43.0] 3
10080.0 10320.0 [38.0, 42.0] 2
11040.0 11280.0 [42.0, 43.0] 2
11520.0 11760.0 [36.0, 43.0] 2
12000.0 12240.0 [38.0, 39.0, 42.0] 3
12960.0 13200.0 [42.0, 43.0] 2
13440.0 13680.0 [36.0, 43.0] 2
13920.0 14160.0 [38.0, 39.0, 45.0, 46.0] 4
14400.0 14640.0 [36.0, 39.0] 2
14880.0 15120.0 [39.0, 42.0, 43.0, 45.0] 4
15120.0 15360.0 [39.0, 42.0] 2

If we only want to look at triads, we could do:

[13]:
is_triad = sim_notes['note']['size'] == 3
sim_notes.loc[is_triad]

In this file, simultaneous notes only occur in the drum track, but you get the idea.
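
If we also wanted to see which chords occur and how often, per track, a small sketch along the same lines (the name chords is just illustrative):

chords = (
    sim_notes.loc[not_single_note, ('note', 'unique')]
    .apply(lambda notes: tuple(sorted(notes)))
)
chords.groupby(level='i_track').value_counts()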