Analyzing a midi file
Load the midi data into a dataframe
[1]:
import miditapyr
Let’s load the example midi file of the package:
[2]:
midi_file = miditapyr.get_test_midi_file(as_string=True)
Create a MidiFrames object:
[3]:
mf = miditapyr.MidiFrames(midi_file)
Let’s extract the unnested midi frame:
[4]:
df_midi = mf.midi_frame_unnested.df
[5]:
df_midi
[5]:
| | i_track | meta | type | name | time | note | velocity | channel | tempo | numerator | denominator | clocks_per_click | notated_32nd_notes_per_beat |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | True | track_name | drum-t1-1-t1 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 0 | False | note_on | NaN | 0 | 43.0 | 72.0 | 9.0 | NaN | NaN | NaN | NaN | NaN |
2 | 0 | False | note_on | NaN | 0 | 39.0 | 64.0 | 9.0 | NaN | NaN | NaN | NaN | NaN |
3 | 0 | False | note_on | NaN | 0 | 36.0 | 101.0 | 9.0 | NaN | NaN | NaN | NaN | NaN |
4 | 0 | True | set_tempo | NaN | 0 | NaN | NaN | NaN | 666666.0 | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
263 | 2 | False | note_off | NaN | 31 | 59.0 | 57.0 | 15.0 | NaN | NaN | NaN | NaN | NaN |
264 | 2 | False | note_off | NaN | 9 | 67.0 | 57.0 | 15.0 | NaN | NaN | NaN | NaN | NaN |
265 | 2 | False | note_on | NaN | 266 | 62.0 | 82.0 | 15.0 | NaN | NaN | NaN | NaN | NaN |
266 | 2 | False | note_off | NaN | 5 | 62.0 | 82.0 | 15.0 | NaN | NaN | NaN | NaN | NaN |
267 | 2 | True | end_of_track | NaN | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
268 rows × 13 columns
Analysis
Count the note events per track and note number:
[6]:
df_midi.groupby(['i_track', 'note']).agg('size')
[6]:
i_track note
0 36.0 32
38.0 16
39.0 24
42.0 32
43.0 46
45.0 4
46.0 2
1 45.0 6
48.0 2
50.0 8
52.0 2
53.0 2
55.0 2
57.0 2
2 57.0 18
59.0 8
62.0 28
65.0 18
67.0 8
dtype: int64
In the unnested midi frame, every note that's played occupies 2 rows: one for its note_on event and one for its note_off event. We can pivot the data into one row per note. First we'll extract only the note_on & note_off events of the midi data into the dataframe df_notes:
[7]:
df_meta, df_not_notes, df_notes = miditapyr.split_midi_frame(df_midi)
Now we can pivot to the wide format:
[8]:
dfw = miditapyr.pivot_notes_wide(df_notes)
dfw
[8]:
| | i_track | note | i_note | channel | t_note_off | t_note_on | velocity_note_off | velocity_note_on |
---|---|---|---|---|---|---|---|---|
0 | 0 | 36.0 | 0 | 9.0 | 240.0 | 0.0 | 101.0 | 101.0 |
1 | 0 | 36.0 | 1 | 9.0 | 1200.0 | 960.0 | 101.0 | 101.0 |
2 | 0 | 36.0 | 2 | 9.0 | 2160.0 | 1920.0 | 101.0 | 101.0 |
3 | 0 | 36.0 | 3 | 9.0 | 3120.0 | 2880.0 | 101.0 | 101.0 |
4 | 0 | 36.0 | 4 | 9.0 | 4080.0 | 3840.0 | 101.0 | 101.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
125 | 2 | 65.0 | 8 | 15.0 | 11859.0 | 11419.0 | 52.0 | 52.0 |
126 | 2 | 67.0 | 0 | 15.0 | 13122.0 | 12862.0 | 58.0 | 58.0 |
127 | 2 | 67.0 | 1 | 15.0 | 14025.0 | 13429.0 | 58.0 | 58.0 |
128 | 2 | 67.0 | 2 | 15.0 | 14474.0 | 14183.0 | 60.0 | 60.0 |
129 | 2 | 67.0 | 3 | 15.0 | 15089.0 | 14876.0 | 57.0 | 57.0 |
130 rows × 8 columns
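To picture what the pivot does, here is a minimal pandas sketch on toy data. It is not miditapyr's implementation: the toy frame, the column `t` and the pairing rule (the n-th note_on of a pitch is matched with its n-th note_off) are assumptions made for illustration. It also assumes that `time` holds delta ticks, as in the unnested frame above, so absolute times are obtained with a cumulative sum per track.

```python
import pandas as pd

# Toy event list for one track (hypothetical data); `time` is a delta in ticks.
events = pd.DataFrame({
    "i_track": [0, 0, 0, 0],
    "type": ["note_on", "note_on", "note_off", "note_off"],
    "time": [0, 0, 240, 0],
    "note": [36, 43, 36, 43],
})

# Absolute time: cumulative sum of the deltas within each track.
events["t"] = events.groupby("i_track")["time"].cumsum()

# Number the occurrences of each (track, note, event type) so that the
# n-th note_on of a pitch can be matched with its n-th note_off.
events["i_note"] = events.groupby(["i_track", "note", "type"]).cumcount()

# One row per note, with the on and off times side by side.
wide = (
    events.pivot(index=["i_track", "note", "i_note"], columns="type", values="t")
    .rename(columns={"note_on": "t_note_on", "note_off": "t_note_off"})
    .reset_index()
)
```

Both toy notes start at tick 0 and end at tick 240, so `wide` has two rows, one per note.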
When we use this dataframe to count the notes as above, we get half the counts, since each note is now counted once instead of twice:
[9]:
dfw.groupby(['i_track', 'note']).agg('size')
[9]:
i_track note
0 36.0 16
38.0 8
39.0 12
42.0 16
43.0 23
45.0 2
46.0 1
1 45.0 3
48.0 1
50.0 4
52.0 1
53.0 1
55.0 1
57.0 1
2 57.0 9
59.0 4
62.0 14
65.0 9
67.0 4
dtype: int64
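If only the counts are needed, a possible alternative to pivoting is to count the note_on rows directly. The sketch below uses a small hypothetical frame with the same column names as the unnested midi frame. The velocity check is an assumption worth keeping: many midi files encode a note_off as a note_on with velocity 0.

```python
import pandas as pd

# Hypothetical events: note 36 played twice in track 0, note 45 once in track 1.
events = pd.DataFrame({
    "i_track": [0, 0, 0, 0, 1, 1],
    "type": ["note_on", "note_off", "note_on", "note_off", "note_on", "note_off"],
    "note": [36, 36, 36, 36, 45, 45],
    "velocity": [101, 101, 101, 101, 64, 64],
})

# Counting all rows counts every note twice (once per on/off event).
event_counts = events.groupby(["i_track", "note"]).size()

# Keep only real onsets: note_on events with a velocity above zero.
onsets = events[(events["type"] == "note_on") & (events["velocity"] > 0)]
note_counts = onsets.groupby(["i_track", "note"]).size()
```

Here `event_counts` holds 4 and 2, while `note_counts` holds 2 and 1.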
The durations of the notes (measured in midi ticks) can be added to the dataframe like so:
[10]:
dfw['dur'] = dfw['t_note_off'] - dfw['t_note_on']
dfw
[10]:
| | i_track | note | i_note | channel | t_note_off | t_note_on | velocity_note_off | velocity_note_on | dur |
---|---|---|---|---|---|---|---|---|---|
0 | 0 | 36.0 | 0 | 9.0 | 240.0 | 0.0 | 101.0 | 101.0 | 240.0 |
1 | 0 | 36.0 | 1 | 9.0 | 1200.0 | 960.0 | 101.0 | 101.0 | 240.0 |
2 | 0 | 36.0 | 2 | 9.0 | 2160.0 | 1920.0 | 101.0 | 101.0 | 240.0 |
3 | 0 | 36.0 | 3 | 9.0 | 3120.0 | 2880.0 | 101.0 | 101.0 | 240.0 |
4 | 0 | 36.0 | 4 | 9.0 | 4080.0 | 3840.0 | 101.0 | 101.0 | 240.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
125 | 2 | 65.0 | 8 | 15.0 | 11859.0 | 11419.0 | 52.0 | 52.0 | 440.0 |
126 | 2 | 67.0 | 0 | 15.0 | 13122.0 | 12862.0 | 58.0 | 58.0 | 260.0 |
127 | 2 | 67.0 | 1 | 15.0 | 14025.0 | 13429.0 | 58.0 | 58.0 | 596.0 |
128 | 2 | 67.0 | 2 | 15.0 | 14474.0 | 14183.0 | 60.0 | 60.0 | 291.0 |
129 | 2 | 67.0 | 3 | 15.0 | 15089.0 | 14876.0 | 57.0 | 57.0 | 213.0 |
130 rows × 9 columns
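Durations in ticks can be converted to seconds once the tempo and the file's resolution are known. The helper below is a hypothetical sketch, not part of miditapyr. The tempo of 666666 microseconds per beat is taken from the set_tempo row of the midi frame above; the resolution of 480 ticks per beat is an assumption, since the real value is stored in the file header (mido exposes it as `MidiFile.ticks_per_beat`). The conversion also assumes the tempo stays constant for the whole file.

```python
def ticks_to_seconds(ticks, ticks_per_beat, tempo):
    """Convert a duration in midi ticks to seconds.

    tempo is given in microseconds per beat, as in a set_tempo message.
    """
    seconds_per_tick = tempo / 1_000_000 / ticks_per_beat
    return ticks * seconds_per_tick


# 240 ticks at an assumed 480 ticks per beat and a tempo of 666666 µs per beat.
dur_seconds = ticks_to_seconds(240, ticks_per_beat=480, tempo=666666)  # about 0.333 s
```

With a dataframe, the same formula can be applied to the whole `dur` column at once, as it only involves multiplication and division.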
Now we can summarize all notes that start and end at the same time:
[11]:
sim_notes = dfw.groupby(['i_track', 't_note_on', 't_note_off']).agg({'note': ['unique', 'size']})
sim_notes
[11]:
i_track | t_note_on | t_note_off | note: unique | note: size |
---|---|---|---|---|
0 | 0.0 | 240.0 | [36.0, 39.0, 43.0] | 3 |
0 | 480.0 | 720.0 | [38.0, 42.0] | 2 |
0 | 720.0 | 960.0 | [43.0] | 1 |
0 | 960.0 | 1200.0 | [36.0] | 1 |
0 | 1440.0 | 1680.0 | [42.0, 43.0] | 2 |
... | ... | ... | ... | ... |
2 | 14191.0 | 14442.0 | [62.0] | 1 |
2 | 14859.0 | 15080.0 | [59.0] | 1 |
2 | 14868.0 | 15049.0 | [62.0] | 1 |
2 | 14876.0 | 15089.0 | [67.0] | 1 |
2 | 15355.0 | 15360.0 | [62.0] | 1 |
93 rows × 2 columns
In order to only show chords that are played in each track, we’ll filter out the notes that don’t start and end together with others:
[12]:
not_single_note = sim_notes['note']['size'] > 1
sim_notes.loc[not_single_note]
[12]:
i_track | t_note_on | t_note_off | note: unique | note: size |
---|---|---|---|---|
0 | 0.0 | 240.0 | [36.0, 39.0, 43.0] | 3 |
0 | 480.0 | 720.0 | [38.0, 42.0] | 2 |
0 | 1440.0 | 1680.0 | [42.0, 43.0] | 2 |
0 | 1920.0 | 2160.0 | [36.0, 39.0, 43.0] | 3 |
0 | 2400.0 | 2640.0 | [38.0, 42.0] | 2 |
0 | 3360.0 | 3600.0 | [42.0, 43.0] | 2 |
0 | 3840.0 | 4080.0 | [36.0, 39.0, 43.0] | 3 |
0 | 4320.0 | 4560.0 | [38.0, 42.0] | 2 |
0 | 5280.0 | 5520.0 | [42.0, 43.0] | 2 |
0 | 5760.0 | 6000.0 | [36.0, 39.0, 43.0] | 3 |
0 | 6240.0 | 6480.0 | [38.0, 42.0] | 2 |
0 | 7200.0 | 7440.0 | [42.0, 43.0] | 2 |
0 | 7680.0 | 7920.0 | [36.0, 39.0, 43.0] | 3 |
0 | 8160.0 | 8400.0 | [38.0, 42.0] | 2 |
0 | 9120.0 | 9360.0 | [42.0, 43.0] | 2 |
0 | 9600.0 | 9840.0 | [36.0, 39.0, 43.0] | 3 |
0 | 10080.0 | 10320.0 | [38.0, 42.0] | 2 |
0 | 11040.0 | 11280.0 | [42.0, 43.0] | 2 |
0 | 11520.0 | 11760.0 | [36.0, 43.0] | 2 |
0 | 12000.0 | 12240.0 | [38.0, 39.0, 42.0] | 3 |
0 | 12960.0 | 13200.0 | [42.0, 43.0] | 2 |
0 | 13440.0 | 13680.0 | [36.0, 43.0] | 2 |
0 | 13920.0 | 14160.0 | [38.0, 39.0, 45.0, 46.0] | 4 |
0 | 14400.0 | 14640.0 | [36.0, 39.0] | 2 |
0 | 14880.0 | 15120.0 | [39.0, 42.0, 43.0, 45.0] | 4 |
0 | 15120.0 | 15360.0 | [39.0, 42.0] | 2 |
To look only at triads (groups of exactly three simultaneous notes), we could do:
[13]:
is_triad = sim_notes['note']['size'] == 3
sim_notes.loc[is_triad]
In this file, multiple simultaneous notes occur only in the drum track, but you get the idea.
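Grouping by exact start and end times only finds perfectly aligned notes. In track 2, several notes start within a few ticks of each other (for example at 14859, 14868 and 14876), so they never form a group. One possible workaround, sketched below with pandas, is to group notes whose onsets lie within a tolerance. The onsets are copied from the last rows of track 2 in the summary table above; the tolerance of 20 ticks and the column `i_group` are hypothetical choices.

```python
import pandas as pd

# Onsets and pitches from the last rows of track 2 shown above.
notes = pd.DataFrame({
    "t_note_on": [14191.0, 14859.0, 14868.0, 14876.0, 15355.0],
    "note": [62.0, 59.0, 62.0, 67.0, 62.0],
})

tolerance = 20  # ticks; hypothetical threshold

notes = notes.sort_values("t_note_on").reset_index(drop=True)

# Start a new group whenever the gap to the previous onset exceeds the tolerance.
new_group = notes["t_note_on"].diff() > tolerance
notes["i_group"] = new_group.cumsum()

# Collect the pitches of each group of nearly simultaneous notes.
near_chords = notes.groupby("i_group")["note"].agg(list)
```

With these values, the three notes starting between 14859 and 14876 end up in one group. Note that gaps are compared between neighbouring onsets, so a long run of closely spaced notes can chain into a single group that spans more than the tolerance. For a frame with several tracks, the same steps would have to be applied per track.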