Analyzing a MIDI file

Load the MIDI data into a dataframe

[1]:
import miditapyr

Let’s load the example MIDI file included in the package:

[2]:
midi_file = miditapyr.get_test_midi_file(as_string=True)

Create a MidiFrames object:

[3]:
mf = miditapyr.MidiFrames(midi_file)

Let’s extract the unnested MIDI frame (one row per MIDI message):

[4]:
df_midi = mf.midi_frame_unnested.df
[5]:
df_midi
[5]:
i_track meta type name time note velocity channel tempo numerator denominator clocks_per_click notated_32nd_notes_per_beat
0 0 True track_name drum-t1-1-t1 0 NaN NaN NaN NaN NaN NaN NaN NaN
1 0 False note_on NaN 0 43.0 72.0 9.0 NaN NaN NaN NaN NaN
2 0 False note_on NaN 0 39.0 64.0 9.0 NaN NaN NaN NaN NaN
3 0 False note_on NaN 0 36.0 101.0 9.0 NaN NaN NaN NaN NaN
4 0 True set_tempo NaN 0 NaN NaN NaN 666666.0 NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
263 2 False note_off NaN 31 59.0 57.0 15.0 NaN NaN NaN NaN NaN
264 2 False note_off NaN 9 67.0 57.0 15.0 NaN NaN NaN NaN NaN
265 2 False note_on NaN 266 62.0 82.0 15.0 NaN NaN NaN NaN NaN
266 2 False note_off NaN 5 62.0 82.0 15.0 NaN NaN NaN NaN NaN
267 2 True end_of_track NaN 1 NaN NaN NaN NaN NaN NaN NaN NaN

268 rows × 13 columns
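
The boolean meta column marks the meta messages (track name, tempo change, end of track, and so on). If we want a quick look at just those rows before starting the analysis, something like this should do (a small sketch, not part of the original run):

df_midi.loc[df_midi['meta'], ['i_track', 'type', 'name', 'time', 'tempo']]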

Analysis

Count notes per track:

[6]:
df_midi.groupby(['i_track', 'note']).agg('size')
[6]:
i_track  note
0        36.0    32
         38.0    16
         39.0    24
         42.0    32
         43.0    46
         45.0     4
         46.0     2
1        45.0     6
         48.0     2
         50.0     8
         52.0     2
         53.0     2
         55.0     2
         57.0     2
2        57.0    18
         59.0     8
         62.0    28
         65.0    18
         67.0     8
dtype: int64

In the unnested MIDI frame, every note that’s played appears in two rows: one note_on and one note_off event. We can pivot the data into one row per note. First we extract only the note_on and note_off events of the MIDI data into the dataframe df_notes:

[7]:
df_meta, df_not_notes, df_notes = miditapyr.split_midi_frame(df_midi)
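
As a quick check (assuming the type column of df_midi is carried over into df_notes), the new frame should only contain note_on and note_off events:

df_notes['type'].unique()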

Now we can pivot to the wide format:

[8]:
dfw = miditapyr.pivot_notes_wide(df_notes)
dfw
[8]:
i_track note i_note channel t_note_off t_note_on velocity_note_off velocity_note_on
0 0 36.0 0 9.0 240.0 0.0 101.0 101.0
1 0 36.0 1 9.0 1200.0 960.0 101.0 101.0
2 0 36.0 2 9.0 2160.0 1920.0 101.0 101.0
3 0 36.0 3 9.0 3120.0 2880.0 101.0 101.0
4 0 36.0 4 9.0 4080.0 3840.0 101.0 101.0
... ... ... ... ... ... ... ... ...
125 2 65.0 8 15.0 11859.0 11419.0 52.0 52.0
126 2 67.0 0 15.0 13122.0 12862.0 58.0 58.0
127 2 67.0 1 15.0 14025.0 13429.0 58.0 58.0
128 2 67.0 2 15.0 14474.0 14183.0 60.0 60.0
129 2 67.0 3 15.0 15089.0 14876.0 57.0 57.0

130 rows × 8 columns

When we use this dataframe to count the notes as above, we get half the previous counts, since each note is no longer counted twice:

[9]:
dfw.groupby(['i_track', 'note']).agg('size')
[9]:
i_track  note
0        36.0    16
         38.0     8
         39.0    12
         42.0    16
         43.0    23
         45.0     2
         46.0     1
1        45.0     3
         48.0     1
         50.0     4
         52.0     1
         53.0     1
         55.0     1
         57.0     1
2        57.0     9
         59.0     4
         62.0    14
         65.0     9
         67.0     4
dtype: int64
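
We can also check this directly: the per-note counts on the raw event frame should be exactly twice the counts on the wide frame. Something along these lines:

event_counts = df_midi.groupby(['i_track', 'note']).agg('size')
note_counts = dfw.groupby(['i_track', 'note']).agg('size')
(event_counts == 2 * note_counts).all()  # should be True for this file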

The durations of the notes (measured in MIDI ticks) can be added to the dataframe like so:

[10]:
dfw['dur'] = dfw['t_note_off'] - dfw['t_note_on']
dfw
[10]:
i_track note i_note channel t_note_off t_note_on velocity_note_off velocity_note_on dur
0 0 36.0 0 9.0 240.0 0.0 101.0 101.0 240.0
1 0 36.0 1 9.0 1200.0 960.0 101.0 101.0 240.0
2 0 36.0 2 9.0 2160.0 1920.0 101.0 101.0 240.0
3 0 36.0 3 9.0 3120.0 2880.0 101.0 101.0 240.0
4 0 36.0 4 9.0 4080.0 3840.0 101.0 101.0 240.0
... ... ... ... ... ... ... ... ... ...
125 2 65.0 8 15.0 11859.0 11419.0 52.0 52.0 440.0
126 2 67.0 0 15.0 13122.0 12862.0 58.0 58.0 260.0
127 2 67.0 1 15.0 14025.0 13429.0 58.0 58.0 596.0
128 2 67.0 2 15.0 14474.0 14183.0 60.0 60.0 291.0
129 2 67.0 3 15.0 15089.0 14876.0 57.0 57.0 213.0

130 rows × 9 columns
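
Since the durations are in MIDI ticks, converting them to seconds needs the file’s ticks-per-beat resolution and the tempo (the set_tempo event above says 666666 microseconds per quarter note). A rough sketch using mido directly, assuming midi_file is the file path and the tempo stays constant for the whole file:

import mido

ticks_per_beat = mido.MidiFile(midi_file).ticks_per_beat
tempo = 666666  # microseconds per quarter note, from the set_tempo event above
# seconds = ticks * microseconds_per_beat / 1e6 / ticks_per_beat
# (the same conversion that mido.tick2second() does per value)
dfw['dur_s'] = dfw['dur'] * tempo / 1e6 / ticks_per_beat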

Now we can summarize all notes that start and end at the same time:

[11]:
sim_notes = dfw.groupby(['i_track', 't_note_on', 't_note_off']).agg({'note': ['unique', 'size']})
sim_notes
[11]:
note
unique size
i_track t_note_on t_note_off
0 0.0 240.0 [36.0, 39.0, 43.0] 3
480.0 720.0 [38.0, 42.0] 2
720.0 960.0 [43.0] 1
960.0 1200.0 [36.0] 1
1440.0 1680.0 [42.0, 43.0] 2
... ... ... ... ...
2 14191.0 14442.0 [62.0] 1
14859.0 15080.0 [59.0] 1
14868.0 15049.0 [62.0] 1
14876.0 15089.0 [67.0] 1
15355.0 15360.0 [62.0] 1

93 rows × 2 columns

To show only the chords played in each track, we filter out the notes that don’t start and end together with other notes:

[12]:
not_single_note = sim_notes['note']['size'] > 1
sim_notes.loc[not_single_note]
[12]:
note
unique size
i_track t_note_on t_note_off
0 0.0 240.0 [36.0, 39.0, 43.0] 3
480.0 720.0 [38.0, 42.0] 2
1440.0 1680.0 [42.0, 43.0] 2
1920.0 2160.0 [36.0, 39.0, 43.0] 3
2400.0 2640.0 [38.0, 42.0] 2
3360.0 3600.0 [42.0, 43.0] 2
3840.0 4080.0 [36.0, 39.0, 43.0] 3
4320.0 4560.0 [38.0, 42.0] 2
5280.0 5520.0 [42.0, 43.0] 2
5760.0 6000.0 [36.0, 39.0, 43.0] 3
6240.0 6480.0 [38.0, 42.0] 2
7200.0 7440.0 [42.0, 43.0] 2
7680.0 7920.0 [36.0, 39.0, 43.0] 3
8160.0 8400.0 [38.0, 42.0] 2
9120.0 9360.0 [42.0, 43.0] 2
9600.0 9840.0 [36.0, 39.0, 43.0] 3
10080.0 10320.0 [38.0, 42.0] 2
11040.0 11280.0 [42.0, 43.0] 2
11520.0 11760.0 [36.0, 43.0] 2
12000.0 12240.0 [38.0, 39.0, 42.0] 3
12960.0 13200.0 [42.0, 43.0] 2
13440.0 13680.0 [36.0, 43.0] 2
13920.0 14160.0 [38.0, 39.0, 45.0, 46.0] 4
14400.0 14640.0 [36.0, 39.0] 2
14880.0 15120.0 [39.0, 42.0, 43.0, 45.0] 4
15120.0 15360.0 [39.0, 42.0] 2

If we only want to look at triads, we could do:

[13]:
is_triad = sim_notes['note']['size'] == 3
sim_notes.loc[is_triad]

In this file, simultaneous notes only occur in the drum track, but you get the idea.
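
If we also wanted to see which chords occur and how often, per track, a small sketch along the same lines (the name chords is just illustrative):

chords = (
    sim_notes.loc[not_single_note, ('note', 'unique')]
    .apply(lambda notes: tuple(sorted(notes)))
)
chords.groupby(level='i_track').value_counts()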