Exploring Music Transcription with Multi-Modal Language Models