Deep Steps: A Generative AI Step Sequencer · AIMC 2024

Integrating Generative AI into Music Creation Workflows

As an experienced IT professional well-versed in the latest advancements in technology, I’m excited to explore the creative potential of Deep Steps – a generative AI-powered step sequencer showcased at the AIMC 2024 conference. This innovative application aims to seamlessly integrate artificial intelligence into the music production process, empowering artists to collaborate with generative models and expand their creative horizons.

Navigating the Evolving Landscape of AI-Driven Music

The application of deep learning techniques has significantly advanced the field of music generation in recent years. While many state-of-the-art systems are capable of generating entire musical compositions automatically, the majority operate in an “offline” manner, leaving limited opportunities for user control and interaction.

“Exposing machine learning parameters as part of a real-time creative musical interaction is a key exploration here.”

This is where Deep Steps stands out. By allowing users to train the integrated neural network using their own audio loops, the system empowers artists to shape the generative process and incorporate their unique musical sensibilities into the final output. This approach aligns with the emerging trends in human-computer interaction (HCI) for music, where generative deep learning models are becoming increasingly prominent, offering new avenues for creative collaboration.

Designing for Accessibility and Immediacy

Guided by the principles of user-centric design, the Deep Steps team has prioritized accessibility and immediacy in their implementation. Rather than overwhelming users with complex deep learning architectures and extensive training requirements, the system features a compact neural network model that can be trained quickly using relatively small datasets.

Previous work has focused on constructing the model architecture such that it can be used with relatively small datasets (the example datasets contain 168 and 39 audio loops), and the model generally takes only a few seconds to train, depending on the number of epochs. This affordance is important for making the implementation accessible to music producers.
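To make this concrete, below is a minimal sketch of what such a compact, quickly trainable model could look like. This is an illustration rather than the published implementation: it assumes the audio loops have been reduced to 16-step binary onset patterns, uses PyTorch, and fixes the bottleneck at four values to match the latent sliders discussed later; the layer sizes and hyperparameters are arbitrary.

```python
# Minimal sketch of a compact autoencoder in the spirit of Deep Steps.
# Assumptions (not from the paper): patterns are 16-step binary vectors
# extracted from audio loops, and the bottleneck is 4-dimensional to
# match the four latent sliders. Layer sizes are illustrative.
import torch
import torch.nn as nn

class StepAutoencoder(nn.Module):
    def __init__(self, steps: int = 16, latent: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(steps, 32), nn.ReLU(),
            nn.Linear(32, latent),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, 32), nn.ReLU(),
            nn.Linear(32, steps), nn.Sigmoid(),  # per-step onset probability
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train(model, loops, epochs: int = 100, lr: float = 1e-3):
    """Train on a small batch of step patterns; a few hundred epochs
    over a few dozen loops finishes in seconds on a laptop CPU."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(loops), loops)  # reconstruct the input patterns
        loss.backward()
        opt.step()
    return model
```

A network of this size, trained on a handful of patterns, completes in seconds, which is exactly the kind of immediacy the design prioritizes.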

This focus on accessibility ensures that musicians can seamlessly integrate Deep Steps into their existing production workflows, experimenting with different training datasets and parameters without the need for specialized expertise or computational resources.

Exploring Control Paradigms: Immediacy vs. Autonomy

A key aspect of the Deep Steps design process was the exploration of different control paradigms for interacting with the generative AI system. The research team subjected the application to a preliminary user study, evaluating two distinct user interface (UI) conditions:

  1. Condition 1: “Generate” Button: This UI afforded users an instant “generate” button, allowing them to quickly create new rhythmic parts for their sequence. This control paradigm prioritizes immediacy, enabling users to rapidly generate ideas and focus on other aspects of the music-making process.

  2. Condition 2: Latent Value Sliders: The second UI condition provided users with four continuous sliders for manually feeding values into the neural network’s bottleneck layer, giving them more direct control over the generative process. This approach aims to provide a more instrument-like interaction, prioritizing user autonomy. (Both paradigms are sketched in code after this list.)
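The practical difference between the two conditions can be illustrated with a hypothetical sketch that reuses the StepAutoencoder decoder from the previous example. The Gaussian latent sampling and the 0.5 threshold are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of the two control paradigms, reusing the
# StepAutoencoder from the earlier example.
import torch

def generate_pattern(model, threshold: float = 0.5):
    """Condition 1: the 'generate' button samples a random latent
    vector, so each press yields a new pattern with no recall."""
    z = torch.randn(1, 4)                    # random 4-D latent point
    probs = model.decoder(z)
    return (probs > threshold).int().squeeze(0)

def pattern_from_sliders(model, slider_values, threshold: float = 0.5):
    """Condition 2: four continuous sliders feed values directly into
    the bottleneck layer, so positions can be revisited exactly."""
    z = torch.tensor([slider_values], dtype=torch.float32)  # shape (1, 4)
    probs = model.decoder(z)
    return (probs > threshold).int().squeeze(0)
```

The key contrast: pressing “generate” samples an unrepeatable random point in latent space, while slider positions can be noted and revisited, which is precisely the recall trade-off the study surfaced.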

The findings from the user study revealed that both control paradigms had their respective strengths and weaknesses. The “generate” button of Condition 1 achieved consistent usability scores, as it allowed for a straightforward and efficient interaction. However, some users found the inability to return to previously generated parts to be a limitation.

Conversely, Condition 2’s latent value sliders offered more flexibility and a sense of authorial control, but the arbitrary nature of feeding values into the latent space was divisive, with some users embracing the expanded autonomy while others found it overly complicated.

These insights have challenged the initial assumptions and will guide the future development of Deep Steps, as the team explores ways to combine the immediacy of instant generation with the autonomy and recall provided by the manual control paradigm.

Empowering the Creative Process through User-Trainable AI

A key design philosophy behind Deep Steps is the idea of exposing machine learning parameters as part of the creative musical interaction. By allowing users to train the neural network with their own audio loops, the system empowers artists to incorporate their unique musical sensibilities into the generative process.

“When a user becomes bored or unsatisfied with the generative output of the model, they can train it again to achieve different results, in effect becoming part of the creative interaction.”

This user-centric approach not only enhances the sense of ownership and authorship but also encourages experimentation and exploration. Users were observed training the model with varying numbers of epochs, deliberately training it “poorly” to see how this affected the outputs, and treating the training process itself as a creative tool.
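Building on the hypothetical train() and generate_pattern() helpers sketched earlier, retraining as a creative gesture might look like the following; the epoch counts and the random stand-in dataset are purely illustrative.

```python
# Illustrative retraining-as-creative-gesture, building on the earlier
# hypothetical StepAutoencoder, train(), and generate_pattern() sketches.
import torch

# Stand-in dataset: 39 binary 16-step patterns (39 loops is one of the
# example dataset sizes mentioned in the paper).
loops = torch.rand(39, 16).round()

# Well-trained model: outputs stay close to the character of the loops.
model = train(StepAutoencoder(), loops, epochs=200)
faithful = generate_pattern(model)

# Purposefully "poor" training: an under-fit model yields rougher,
# less faithful patterns, which some users exploited creatively.
rough_model = train(StepAutoencoder(), loops, epochs=5)
rough = generate_pattern(rough_model)
```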

Integrating Generative AI into Music Production Workflows

While Deep Steps is primarily focused on integrating generative AI into the creative process, the research team has also recognized the importance of the application being a seamless part of a larger music production workflow. The system is designed to function as a MIDI step sequencer, allowing users to control pitch, tempo, and other musical parameters, while the neural network generates the rhythmic elements.

“Beyond what has already been mentioned, users were observed adding drum accompaniment, changing the timbre of instruments, and recording MIDI parts from the sequencer they liked. It should be emphasised at this point that Deep Steps intends to be part of a larger music production workflow.”

This integration of generative AI within a familiar music production environment has been well-received by the participants in the user study, who were able to leverage the system’s capabilities while also engaging in other music-making tasks, such as adding percussion, adjusting timbres, and recording MIDI performances.
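As a rough idea of how a generated pattern could reach the rest of a production setup, the sketch below plays it as MIDI notes using the mido library. The paper does not specify its MIDI implementation, so the library choice, note number, velocity, and timing scheme here are all assumptions.

```python
# Minimal sketch of playing a generated 16-step pattern over MIDI so a
# DAW or hardware synth can voice or record it. Library choice (mido),
# note number, and velocity are illustrative assumptions.
import time
import mido

def play_pattern(pattern, note: int = 36, bpm: float = 120.0):
    """Step through the pattern as 16th notes at the given tempo,
    sending note_on/note_off for each active step."""
    step_dur = 60.0 / bpm / 4          # seconds per 16th-note step
    with mido.open_output() as port:   # default system MIDI output
        for hit in pattern:
            if hit:
                port.send(mido.Message('note_on', note=note, velocity=100))
            time.sleep(step_dur)
            if hit:
                port.send(mido.Message('note_off', note=note))
```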

Continuous Improvement through User-Centric Design

The preliminary user study conducted on Deep Steps has provided valuable insights that will guide the system’s ongoing development. While the overall usability and creativity support scores were positive, the study has also uncovered areas for improvement, such as the need for additional controls, visual feedback, and ways to better integrate the generative AI capabilities within the broader music production workflow.

“The findings here, however, will also be used to inform the framing of this future study. This paper introduced the Deep Steps application, a generative MIDI step sequencer with a user-trainable neural network. We presented its design sensibilities, implementation as a stand-alone application, and a preliminary user study.”

By embracing a user-centric design approach, the Deep Steps team is committed to continuously refining the system, incorporating feedback from musicians, and exploring new ways to seamlessly blend generative AI into the creative process. As the field of AI-driven music continues to evolve, initiatives like Deep Steps pave the way for more accessible and engaging collaborative experiences between humans and machines.

For IT professionals and music enthusiasts alike, the Deep Steps project offers a glimpse into the future of music creation, where generative AI can be a powerful ally in unleashing our creative potential. As we continue to explore the intersections of technology, art, and human expression, tools like Deep Steps will undoubtedly play a crucial role in shaping the music-making landscape of tomorrow.

To learn more about Deep Steps and stay updated on its development, visit the AIMC 2024 website or the IT Fix blog for the latest insights and innovations in the world of AI-powered music creation.
