Speakers
Towards Steerable GenAI for Recommendation
Thorsten Joachims / Cornell University
GenAI offers new opportunities and challenges for increasing the effectiveness, transparency, and adaptability of recommender systems. Recommender systems already operate with great autonomy, and GenAI is further broadening their ability to act. To ensure desirable outcomes, we need to be able to steer these systems so that their actions match our expectations and are beneficial in the long run. This talk explores how we can design recommender systems to be more steerable and adaptive, with broad potential for impact on applications ranging from personal assistants and customer-support chatbots to e-commerce and internet platforms. In particular, this talk will explore the affordances of user profiles that are expressed in natural language, machine learning algorithms for optimizing such natural-language profiles, and new multi-scale models that enable well-founded approaches for trading off short-term and long-term objectives.
Speaker Bio: Thorsten Joachims is a Professor in the Department of Computer Science and in the Department of Information Science, and he is the Vice Provost for AI Strategy at Cornell University. His research interests center on a synthesis of theory and system building in machine learning, with applications in information access, language technology, and recommendation. His past research focused on counterfactual and causal inference, learning to rank, structured output prediction, support vector machines, text classification, learning with preferences, and learning from implicit feedback. He is an ACM Fellow, an AAAI Fellow, a KDD Innovations Award recipient, and a member of the ACM SIGIR Academy.
Towards Controllable and Explainable Visual GenAI for Creativity
'YZ' Yezhou Yang / Arizona State University
Generative AI has made remarkable strides in producing photorealistic images, videos, and multimodal content. Yet, aligning these generations with human users, while ensuring spatial coherence, logical consistency, and deployment scalability, remains a major challenge, especially for real-world media platforms. In this talk, YZ will present some recent research progress in enhancing the reasoning and control capabilities of image/video generative models, structured around four key pillars: 1) Efficiency & Scalability — with systems like ECLIPSE and FlowChef; 2) Control & Editing — including Lambda-ECLIPSE and RefEdit; 3) Reliability & Security — through efforts such as SPRIGHT, REVISION, R.A.C.E., and WOUAF; 4) Evaluation & Metrics — via benchmarks/metrics like VISOR, ConceptBed, TextInVision, and VOILA. Together, these contributions outline a cohesive vision for building controllable, robust, and scalable generative systems, core to advancing personalization, understanding, and automated content workflows in media streaming and beyond.
Speaker Bio: Yezhou (YZ) Yang is an Associate Professor and a Fulton Entrepreneurial Professor in the School of Computing and Augmented Intelligence (SCAI) at Arizona State University, and an Amazon Scholar with Amazon Prime Video. He founded and directs the Active Perception Research Group at ASU. His work includes exploring primitives and representation learning in visual understanding, secure/robust AI/GenAI, and V&L model evaluation/alignment. Yang is a recipient of the Qualcomm Innovation Fellowship 2011, the NSF CAREER Award 2018, the Amazon AWS Machine Learning Research Award 2019, and the best paper awards at the 2024 CVPR Vision Datasets Understanding and the 2025 CVPR Benchmarking and Expanding AI Multimodal Approaches workshops. He received his Ph.D. from the University of Maryland, College Park and a B.E. from Zhejiang University, China.
Building Large-Scale Stream Prediction Models at Prime Video: Lessons from a Complex, High-Throughput Recommender System
Qilin Qi / Prime Video
Accurate stream prediction sits at the core of content ranking and personalization at Prime Video, yet building such models at scale presents unique challenges. Prime Video is a highly heterogeneous product, spanning SVOD and TVOD content, movies and episodic series, linear TV, and kids-specific profiles, each with distinct user intents and viewing dynamics. These complexities are further amplified by diverse viewing patterns such as re-watching, continuous episodic consumption, and new title discovery. In this talk, I will share how we are revamping Prime Video's stream prediction models and machine learning platform to address these challenges. We will cover how we model heterogeneous content and user behaviors in a unified prediction framework, how we reason about short-term versus long-term engagement signals, and how these predictions directly inform large-scale ranking decisions. I will also discuss the systems side of the problem: training and serving high-capacity models under strict latency and throughput constraints, and the platform investments required to support reliable experimentation and rapid iteration at Prime Video scale.
Speaker Bio: Qilin Qi is a Senior Applied Science Manager at Amazon, where he leads the Prime Video recommendation science team. He drives the design and delivery of large-scale recommendation, ranking, and whole-page optimization systems that power personalized discovery experiences across Prime Video surfaces. Previously, Qilin led DoorDash's core consumer machine learning teams, building homepage recommendations, search ranking, and growth ML systems. He has held leadership and research roles across Amazon, Microsoft, Meta, and Apple, and brings deep expertise in recommendation systems, information retrieval, and applied machine learning at scale.
On Collaborative Filtering for Conversational Recommender Systems
Harald Steck / Netflix
Large Language Models (LLMs) used as conversational recommender systems allow users to express their current intent in a nuanced way, leading to recommendations that better match the moment. Beyond item metadata, these systems can also benefit from collaborative filtering signals, which are the focus of this talk. I will present joint work with our academic collaborators on a simple yet effective neighborhood-based method, and on a modification of GRPO for generating ranked lists of items.
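The talk names a simple neighborhood-based method without specifying it; as background, a minimal, generic sketch of item-based neighborhood collaborative filtering is shown below (the toy matrix, cosine scoring, and all names are illustrative assumptions, not the method presented in the talk):

```python
import math

# Toy binary user-item interaction matrix (rows: users, cols: items).
# Generic item-neighborhood idea: score an unseen item by how similar its
# interaction column is to the columns of items the user already consumed.
interactions = [
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

def column(matrix, j):
    return [row[j] for row in matrix]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def score(user_row, matrix, item):
    # Sum similarities between the candidate item and the user's items.
    return sum(
        cosine(column(matrix, item), column(matrix, j))
        for j, seen in enumerate(user_row)
        if seen
    )

user = interactions[2]  # this user has items 1 and 3
candidates = [j for j, seen in enumerate(user) if not seen]
best = max(candidates, key=lambda j: score(user, interactions, j))
print(best)  # item 0, which co-occurs with the user's items
```

In practice such methods operate on much larger, weighted interaction data and restrict each item to its top-k neighbors for efficiency.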
Speaker Bio: Harald Steck is a research scientist at Netflix, working on recommender systems, search algorithms, and related topics. Prior to that he conducted ML research at Bell Labs, Siemens, ETH Zurich, and MIT, after obtaining his Ph.D. from the Technical University of Munich, Germany.
From Multi-Task Recommender Systems to Catalog-Grounded Personalization at Scale
Shivam Verma / Spotify
Web-scale media platforms are moving beyond ranking toward systems that can retrieve, recommend, and explain in natural language, while staying grounded in an extremely large, fast-changing catalog. In this talk, I'll elaborate on how we leverage multi-task recommender systems at Spotify, tying together advances in multi-objective modeling across personalization and ads prediction. We'll discuss the progression towards catalog-aware generative models that leverage shared representation learning, Semantic IDs, and lightweight LLM conditioning. Specifically, I'll describe how we learn generalized user representations that transfer across tasks and surfaces; how we represent catalog entities with semantic IDs as sequences of tokens for joint generative search and recommendation; and how we use parameter-efficient conditioning methods such as Embedding-to-Prefix (E2P) to inject learned user representations into frozen LLMs. We'll cover how these building blocks help domain-adapt open-weight models for better catalog understanding, and enable a more personalized streaming experience for Spotify's hundreds of millions of users.
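The prefix-conditioning idea behind methods like E2P can be sketched in miniature: a learned projection maps a user embedding to a few soft "prefix" vectors in the frozen model's hidden size, which are prepended to the token embeddings. All dimensions, names, and the random projection below are hypothetical illustrations of the general idea, not Spotify's implementation:

```python
import random

random.seed(0)

D_USER, D_MODEL, N_PREFIX = 8, 16, 2  # hypothetical sizes

# Hypothetical trainable projection: maps one user embedding to
# N_PREFIX soft prefix vectors in the frozen LM's hidden dimension.
W = [[random.gauss(0.0, 0.1) for _ in range(D_USER)]
     for _ in range(N_PREFIX * D_MODEL)]

def project_to_prefix(user_emb):
    flat = [sum(w * x for w, x in zip(row, user_emb)) for row in W]
    # Reshape the flat output into N_PREFIX vectors of size D_MODEL.
    return [flat[i * D_MODEL:(i + 1) * D_MODEL] for i in range(N_PREFIX)]

def condition_frozen_lm(token_embs, user_emb):
    # The LM weights stay frozen; only the projection W would be trained.
    # Conditioning = prepend the prefix vectors to the token embeddings.
    return project_to_prefix(user_emb) + token_embs

user_emb = [random.gauss(0.0, 1.0) for _ in range(D_USER)]
token_embs = [[0.0] * D_MODEL for _ in range(5)]  # stand-in for real embeddings
seq = condition_frozen_lm(token_embs, user_emb)
print(len(seq), len(seq[0]))  # 7 16
```

In a real system the extended sequence would be passed to the frozen LLM via its input-embedding interface, and only the projection is updated during training, which is what makes the approach parameter-efficient.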
Speaker Bio: Shivam Verma is a Staff Machine Learning Engineer at Spotify, where he leads a foundation modeling team for personalization. His work focuses on large-scale representation learning, with an emphasis on building long-horizon user embeddings that can be deployed across multiple product surfaces. His research interests span the intersection of recommender systems and LLMs, ML systems and real-time serving, generative multimodal modeling, and mechanistic interpretability. Prior to Spotify, he was part of Twitter's Cortex Research group, where he led multiple NLP and RecSys initiatives, including BERT-based embedding models and candidate generation systems used in production. He holds an M.S. from the Courant Institute of Mathematical Sciences at New York University and a B.Tech. from the Indian Institute of Technology Delhi.
Quality at Scale: Engineering Evaluation Frameworks for Multi-Agent Systems
Amanpreet Kaur / Google
As Generative AI evolves from single-turn interactions to complex multi-agent workflows, quality assessment is no longer confined to evaluating static outputs but extends to dynamic multi-turn collaborations. In the context of streaming media, where personalization, adaptability, and quality are of utmost importance, these issues are further complicated by the nondeterministic nature of agent behaviors. This keynote presentation will venture into the largely uncharted territory of multi-agent system evaluation, moving beyond traditional accuracy metrics to comprehensive frameworks that measure agent coordination, reasoning trajectories, and system reliability and robustness. We will discuss practical strategies for dissecting the 'black box' of inter-agent communication and introduce methods for scalable, automated evaluation using 'LLM-as-a-judge' frameworks. Additionally, we will examine the pivotal importance of human-aligned benchmarks in ensuring that autonomous agents not only optimize tasks but also meet expected standards of safety and quality in production environments. By establishing rigorous evaluation standards, we can bridge the gap between stochastic generation and reliable, production-grade agentic experiences.
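The LLM-as-a-judge loop the abstract refers to can be sketched as a rubric-driven evaluation over agent transcripts; the judge call below is a stub (in practice it would prompt a strong LLM with the rubric and transcript), and all names and the keyword heuristic are hypothetical illustrations:

```python
# Minimal sketch of an "LLM-as-a-judge" evaluation loop for multi-agent runs.
RUBRIC = ["task_success", "coordination", "safety"]

def judge(transcript, criterion):
    # Stub: a real implementation would call an LLM with the rubric item and
    # the full transcript, returning a calibrated score. Here a trivial
    # label-lookup stands in, for illustration only.
    return 5 if criterion in transcript.get("labels", []) else 2

def evaluate(transcripts):
    scores = {c: [] for c in RUBRIC}
    for t in transcripts:
        for c in RUBRIC:
            scores[c].append(judge(t, c))
    # Aggregate per-criterion means; repeated judge runs would be averaged
    # in practice to smooth out nondeterministic judge behavior.
    return {c: sum(v) / len(v) for c, v in scores.items()}

runs = [
    {"id": "run-1", "labels": ["task_success", "coordination"]},
    {"id": "run-2", "labels": ["task_success", "safety"]},
]
report = evaluate(runs)
print(report["task_success"])  # 5.0
```

Separating the rubric from the judge makes it straightforward to add human-aligned criteria or swap in human raters for calibration, which is one way to keep automated scores anchored to the benchmarks the talk emphasizes.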
Speaker Bio: Amanpreet Kaur is a Senior Engineering Analyst at Google, where she enables the integration of Generative AI into core Search products while ensuring ranking precision and reliability. With over 11 years of leadership experience in data science roles, her career is defined by high-impact technical execution and strategic oversight, currently bridging the gap between complex model evaluation and signal optimization to enhance topical relevance and information retrieval. Specializing in high-accuracy AI evaluation frameworks, Amanpreet is responsible for developing large-scale auto-raters and optimizing RLHF pipelines that serve as the ground truth for Search ranking models. Her work includes identifying systemic edge cases and outlier patterns using multi-stage agentic workflows and establishing technical guardrails across the development lifecycle. Her expertise spans quantitative metric design, statistical modeling, and the evaluation and tuning of complex algorithms. By leveraging Context Models and next-gen AI agents, she builds systems capable of high-precision semantic analysis to build scalable signals that benchmark and refine global model performance.
Quality at Scale: Engineering Evaluation Frameworks for Multi-Agent Systems
Ajay Yadav / Google
As Generative AI evolves from single-turn interactions to complex multi-agent workflows, quality assessment is no longer confined to evaluating static outputs but extends to dynamic multi-turn collaborations. In the context of streaming media, where personalization, adaptability, and quality are of utmost importance, these issues are further complicated by the nondeterministic nature of agent behaviors. This keynote presentation will venture into the largely uncharted territory of multi-agent system evaluation, moving beyond traditional accuracy metrics to comprehensive frameworks that measure agent coordination, reasoning trajectories, and system reliability and robustness. We will discuss practical strategies for dissecting the 'black box' of inter-agent communication and introduce methods for scalable, automated evaluation using 'LLM-as-a-judge' frameworks. Additionally, we will examine the pivotal importance of human-aligned benchmarks in ensuring that autonomous agents not only optimize tasks but also meet expected standards of safety and quality in production environments. By establishing rigorous evaluation standards, we can bridge the gap between stochastic generation and reliable, production-grade agentic experiences.
Speaker Bio: Ajay Yadav is a Staff Software Engineer at Google, where he leads the technical strategy for Workspace AI and agentic workflows. He drives the design of planet-scale infrastructure and high-precision fraud detection systems powered by Graph AI and Large Language Models (LLMs) to protect over a billion users. Previously, Ajay served as the technical lead for Google's Payments Risk platform and Next-Gen Offers infrastructure, where he pioneered self-serve marketing ecosystems and low-latency serving models. He is a Fellow of the British Computer Society (BCS) and the Institution of Engineering and Technology (IET), and he brings deep expertise in building scalable, reliable AI/ML solutions for complex, global initiatives.