Principal Applied Scientist
Microsoft
Date: 9 hours ago
City: Remote
Contract type: Full time

Microsoft’s Applied Sciences Group is seeking a visionary and hands-on Principal Applied Scientist to lead research and development in multimodal AI, with a dual focus on image understanding and autoregressive generation across language and vision. This role is ideal for candidates passionate about building real-world systems that unify visual and textual modalities to power next-generation user experiences across devices and platforms.
As a senior member of the team, you will drive innovation across model architecture, training, and deployment, especially for scalable autoregressive models that handle both language and image generation in a unified framework. You will also play a key role in converting cutting-edge research into practical applications and experiences for users across the globe.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
Responsibilities
- Design and prototype unified token-based architectures that treat text and image data as sequences for coherent multimodal generation.
- Lead development of multimodal autoregressive models for tasks such as image captioning, visual question answering, and text-to-image generation.
- Research, design, and implement state-of-the-art generative models (diffusion, autoregressive, etc.) for high-quality image and video generation.
- Build scalable training pipelines for large-scale image-text datasets using discrete tokenization (e.g., VQ-VAE, DALL·E-style encoders).
- Optimize deep neural networks for deployment on Neural Processing Units (NPUs) and cloud environments, maximizing efficiency and performance.
- Collaborate with cross-functional teams to integrate models into Microsoft products and services.
- Publish research in top-tier venues (CVPR, ICCV, NeurIPS, ICLR) and contribute to the scientific community.
- Mentor junior scientists and engineers, fostering a collaborative and innovative research environment.
Required Qualifications:
- Doctorate in Computer Vision, Machine Learning, or a related field with demonstrable experience in applied research or product development.
- OR Master's degree in Computer Vision, Machine Learning, or a related field with demonstrable experience in applied research or product development.
- OR Bachelor's degree in Computer Vision, Machine Learning, or a related field with demonstrable experience in applied research or product development.
- Solid publication record in multimodal learning, image understanding, or generative modeling.
- Proficiency in Python and deep learning frameworks (e.g., PyTorch, TensorFlow, HuggingFace).
- Hands-on experience with generative models, especially diffusion and transformer-based synthesis.
- Experience building and training multimodal autoregressive models (e.g., for image captioning, VQA, text-to-image).
- Familiarity with discrete tokenization techniques (e.g., VQ-VAE).
- Ability to design and prototype unified token-based architectures for multimodal generation.
- Strong publication record in top-tier venues (CVPR, ICCV, NeurIPS, ICLR).
- Demonstrated ability to translate research into real-world applications.
- Experience deploying models to production or on-device environments.
- Experience optimizing models for Neural Processing Units (NPUs) or other hardware accelerators.
- Knowledge of quantization, pruning, and efficient fine-tuning techniques.
- Experience with prompt engineering, instruction tuning, and RLHF.
- Exposure to video generation and temporal modeling.
- Experience mentoring junior scientists or engineers.
- Strong collaborative skills across cross-functional teams.