孙浩然

Haoran Sun


Center for Speech and Language Technologies, and
Department of Computer Science and Technology,
Tsinghua University, Beijing, China

I am studying for my master's degree at the Center for Speech and Language Technologies (CSLT) and the Department of Computer Science and Technology, Tsinghua University,
working on deep learning for speech factorization. I am supervised by Prof. Thomas Fang Zheng and Prof. Dong Wang.

General Info

  • Date of Birth: Jul 13, 1996
  • Address: Fit Building 1-303, Tsinghua University, Beijing, China
  • E-mail: shr19@mails.tsinghua.edu.cn
  • Tel.: +86 17 8888 42679

Research Interest

  • Speech Factorization
  • Deep Generative Models

Skills

  • PyTorch
  • TensorFlow
  • Kaldi

  • Python
  • C/C++
  • C#

Curriculum Vitae





Education

  • 2019-Present

    Department of Computer Science and Technology

    Tsinghua University

    I am currently studying for my master's degree at the Center for Speech and Language Technologies (CSLT), supervised by Prof. Thomas Fang Zheng and Prof. Dong Wang.

  • 2014-2018

    Department of Automation

    Tsinghua University

    During my undergraduate years, I worked on several hardware and software projects and gained some research experience.

Research

  • 2020

    Deep Generative Factorization For Speech Signal

    We presented a speech factorization approach based on a novel factorial discriminative normalization flow model (factorial DNF). Experiments on a two-factor case involving phonetic content and speaker traits demonstrated that the proposed factorial DNF factorized speech signals effectively and outperformed several comparative models in terms of information representation and manipulation.

  • 2019

    On Investigation of Unsupervised Speech Factorization Based on Normalization Flow

    We presented a preliminary investigation of unsupervised speech factorization based on the normalization flow model. This model constructed a complex invertible transform, by which we could project speech segments into a latent code space where the distribution was a simple diagonal Gaussian. Our preliminary investigation on the TIMIT database showed that this code space exhibited favorable properties such as denseness and pseudo-linearity, and that perceptually important factors such as phonetic content and speaker traits could be represented as particular directions within the code space.
