RoBERTa (Robustly Optimized BERT Pretraining Approach), a 355M-parameter encoder pretrained with masked language modeling and intended for fine-tuning on downstream natural language understanding (NLU) tasks.
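As a minimal sketch of the masked-language-modeling objective mentioned above, the snippet below fills a masked token with the Hugging Face `transformers` library. It assumes the `roberta-large` checkpoint (the 355M-parameter variant) is available; note that RoBERTa uses `<mask>` as its mask token.

```python
from transformers import pipeline

# Fill-mask pipeline over the 355M-parameter RoBERTa-large checkpoint.
fill_mask = pipeline("fill-mask", model="roberta-large")

# RoBERTa's mask token is "<mask>"; the pipeline returns the top-5
# candidate tokens by default, each with a softmax score.
predictions = fill_mask("The capital of France is <mask>.")
for p in predictions:
    print(p["token_str"].strip(), round(p["score"], 3))
```

Each prediction is a dict with `token_str`, `score`, and the completed `sequence`; for downstream NLU tasks the same checkpoint is typically loaded with a task head (e.g. `AutoModelForSequenceClassification`) and fine-tuned instead.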