paper reading notes

One source of LLM hallucination is exposure bias

With the release of the closed-source ChatGPT and GPT-4 and the open-source LLaMA models, LLM development has improved tremendously in recent months. While we are excited that these LLMs can handle many tasks, we have also noticed again and again that they hallucinate content.

Shaojie Jiang

Aug 9, 2023 3 min read paper reading notes, Deep Learning, NLP, LLM, hallucination, information retrieval

Transformer Align Model

Jointly Learning to Align and Translate with Transformer Models

Shaojie Jiang

May 16, 2020 2 min read paper reading notes, Deep Learning, NLP

Compressive Transformers

Built on top of Transformer-XL, the Compressive Transformer condenses old memories (hidden states) and stores them in a compressed memory buffer before completely discarding them. The model is suited to long-range sequence learning but may add too much computational overhead for tasks that only have short sequences.

Shaojie Jiang

May 12, 2020 3 min read paper reading notes, Deep Learning, NLP

Visualizing the Loss Landscape of Neural Nets

What characterizes an easier-to-train, easier-to-generalize neural model?

Shaojie Jiang

May 6, 2020 3 min read paper reading notes, Deep Learning

Adaptive Computation Time

My notes on the paper Adaptive Computation Time for Recurrent Neural Networks. Additive vs. multiplicative halting probability. Multiplicative: in the paper (footnote 1), the authors thoroughly discuss their considerations for deciding the computation time.

Shaojie Jiang

Apr 28, 2020 2 min read paper reading notes, Deep Learning