Hacker News
- GPT-4 architecture: what we can deduce from research literature https://kir-gadjello.github.io/posts/gpt4-some-technical-hypotheses/ 6 comments
Linked pages
- [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 201 comments
- Deep Neural Nets: 33 years ago and 33 years from now https://karpathy.github.io/2022/03/14/lecun1989 180 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- [2302.14045] Language Is Not All You Need: Aligning Perception with Language Models https://arxiv.org/abs/2302.14045 115 comments
- GitHub - google/jax: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more https://github.com/google/jax 99 comments
- DeepMind CEO Demis Hassabis Urges Caution on AI | TIME https://time.com/6246119/demis-hassabis-deepmind-interview/ 48 comments
- The Transformer Family Version 2.0 | Lil'Log https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/ 46 comments
- Large Transformer Model Inference Optimization | Lil'Log https://lilianweng.github.io/posts/2023-01-10-inference-optimization/ 20 comments
- [2206.14486] Beyond neural scaling laws: beating power law scaling via data pruning https://arxiv.org/abs/2206.14486 16 comments
- [2009.06489] The Hardware Lottery https://arxiv.org/abs/2009.06489 16 comments
- [2208.03299] Atlas: Few-shot Learning with Retrieval Augmented Language Models https://arxiv.org/abs/2208.03299 12 comments
- [2111.12763] Sparse is Enough in Scaling Transformers https://arxiv.org/abs/2111.12763 5 comments
- [2101.03961] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity https://arxiv.org/abs/2101.03961 4 comments
- µTransfer: A technique for hyperparameter tuning of enormous neural networks - Microsoft Research https://www.microsoft.com/en-us/research/blog/%C2%B5transfer-a-technique-for-hyperparameter-tuning-of-enormous-neural-networks/ 1 comment
- [2203.15556] Training Compute-Optimal Large Language Models https://arxiv.org/abs/2203.15556 0 comments
- [2001.08361] Scaling Laws for Neural Language Models https://arxiv.org/abs/2001.08361 0 comments
- Introducing Meta’s Next-Gen AI Supercomputer | Meta https://about.fb.com/news/2022/01/introducing-metas-next-gen-ai-supercomputer/ 0 comments
- [2301.12597] BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models https://arxiv.org/abs/2301.12597 0 comments
- [2202.08906] ST-MoE: Designing Stable and Transferable Sparse Expert Models https://arxiv.org/abs/2202.08906 0 comments
- [2302.13971] LLaMA: Open and Efficient Foundation Language Models https://arxiv.org/abs/2302.13971 0 comments