LLM in a Flash: Efficient Large Language Model Inference with Limited Memory