Towards Convergent Offline Reinforcement Learning