PDF extraction architecture