Modeling human behavior for image sequence understanding and generation