Federated learning (FL) relies on model exchanges between the server and the clients, and therefore suffers from a significant communication burden as well as a heavy client-side computation load. While emerging split learning (SL) solutions can reduce the client-side computation burden by splitting the model architecture, SL-based approaches still incur substantial latency and communication overhead from transmitting the forward activations and backward gradients at every global round. In this paper, we propose a new direction for FL/SL based on updating the client-side and server-side models in parallel, via local-loss-based training specifically geared to split learning. The parallel training of the split models substantially shortens latency while obviating server-to-client communication. We provide a latency analysis that leads to the optimal model cut as well as general guidelines for splitting the model. We also provide a theoretical analysis guaranteeing the convergence of our method. Extensive experimental results indicate that our scheme has significant communication and latency advantages over existing FL and SL approaches.
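
As a rough illustration of the parallel, local-loss-based update described above (a minimal sketch, not the exact procedure of the paper), the PyTorch snippet below assumes a client-side auxiliary head that produces a local loss at the cut layer; the client updates its lower layers with that local loss, while the server updates the upper layers on the detached activations, so no backward gradients travel from the server to the client. The names `client_body`, `client_aux_head`, and `server_model`, as well as the assumption that labels are available where each loss is computed, are ours for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical split of a small MLP: layers up to the cut live on the client,
# the rest on the server. The auxiliary head is an illustrative assumption.
client_body = nn.Sequential(nn.Linear(784, 256), nn.ReLU())      # client side (below the cut)
client_aux_head = nn.Linear(256, 10)                              # local head producing the local loss
server_model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                             nn.Linear(128, 10))                  # server side (above the cut)

opt_client = torch.optim.SGD(list(client_body.parameters())
                             + list(client_aux_head.parameters()), lr=0.1)
opt_server = torch.optim.SGD(server_model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 784)          # a mini-batch of client data
y = torch.randint(0, 10, (32,))   # labels (label placement is an assumption here)

# --- Client-side update: local loss through the auxiliary head only ---
smashed = client_body(x)                         # forward activations at the cut layer
local_loss = criterion(client_aux_head(smashed), y)
opt_client.zero_grad()
local_loss.backward()                            # gradients stay within the client
opt_client.step()

# --- Server-side update: uses detached activations, can run in parallel ---
server_logits = server_model(smashed.detach())   # detach(): no gradient flows back to the client
server_loss = criterion(server_logits, y)
opt_server.zero_grad()
server_loss.backward()
opt_server.step()

print(f"local loss {local_loss.item():.4f}, server loss {server_loss.item():.4f}")
```

Because the server-side backward pass never reaches the client, the two updates can proceed concurrently once the activations arrive, which is the source of the latency savings claimed above.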