Fork on Linux uses copy-on-write for virtual memory pages, so forking from inside Python should be cheap. If you launch a new Python process from, say, the shell, and the interpreter's files are already in the page cache, then you should only have to pay the interpreter's startup CPU cost, since the I/O is satisfied from cache...
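To put rough numbers on that, here is a minimal sketch that times a bare fork against launching a fresh interpreter. The helper names `time_fork` and `time_fresh_interpreter` are made up for illustration, it assumes a Unix-like system where `os.fork` is available, and a single sample like this is noisy, so treat the result as a ballpark only.

```python
import os
import subprocess
import sys
import time


def time_fork() -> float:
    """Seconds to fork a child that exits immediately, including reaping it."""
    start = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        # Child: leave right away without running the parent's cleanup handlers.
        os._exit(0)
    os.waitpid(pid, 0)
    return time.perf_counter() - start


def time_fresh_interpreter() -> float:
    """Seconds to start a new Python interpreter that does nothing and exits."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"fork + wait:       {time_fork() * 1e3:.2f} ms")
    print(f"fresh interpreter: {time_fresh_interpreter() * 1e3:.2f} ms")
```

Run it twice if you want the warm-cache case: the first launch may still hit disk, while the second should be served from cache.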
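It depends on whether one uses clone, fork, posix_spawn, etc.
Even with copy-on-write, fork still has to duplicate the parent's page tables and VMA structures, so it can take a while depending on the size of the address space, the number of VMAs, and so on.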
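One way to check this on your own machine is to grow the parent's resident memory and time how long `os.fork()` takes to return. This is only a sketch: `fork_latency` is a hypothetical helper, the sizes are arbitrary, and the numbers will vary with kernel version, huge page settings, and load.

```python
import mmap
import os
import time


def fork_latency(resident_mb: int) -> float:
    """Seconds for os.fork() to return in the parent with extra memory mapped."""
    ballast = bytearray(resident_mb * 1024 * 1024)
    # Write one byte per page so the pages are actually mapped in,
    # not just reserved.
    for offset in range(0, len(ballast), mmap.PAGESIZE):
        ballast[offset] = 1

    start = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately; we only care about the cost of the fork.
        os._exit(0)
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    return elapsed


if __name__ == "__main__":
    for size_mb in (1, 64, 512):
        print(f"{size_mb:4d} MB resident: {fork_latency(size_mb) * 1e6:.0f} us")
```

If the time climbs with the ballast size, that is the page table copying showing up; posix_spawn (available in Python as `os.posix_spawn`) is a common way to sidestep it when the child is just going to exec anyway.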