Python Pandas speed
phansi.work
phansi.work at gmail.com
Thu Jul 9 02:09:50 PDT 2020
Hello,
I wonder whether I am missing some optimisation libraries.
When I calculate a covariance matrix using pandas on DragonFly, the performance is much worse than on Linux.
Results (times in seconds): the length of each series is always 250; the number of series (num) varies.
         <------DFLY------->    <------LINUX------>
  num    init       cov         init       cov
 1000    0.609434   0.335977    0.392807   0.012132
 3000    2.248877   3.412862    1.375551   0.062324
 5000    4.797861   9.287197    2.690005   0.161746
 7000    8.190682   18.66528    4.382373   0.29853
10000    14.64084   38.76979    7.367079   0.604834
Hope the formatting lasts.
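1. The first number (init) is the time to create the data. DragonFly is slower here, but this is not something I am worried about.
2. The second (cov) is the time to calculate the covariance matrix. This does not look good: DragonFly is roughly 28 times slower at 1000 series and roughly 64 times slower at 10000.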
I use a virtual environment with pandas 1.0.5 and numpy 1.19.0 in both cases.
The Linux machine has 16 GB of RAM and the DragonFly machine has 8 GB, but top did not show any swap space being used.
Both CPUs are i5s, and both are around 4 to 5 years old.
Code below:
import numpy as np
import pandas as pd
import datetime
import pickle

def timeme(nvals, nseries):
    t1 = datetime.datetime.now()
    # initialise data
    df = pd.DataFrame()
    for i in range(nseries):
        df[str(i)] = np.random.random_sample(size=nvals)
    t2 = datetime.datetime.now()
    # calculate covariance
    s = df.cov()
    t3 = datetime.datetime.now()
    return (t2 - t1).total_seconds(), (t3 - t2).total_seconds()

def main():
    nvals = 250
    x = {}
    for nseries in [1000, 3000, 5000, 7000, 10000]:
        init_time, calc_time = timeme(nvals, nseries)
        x[(nvals, nseries)] = (init_time, calc_time)
    return x

x = main()
with open("data.pickle", "wb") as fpw:
    pickle.dump(x, fpw)
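
To help narrow down whether an optimised BLAS is the missing piece, here is a minimal sketch that prints numpy's build configuration and times np.cov directly on data of the same shape. It assumes (and this is only an assumption) that DataFrame.cov() hands NaN-free data to numpy, so that the linked BLAS library dominates the timing. The function name time_numpy_cov and the seed are illustrative and not part of the script above.

```python
import time

import numpy as np


def time_numpy_cov(nvals=250, nseries=1000, seed=0):
    """Time np.cov on an (nvals x nseries) array of uniform random numbers.

    Hypothetical helper for illustration; not part of the original script.
    """
    rng = np.random.default_rng(seed)
    data = rng.random((nvals, nseries))
    start = time.perf_counter()
    # rowvar=False: each column is one series, matching the DataFrame layout
    cov = np.cov(data, rowvar=False)
    elapsed = time.perf_counter() - start
    return cov, elapsed


if __name__ == "__main__":
    # Prints the BLAS/LAPACK libraries numpy was built against
    np.show_config()
    for nseries in [1000, 3000, 5000]:
        cov, elapsed = time_numpy_cov(250, nseries)
        print(nseries, cov.shape, elapsed)
```

If np.show_config() reports no optimised BLAS (for example OpenBLAS) on one of the machines, that would be one candidate explanation for the gap in the cov column, though it would not by itself prove the cause.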
cheers
phansi
<phansi.work at gmail.com>