> > Dear all,
> >
> > we are experiencing a connection problem to the MySQL server that we use to log informations. Is there an
> > option to retry multiple times the I/O on the MySQL?
> >
> > The error we are experiencing is the following (hiding the IP address):
> >
> > [Logger,ERROR] [mlogger.cxx:2455:write_runlog_sql,ERROR] Failed to connect to database: Error: Can't
> > connect to MySQL server on 'xxx.xxx.xxx.xxx:6033' (110)
> >
> > Then the logger stops, and must be restarted. This eventually happens only during the BOR or the EOR.
>
> What would you propose? If the connection does not work, most likely the server is down or busy. If we retry,
> the connection still might not work. If we retry many times, people will complain that the run start or stop
> takes very long. If we then just continue (without stopping the logger), the MySQL database will miss important
> information and the runs probably cannot be analyzed later. So I believe it's better to really stop the logger
> so that people get aware that there is a problem and fix the source, rather than curing the symptoms.
>
> In the MEG experiment at PSI we run the logger with a MySQL database and we never see any connection issue,
> except when the MySQL server gets in maintenance (once a year), but usually we don't take data then. Since we
> use the same logger code, it cannot be a problem there. So I would try to fix the problem on the MySQL side.
>
> Best,
> Stefan
Dear Stefan,
a possible solution could be to define the number of times to retry as a parameter that is 0 by default, as well as a wait time between two subsequent tries. This
would leave the decision on how to handle a possible failed connection to the user. In our case, for example, we would prefer to not stop the acquisition in case
of a failed connection to the external SQL. In addition, we have other software that, with a retry procedure, doesn’t fail: with 1 re-try and a sleep time of 0.5 s
we already recover 100% of the faults.
Anyway, we implemented a local database, which is a mirror of the external one, and the problems disappeared.
Thanks,
Stefano. |