[yadifa-users] yadifa 2.2.0 aggressive slave behaviour

yadifa info at yadifa.eu
Tue Jul 19 18:19:23 CEST 2016


Dear Anand,

> Could you please consider adding code to yadifa to do ... back-off

In release 2.2.1, we have added a back-off mechanism with configurable parameters:

axfr-retry-failure-delay-multiplier, which increases the back-off time in a linear fashion
axfr-retry-failure-delay-max, which sets the maximum time between failed retries
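
As a rough illustration of what a linear back-off with a cap looks like, here is a minimal Python sketch. It is not YADIFA's implementation: the function name, the formula (delay = consecutive failures x multiplier, capped at the maximum) and the sample values are assumptions made for illustration. Only the two parameter names come from this message.

```python
def linear_backoff_delay(failures: int, multiplier: int, max_delay: int) -> int:
    """Return the seconds to wait before the next AXFR retry.

    Hypothetical helper, not YADIFA code.
    failures   -- number of consecutive failed attempts so far
    multiplier -- stands in for axfr-retry-failure-delay-multiplier
    max_delay  -- stands in for axfr-retry-failure-delay-max
    """
    if failures < 0:
        raise ValueError("failures must not be negative")
    # The delay grows by a fixed step per failure and never exceeds the cap.
    return min(failures * multiplier, max_delay)


if __name__ == "__main__":
    # Sample values (assumed): a 5 second step, capped at one hour.
    for n in (1, 2, 3, 1000):
        print(n, linear_backoff_delay(n, multiplier=5, max_delay=3600))
```

With a scheme like this, a zone that keeps failing to load is retried at growing intervals instead of immediately, and the cap keeps the interval bounded.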

These parameters will be documented in more detail in the 2.2.1 reference manual (coming soon).

Kind regards,

R&D Team
EURid VZW


-----Original Message-----
From: yadifa-users [mailto:yadifa-users-bounces at mailinglists.yadifa.eu] On Behalf Of Anand Buddhdev
Sent: Friday, July 15, 2016 4:39 PM
To: yadifa-users at mailinglists.yadifa.eu
Cc: anandb at ripe.net
Subject: [yadifa-users] yadifa 2.2.0 aggressive slave behaviour

Dear yadifa developers,

I've just started a yadifa 2.2.0 instance, with a slave zone defined in its config. Yadifa XFRs the zone from the master, but refuses to load it because of an error in the zone. Yadifa then schedules the zone for immediate refresh and goes into a mad loop, XFRing the zone over and over without ever backing off. On a slave server with many zones that could have errors, this causes a thundering-herd effect on the master(s). Here's a log snippet showing this happening for one zone (notice how yadifa has downloaded the same zone over and over within the same second). This keeps happening rapidly, ad infinitum:

2016-07-15 14:16:51.821404 | server   | N | zone load: slave zone
21.141.in-addr.arpa. requires download from the master
2016-07-15 14:16:51.880454 | server   | I | slave: 21.141.in-addr.arpa.
AXFR query to the master
2016-07-15 14:16:51.895625 | server   | I | slave: loaded AXFR for
domain 21.141.in-addr.arpa. from master at 193.0.19.190#53, serial is
2005131124
2016-07-15 14:16:51.895937 | server   | I | database:
21.141.in-addr.arpa.: zone successfully downloaded (AXFR)
2016-07-15 14:16:51.895961 | server   | I | zone load: 21.141.in-addr.arpa.
2016-07-15 14:16:51.896413 | server   | I | zone load:
21.141.in-addr.arpa.: loading AXFR file in '/var/yadifa/xfr/'
2016-07-15 14:16:51.903790 | server   | E | zone load:
21.141.in-addr.arpa.: an error occurred while loading the zone or
journal: A name in the zone does not match the origin.
2016-07-15 14:16:51.903799 | server   | N | zone load: slave zone
21.141.in-addr.arpa. requires download from the master
2016-07-15 14:16:51.907521 | server   | I | slave: 21.141.in-addr.arpa.
AXFR query to the master
2016-07-15 14:16:51.923815 | server   | I | slave: loaded AXFR for
domain 21.141.in-addr.arpa. from master at 193.0.19.190#53, serial is
2005131124
2016-07-15 14:16:51.923869 | server   | I | database:
21.141.in-addr.arpa.: zone successfully downloaded (AXFR)
2016-07-15 14:16:51.923880 | server   | I | zone load: 21.141.in-addr.arpa.
2016-07-15 14:16:51.927527 | server   | I | zone load:
21.141.in-addr.arpa.: loading AXFR file in '/var/yadifa/xfr/'
2016-07-15 14:16:51.935881 | server   | E | zone load:
21.141.in-addr.arpa.: an error occurred while loading the zone or
journal: A name in the zone does not match the origin.
2016-07-15 14:16:51.935889 | server   | N | zone load: slave zone
21.141.in-addr.arpa. requires download from the master
2016-07-15 14:16:51.939525 | server   | I | slave: 21.141.in-addr.arpa.
AXFR query to the master
2016-07-15 14:16:51.956552 | server   | I | slave: loaded AXFR for
domain 21.141.in-addr.arpa. from master at 193.0.19.190#53, serial is
2005131124
2016-07-15 14:16:51.956813 | server   | I | database:
21.141.in-addr.arpa.: zone successfully downloaded (AXFR)
2016-07-15 14:16:51.956832 | server   | I | zone load: 21.141.in-addr.arpa.
2016-07-15 14:16:51.959798 | server   | I | zone load:
21.141.in-addr.arpa.: loading AXFR file in '/var/yadifa/xfr/'
2016-07-15 14:16:51.969322 | server   | E | zone load:
21.141.in-addr.arpa.: an error occurred while loading the zone or
journal: A name in the zone does not match the origin.


All other name servers I know of will do an exponential back-off in the face of errors (with a cap on the maximum interval between retries).
They will even record these zone timers to disk so that when the server is restarted, they can remember these timers and not go into crazy refresh loops.
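
To make the idea concrete, here is a minimal Python sketch of capped exponential back-off with timers persisted to disk. It is an illustration only and does not show how any particular name server does it: the function names, the JSON file format and the sample values are all assumptions.

```python
import json
import os
import tempfile


def exponential_backoff_delay(failures: int, base_delay: int, max_delay: int) -> int:
    """Seconds to wait after `failures` consecutive failures (hypothetical helper)."""
    if failures < 1:
        return 0
    # The delay doubles with each failure: base, 2*base, 4*base, ... up to the cap.
    return min(base_delay * 2 ** (failures - 1), max_delay)


def save_timers(path: str, timers: dict) -> None:
    """Write per-zone timers atomically so a crash cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(timers, handle)
    os.replace(tmp_path, path)  # atomic rename over the old file


def load_timers(path: str) -> dict:
    """Read the timers saved by save_timers; return an empty dict on first start."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


if __name__ == "__main__":
    # Assumed values: 60 second base delay, one hour cap, three failures so far.
    state_file = os.path.join(tempfile.gettempdir(), "zone-timers-example.json")
    failures = 3
    delay = exponential_backoff_delay(failures, base_delay=60, max_delay=3600)
    save_timers(state_file, {"21.141.in-addr.arpa.": {"failures": failures, "delay": delay}})
    print(load_timers(state_file))
```

On restart, a server following this pattern would reload the saved failure count and resume waiting instead of retrying at once.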

Could you please consider adding code to yadifa to do exponential back-off, record the timers, and persist them to disk? Could you also allow the operator to view these timers with some kind of status command (similar to Knot's "knotc zonestatus" or NSD's "nsd-control zonestatus")?

Regards,
Anand
_______________________________________________
yadifa-users mailing list
yadifa-users at mailinglists.yadifa.eu
http://www.yadifa.eu/mailman/listinfo/yadifa-users

