[yadifa-users] yadifa 2.2.0 aggressive slave behaviour

Anand Buddhdev anandb at ripe.net
Fri Jul 15 16:39:23 CEST 2016


Dear yadifa developers,

I've just started a yadifa 2.2.0 instance, with a slave zone defined in
its config. Yadifa XFRs the zone from the master, but refuses to load it
because of an error in the zone. Then yadifa schedules it for immediate
refresh, and goes into a mad loop, XFRing the zone over and over,
without ever backing off. On a slave server with many zones that could
have errors, this creates a thundering herd against the master(s).
Here's a log snippet showing this happening for one zone (notice how
yadifa has downloaded the same zone several times within the same
second). This keeps happening rapidly, ad infinitum:

2016-07-15 14:16:51.821404 | server   | N | zone load: slave zone
21.141.in-addr.arpa. requires download from the master
2016-07-15 14:16:51.880454 | server   | I | slave: 21.141.in-addr.arpa.
AXFR query to the master
2016-07-15 14:16:51.895625 | server   | I | slave: loaded AXFR for
domain 21.141.in-addr.arpa. from master at 193.0.19.190#53, serial is
2005131124
2016-07-15 14:16:51.895937 | server   | I | database:
21.141.in-addr.arpa.: zone successfully downloaded (AXFR)
2016-07-15 14:16:51.895961 | server   | I | zone load: 21.141.in-addr.arpa.
2016-07-15 14:16:51.896413 | server   | I | zone load:
21.141.in-addr.arpa.: loading AXFR file in '/var/yadifa/xfr/'
2016-07-15 14:16:51.903790 | server   | E | zone load:
21.141.in-addr.arpa.: an error occurred while loading the zone or
journal: A name in the zone does not match the origin.
2016-07-15 14:16:51.903799 | server   | N | zone load: slave zone
21.141.in-addr.arpa. requires download from the master
2016-07-15 14:16:51.907521 | server   | I | slave: 21.141.in-addr.arpa.
AXFR query to the master
2016-07-15 14:16:51.923815 | server   | I | slave: loaded AXFR for
domain 21.141.in-addr.arpa. from master at 193.0.19.190#53, serial is
2005131124
2016-07-15 14:16:51.923869 | server   | I | database:
21.141.in-addr.arpa.: zone successfully downloaded (AXFR)
2016-07-15 14:16:51.923880 | server   | I | zone load: 21.141.in-addr.arpa.
2016-07-15 14:16:51.927527 | server   | I | zone load:
21.141.in-addr.arpa.: loading AXFR file in '/var/yadifa/xfr/'
2016-07-15 14:16:51.935881 | server   | E | zone load:
21.141.in-addr.arpa.: an error occurred while loading the zone or
journal: A name in the zone does not match the origin.
2016-07-15 14:16:51.935889 | server   | N | zone load: slave zone
21.141.in-addr.arpa. requires download from the master
2016-07-15 14:16:51.939525 | server   | I | slave: 21.141.in-addr.arpa.
AXFR query to the master
2016-07-15 14:16:51.956552 | server   | I | slave: loaded AXFR for
domain 21.141.in-addr.arpa. from master at 193.0.19.190#53, serial is
2005131124
2016-07-15 14:16:51.956813 | server   | I | database:
21.141.in-addr.arpa.: zone successfully downloaded (AXFR)
2016-07-15 14:16:51.956832 | server   | I | zone load: 21.141.in-addr.arpa.
2016-07-15 14:16:51.959798 | server   | I | zone load:
21.141.in-addr.arpa.: loading AXFR file in '/var/yadifa/xfr/'
2016-07-15 14:16:51.969322 | server   | E | zone load:
21.141.in-addr.arpa.: an error occurred while loading the zone or
journal: A name in the zone does not match the origin.


All other name servers I know of will do an exponential back-off in the
face of errors (with a cap on the maximum interval between retries).
They will even record these zone timers to disk so that when the server
is restarted, they can remember these timers and not go into crazy
refresh loops.
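
To make the back-off concrete, here is a minimal sketch in Python of a
capped exponential retry delay. It is only an illustration: the function
name, the 60-second base and the one-hour cap are my own assumptions,
not values taken from yadifa or from any other name server.

```python
def retry_delay(failures, base=60, cap=3600):
    """Seconds to wait before the next transfer attempt.

    failures: number of consecutive failed attempts for the zone.
    base, cap: illustrative values only (assumptions, not yadifa settings).
    """
    if failures <= 0:
        return 0  # nothing has failed yet, no need to wait
    # Double the wait after every failure, but never exceed the cap.
    return min(cap, base * 2 ** (failures - 1))


if __name__ == "__main__":
    # Print the delay chosen after 1..8 consecutive failures.
    for n in range(1, 9):
        print(n, retry_delay(n))
```

With a rule like this, a zone that fails to load is retried at growing
intervals up to the cap, rather than several times per second.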

Could you please consider adding code to yadifa to do exponential
back-off, record the timers, and persist them to disk? It would also
help to let the operator view these timers with some kind of status
command (similar to Knot's "knotc zonestatus" or NSD's "nsd-control
zonestatus").
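
As a rough idea of what I mean by persisting and viewing the timers,
here is a small Python sketch. Everything in it is hypothetical: the
JSON file layout, the function names and the status line format are
made up for illustration and do not match the output of knotc or
nsd-control.

```python
import json
import os
import tempfile
import time


def save_timers(path, timers):
    """Write {zone: {"failures": int, "next_retry": epoch}} atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(timers, f)
    # Rename over the old file so a crash never leaves a partial file.
    os.replace(tmp, path)


def load_timers(path):
    """Return the saved timers, or an empty dict on first start."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def zone_status(timers, now=None):
    """One line per zone, as a status command might print (made-up format)."""
    now = time.time() if now is None else now
    lines = []
    for zone, t in sorted(timers.items()):
        wait = max(0, int(t["next_retry"] - now))
        lines.append(f"{zone} failures={t['failures']} retry_in={wait}s")
    return lines
```

After a restart, the server would call load_timers() and honour any
next_retry time still in the future instead of transferring at once.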

Regards,
Anand

