Participants
- Kilian
- Michal
- Andy
- Andreas Petzold
- Dirk Sammel
- Jan Erik Sundermann
- Manuel Giffels
- Marek Szuba
- Max Fischer
- Michael Boehler
- Paul Kramp
- Raffaele
- Rene Caspart
- Serhat Atay
- Soeren Fleischer

Presentation Andy

==> Participant introductions:
Paul: with Kilian, implemented plug-ins with help from Michal
Soeren: GSI, administers XRootD servers
Raffaele: GSI, ALICE Tier2/AF, looks at logs
Kilian: distributed computing, ALICE T2 and AF (analysis facility)
Andreas Petzold: manager of GridKa, knows Andy since 2003
Dirk: Freiburg, implementation and benchmarking of DCOTF
Marek: NA61, now ESCAPE, SE prototype for the data lake
Max: KIT, administrator for ALICE at T1, R&D on XRootD for caching
Rene: KIT, CMS infrastructure at GridKa, R&D for caches
Serhat: Frankfurt, with the GSI group, XRootD for data caching (DCOTF)
Michal: CERN IT, XRootD and EOSC, takes care of the client, release manager
Andy: SLAC, server side of XRootD, and gets things moving

==> Agenda OK?
Any wishes for changes?

No wishes for changes.

Presentation of Andy

questions during the presentation

Next, presentation of Michal:
XRootD from the client perspective

Discussion

Next:
Bug with LocalRoot and LocalRedirect (Soeren)
Solution:
the localredirect plug-in needs to be adjusted to handle the issue properly.
Moving the necessary localroot information from oss.localroot into a plug-in specific config should solve the issue.
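A possible shape of that change, as a hedged sketch (oss.localroot is the standard directive; the plug-in specific directive name below is hypothetical):

    # Standard directive: the OSS layer prepends this prefix to every path
    oss.localroot /data/xrootd
    # Hypothetical plug-in specific directive carrying the same prefix, so the
    # localredirect plug-in no longer has to read it back from oss.localroot
    localredirect.localroot /data/xrootd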

AI Paul

Next:
XCache thread utilisation (Serhat)
Every request runs in its own thread, which stays active until the request is satisfied:
a 1:1 ratio of threads to active clients.
The writeback setting can also control the number of threads,
since blocks have to be written to disk:
https://xrootd.slac.stanford.edu/doc/dev50/pss_config.htm#_Toc42114747
The default is 4 because 1 to 2 disks can be satisfied with that;
larger installations might want more threads.
Clients: a lot of requests can lead to being CPU-bound.
xrd v4 and v5 show different behaviour.
The nproc limit of the server might be too low;
10000 is a reasonable limit,
and the hard limit should not be too low.
The caching server uses 3 event loops;
threads are reused
when it is reading from the cache.
Andy is mystified:
discretely trace certain requests in the server
to see what the server is doing in practice,
then compare with the log to see what the overlap is.
When data comes from the cache there are no write-pool issues.
ALICE jobs: unclear how much use they make of vector reads;
with ATLAS we had this problem.
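For the nproc point, a hedged sketch of raising the limit for a systemd-managed xrootd service (unit name and drop-in path are examples):

    # /etc/systemd/system/xrootd@.service.d/limits.conf (example drop-in path)
    [Service]
    # nproc also caps threads; 10000 matches the limit suggested above
    LimitNPROC=10000
    # keep the file-descriptor limit well above the expected client count
    LimitNOFILE=65536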

AI Serhat

 

AOB?
Or finished for today?
Remaining issues moved to tomorrow.
 

Tuesday:
1. Serhat: URL issue
A check along the way expects a hostname and no IP address.
The XrdCl Proxy Prefix plug-in could be tried out:
if it works with the plug-in, the problem is on the ROOT side;
otherwise it is a client issue.

The client proxy prefix is not needed anymore;
the same functionality is now included in the client.

Andy sees the problem:
in the config file you have to export both root and xroot if URLs are to be mixed.

Is the complete URL reported in the ticket?
It should be e-mailed to Andy, including the URL.
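For trying the plug-in, a hedged sketch of an XrdCl plug-in config file (the library name is an example and must match the installed plug-in; the proxy endpoint itself is usually supplied via an environment variable defined by the plug-in):

    # placed in the directory given by XRD_PLUGINCONFDIR (client plug-in mechanism)
    url = *                                  # apply to all contacted URLs
    lib = /usr/lib64/libXrdClProxyPlugin.so  # example library name
    enable = true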

AI Serhat

 

2. Monitoring (Max Fischer)
Opportunistic resources are used;
cluster and storage are working fine,
but sometimes single jobs fail due to the network.
How to figure out where the issue is?
Of many 1000s of jobs, a few go wrong.
Increasingly complicated:
we need to figure out what the jobs were initially doing.
==> This is a client-side thing:
if the client cannot connect, the server does not know.
Monitoring plug-ins exist;
unclear if there is one for connection failures.
We need the client to monitor and report that an attempted connection failed, and why.
An on-connect callback exists;
the functionality can be added.
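Until then, a hedged sketch of what a job wrapper could already do today with the standard XRootD Python bindings: probe the endpoint and log the client-side error the server never sees (host and port are examples).

    # Minimal connectivity probe using the XRootD Python bindings
    from XRootD import client

    def probe(url: str) -> bool:
        fs = client.FileSystem(url)
        status, _ = fs.ping(timeout=10)  # lightweight server round-trip
        if not status.ok:
            # status.message carries the client-side failure reason
            print(f"connect to {url} failed: {status.message}")
        return status.ok

    probe("root://xrootd.example.org:1094")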

AI xrd team


3. Jan Erik
HPSS tape system backend:
various scripts exist,
but they are not tested.
The xrd team will help with debugging.
The scripts can be in any language.
One script, hpss.cp, should be in the repo and helps with writing these scripts;
it copies files in and out of HPSS.
Another script handles
rename, remove, ...
A shell interface was not available at that time.
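A hedged sketch of how such scripts hook into the server config (script paths are examples; the directives are the standard OSS staging hooks):

    # Export the tape-backed tree and allow staging
    all.export /tape stage
    # Script that brings a file online (copies it in from HPSS)
    oss.stagecmd /opt/xrootd/scripts/hpss_stage.sh
    # Script handling the remaining meta operations (rename, remove, ...)
    oss.msscmd /opt/xrootd/scripts/hpss_meta.sh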
Pro SRM: dCache and StoRM;
the remaining world: anti-SRM.
CTA will not do SRM.
An HTTP interface was discussed at HSF;
an SRM replacement was discussed:
an implementation in dCache,
and StoRM is looking into it.
Extra site services like FTS may cause trouble;
if not, then you can use anything.
The user has to say: bring the file online, and then do the 3rd-party copy.
It may be more fractured than thought.
==> An internal problem has to be solved first:
if a job hits a file which is only on tape, the job will
wait until the file is brought online;
if that takes much longer than a few minutes, the client will time out.
The timeout could be overridden:
you may waste batch CPU time, but it is transparent.
This needs to be fixed.
SRM replacement: no solution yet because the situation is unclear.
ALICE refused to do the prestage stuff;
maybe a different approach is needed for ALICE:
XRootD has to be explicitly told what to prestage.
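For the explicit prestage, a hedged example of telling XRootD to bring a file online (host and path are placeholders):

    # Request staging (-s) of a tape-resident file before the job opens it
    xrdfs tape.example.org prepare -s /tape/run42/data.root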

4. Monitoring data lake (Paul)
There is a huge amount of monitoring data;
a collector is needed to make sense of the data.
Via JSON, too much monitoring data would be produced,
therefore the data are compressed.
Recreating the data is an offline effort,
but this allows detailed monitoring.
The binary stream can be converted into JSON, for example.
The collector should be local at the site and aggregate the data;
UDP packets stay inside the site and can be exported later on.
The collector contains a binary-to-JSON translator;
generators exist.
Current state as described.
The translating collector will be put into the xrootd stuff;
other potential users exist.
A simpler collector exists in Python, but its state is unclear
and it is to some extent not documented.
There are efforts related to EOSC.
DPM has a lot of that monitoring available.
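A hedged sketch of the receiving end of such a site-local collector: every monitoring UDP packet starts with a fixed 8-byte header (XrdXrootdMonHeader, network byte order), which a translator decodes before interpreting the payload (the port is an example):

    # Minimal UDP receiver decoding the monitoring packet header
    import socket, struct

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 9930))  # must match the dest in xrootd.monitor

    while True:
        data, addr = sock.recvfrom(65536)
        # code: packet type, pseq: sequence number, plen: length, stod: server start
        code, pseq, plen, stod = struct.unpack("!cBHI", data[:8])
        # payload layout depends on code ('u' user, 'f' fstat, 'i' info, ...)
        print({"from": addr[0], "code": code.decode(), "seq": pseq,
               "len": plen, "stod": stod})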

Andreas:
how to monitor file access in xrd?
This is in upstream, see the monitoring manual.
A collector at UCSD in principle exists;
this collector creates ROOT files.
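On the server side this is enabled with the monitoring directive; a hedged example (collector host/port and intervals are examples, options as in the monitoring manual):

    # Stream file-access and user information to a site collector
    xrootd.monitor all auth flush 30 window 5 fstat 60 lfn ops dest fstat info user collector.example.org:9930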

Max has started working on that: https://github.com/maxfischer2781/xrootdlib
Maybe KIT and GSI can join forces there.

Andy will provide a contact person for the Python converter.

AI xrd team

 

5-minute break

==> Dedicated session with GridKa and ALICE about HPSS staging and peak bandwidth usage, to be organised by Kilian

AI Kilian

 

5. OpenID Connect
Plug-ins exist as 3rd party:
the SciTokens plug-in;
parts may move into the xroot core.
With a token generator, the SciTokens infrastructure can be used.
The focus is on using this for 3rd-party transfers,
not so much for regular jobs.
Jupyter notebook use: there is no schema yet for the notebook user to get a token.
An OpenID Connect converter to X.509 certs works fine today.
Not yet in a full-fledged development state.
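As a hedged sketch, the 3rd-party SciTokens plug-in is loaded as an authorisation library (library name and config path as in the plug-in's own documentation; treat them as examples):

    # Load the SciTokens authorization plug-in; issuers are listed in the config file
    ofs.authlib libXrdAccSciTokens.so config=/etc/xrootd/scitokens.cfg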

 

6. Multi-user plug-in from OSG
The OSG folks can be contacted;
Andy will point to where it can be downloaded from.
With it you can trust the uid/gid which the authenticated Unix system provided;
as long as access to the server is restricted to clients with proper authentication,
it works fine.
It is as secure as NFS, which is a drawback.

A combination with OpenID Connect is planned.
Token as authentication or per file?
SciTokens are authorisation tokens right now.
We say it is not secure because we still allow clients to enter a server without a token;
this is not acceptable.
If you have a schema and you have a plug-in, then you can write an authenticator on the server side;
the infrastructure to do that is there, but the code is not.

The landscape is fractured in terms of tokens.
Currently KIT is using something based on Apache,
but xrootd is a very good candidate for an HTTP server.

AI xrd team to give a pointer for downloading the multi-user plug-in (OSG)


7. Global namespace
Dynafed does something similar:
it can combine HTTP services into a global namespace.
How to do that via xrootd?
Redirectors provide that system already;
the CMS AAA federation has a global namespace across 2000 servers,
and that will work here, too.
Whether http or root does not matter,
but on the WAN it requires care.
CMS has a regional approach: if a file is not found in the region, check the next region.
CMS separated out unreliable sites.
Other solution: ATLAS,
within the Rucio system.
xrootd plug-ins exist, but it is not trivial for http.
xrootd scalability is not an issue:
CMS had one flat namespace,
then they went to the regional approach.
http ls should work on a subtree; if not, contact Andy.
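What the redirector system already provides, as a hedged sketch (host names are examples): one manager federates the namespaces of all data servers that report to it.

    # On the manager (redirector) node
    all.role manager
    all.manager redirector.example.org:3121

    # On every data server node
    all.role server
    all.manager redirector.example.org:3121
    all.export /store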

AI GSI group: check http ls on subtree


8. QoS with XRootD (Marek)
Marek showed the CBM/PANDA QoS documents.
Bandwidth control via the throttle plug-in;
it prevents overload.
cgroups (storage cache groups) can categorise data;
after that xrootd does not care:
then you have to use a migrator to move data to tape.
If you want to go beyond that, we have to talk.
"Person X can only use that type of cgroup" ... this does not exist;
a cgroup is available to all.
Is xrootd aware of cgroups or does the site have to set them up? The latter.
A cgroup is associated with media;
the setup can be done per block storage device or per directory.
Can dCache authorise cgroups for certain people?
This is not a straightforward thing to do.
In a presentation way back, Andy showed how cgroups can be used,
because we wanted to track usage by cgroups,
so that a cgroup considers only the space in that specific volume.
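The cgroup mechanism referred to here, as a hedged config sketch (group names and mount points are examples): each named group aggregates one or more volumes, and usage is tracked per group.

    # Two named storage groups, each backed by specific volumes
    oss.space fast /mnt/ssd1
    oss.space fast /mnt/ssd2
    oss.space bulk /mnt/raid/big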

AI: Andy ==> point to presentation about cgroups

 

9. https via xrootd: performance
Seen with the 4.12 and 5 releases:
half the throughput with https compared to http.
Andy cannot think of anything specific since early 5.0 which has an impact on performance;
nothing changed with openssl.
He has not seen the performance figures and is interested to see them.
You should not get 50% of the performance;
Andy is interested to see where the problem is.
The test should be done without virtual networking,
but if you see a 50% difference in the same virtual machine, and it is a relative issue, then something is wrong.
Paul wants to rerun the tests
and will come back to Andy.
The discussion can also be started publicly on GitHub.

AI: Paul

 

10. Runaway resource allocation (Soeren)
Is there a configuration limit for memory and thread usage on servers?
The directive was posted by Andy:
https://xrootd.slac.stanford.edu/doc/dev51/xrd_config.htm#_Toc49272869
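The directive behind that link, with example values (a hedged sketch; defaults differ per release):

    # Thread scheduler limits: minimum, maximum, available-before-spawn, idle timeout
    xrd.sched mint 32 maxt 2048 avlt 16 idle 780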

A client waits until the next thread is available,
combined with a timeout:
it times out after 1 minute but retries several times.

The number of connections (file descriptors) could also be limited, but this prevents opening files.

Maybe the client could be informed:
"we cannot handle you right now, but try again at a later point."
This could be implemented
and would prevent jobs from failing.

But the xrootd resources (servers) probably have to be increased
if the batch load is too high.
Soeren did not open a ticket, but a ticket in this direction exists.

AI: Soeren ==> try the xrootd directive

AI: xrd team ==> implement that the client is informed, to trigger the correct behaviour

 

11. Fusing XCache and DCOTF (Serhat)
The disk caching proxy assumes that it has exclusive control over the cinfo file,
so that files can be marked not to be purged.
There is no locking involved,
so an additional piece cannot simply be added;
locking would create a lot of overhead.
A cache management plug-in can do the check on the shared file system,
then ask the XCache server, and this can do the updates for you;
but updates on the cinfo file have to be done via the XCache server:
the director needs to communicate with the XCache server to do the update.
The DCOTF approach is to have a lightweight machine doing the queueing;
the XCache could be overloaded.
This is a solution to the problem,
but it is not workable in the way it is architected today.
People have just deployed additional XCache servers with a redirector on top to load-balance.

The direct cache directive
does not work in htfs mode (block-wise caching);
the feature is only for the data server.
If a file is fully populated it can be accessed;
otherwise we have to wait until it is.
The code to redirect the client as soon as the file is fully populated does not exist yet,
but the intention is there.
A bridge between redirector and XCache can be created, and all problems can be handled via the proxy.
The redirector can look it up itself, it does not need to ask a server;
currently it only checks if the file is in the path.

A plug-in that redirects access to a local file via http is in the repo;
this could maybe be used:
first root, and then http.

Can anything from the DCOTF setup be used in the xrootd core?
With direct cache access most things are already available:
a) the client plug-in ==> already in the client;
b) a special version of the local redirect plug-in (either redirects to a file or a proxy, and is able to handle requests with a proxy prefix).
Maybe create a ticket with code and details?
==> This would be a good step;
the overlap is not 100%,
but the discussion can be started this way.

Serhat can create such a ticket
==> then it will be discussed what can be implemented in xrootd.

AI Serhat

Access control via ACLs: making files world-readable is required.
Bring in a file, set ACLs, and the file would have the correct ACLs.
How does the localredirect plug-in handle Unix permissions?

==> Has this been answered in the workshop?


12. VOMS credential forwarding

Via a forward proxy which connects to the file server:
the proxy server and data server use sss (simple shared secrets) and can give credentials to the backend server;
this is not workable to the outside.
X.509 forwarding:
there is disagreement in the community whether that is a good thing.
3rd-party copy does that already;
XCache does not do it,
because of the security issue if the certificate is given away.
Nobody pushes for it because X.509 will be replaced by tokens.
Doing that in the proxy is not a small project:
security vulnerability issues in the proxy need to be checked.
The proxy server has its own X.509 identity to access external servers:
check if the client is allowed, then the certificate can be replaced by the proxy certificate.
In principle it is possible, but the authorisation framework does not have a convenient way to strip that off.
Missing: making it easier to authorise forwarded requests.


15. Open discussion time

Marek: file system throttling mechanism.
The throttle plug-in: look at the binaries,
libXrdThrottle-5.so;
stack that on top of the ofs plug-in.
Tuning might be needed for it to have an effect:
concurrency and data rate
(qwait time, system load on unix; the data rate is the aggregated amount of ...).
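A hedged sketch of trying it out (directive names are my best recollection of the throttle plug-in's documentation and should be verified; values are examples):

    # Stack the throttle on top of the default file system plug-in
    xrootd.fslib throttle default
    # Limit aggregate data rate and concurrent requests; interval in milliseconds
    throttle.throttle data 500m concurrency 200 interval 1000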

AI Marek to try that out

 

2021 or 2022:
every other face-to-face workshop at GSI.
The restaurant visit should be organised as soon as the pandemic allows it.

AI Kilian to organise the next workshop at GSI as soon as this is possible