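Most of the time I don't like xargs, because it makes it awkward to run anything even slightly more complex than a single command. For example, one thing I do all the time is loop over the containers looking for a veth: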
for p in $(ps -ef | awk '/init/ {print $2}'); do ln -s /proc/$p/ns/net /var/run/netns/$p; echo $p; ip netns exec $p ip l | grep ^$IDX; rm /var/run/netns/$p; done
Written with xargs, the same loop can only look like this:
ps -ef | awk '/init/ {print $2}' | xargs -I{} bash -c 'ln -s /proc/$0/ns/net /var/run/netns/$0; echo $0; ip netns exec $0 ip l | grep ^$IDX; rm /var/run/netns/$0' {}
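Very inconvenient: the whole loop body has to be crammed into one single-quoted bash -c string, the argument shows up as $0, and $IDX is only visible inside that child bash if it has been exported.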
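A common way to make this less painful (a general bash idiom, not something from the original post) is to put the loop body into a function, export it with export -f so the child bash started by xargs can see it, and pass a placeholder such as _ as $0 so the real argument lands in $1. The function name show_pid is hypothetical and only echoes its argument, so the sketch runs without containers or root.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the veth lookup: it only prints the pid it gets.
show_pid() {
    local p="$1"
    echo "pid=$p"
}
# Export the function so the bash processes spawned by xargs inherit it.
export -f show_pid

# "_" fills $0, so each input line arrives as $1 inside the bash -c string.
# Prints pid=101, pid=102 and pid=103 on separate lines.
printf '%s\n' 101 102 103 | xargs -I{} bash -c 'show_pid "$1"' _ {}
```

The body of show_pid could hold the real multi-line logic, which keeps the bash -c string down to a single call.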
But xargs does offer an easy way to parallelize: when performance matters, you just pass -P with the number of processes to run at once, which is lovely.
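A small timing sketch of what -P buys you, assuming nothing beyond xargs and sleep: four jobs that each sleep one second finish in roughly one second with -P 4, instead of roughly four seconds when run one after another. The job body is a made-up placeholder.

```shell
#!/usr/bin/env bash
start=$SECONDS
# Up to 4 processes run at the same time; output order is not guaranteed.
printf '%s\n' 1 2 3 4 | xargs -P 4 -I{} sh -c 'sleep 1; echo "job $1 done"' _ {}
# Expect about 1s here; drop -P 4 and it becomes about 4s.
echo "elapsed: $((SECONDS - start))s"
```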
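So today, in a case where I needed concurrency for speed, I fell into another of xargs's pitfalls: by default it does its own quote and backslash processing on the input, so special characters such as " do not get through unchanged.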
For example, I have a file in which every line is one compact JSON object:
{"kind":"WorkloadEndpoint","apiVersion":"projectcalico.org/v3","metadata":{"name":"cachecloud--agent--sg2--test--20.shopeemobile.com-yavirt-yavirt-04fbf89eb0d611eaaa3bfe5400dac964","namespace":"cachecloud-agent-sg2-test-20.shopeemobile.com","uid":"a00da22d-14d1-4c17-932d-e244eaef03ac","creationTimestamp":"2020-06-17T20:06:24Z","labels":{"projectcalico.org/namespace":"cachecloud-agent-sg2-test-20.shopeemobile.com","projectcalico.org/orchestrator":"yavirt"}},"spec":{"orchestrator":"yavirt","workload":"yavirt","node":"cachecloud-agent-sg2-test-20.shopeemobile.com","endpoint":"04fbf89eb0d611eaaa3bfe5400dac964","ipNetworks":["10.143.220.38/32"],"profiles":["calico-pool-1"],"interfaceName":"yap04fbf94bb0d6","mac":"52:54:00:37:8a:9b"}}
{"kind":"WorkloadEndpoint","apiVersion":"projectcalico.org/v3","metadata":{"name":"cachecloud--agent--sg2--test--20.shopeemobile.com-yavirt-yavirt-0584726fb33a11eaaa3bfe5400dac964","namespace":"cachecloud-agent-sg2-test-20.shopeemobile.com","uid":"dea810ac-573d-405f-b50e-57d19c3165ab","creationTimestamp":"2020-06-20T21:07:17Z","labels":{"projectcalico.org/namespace":"cachecloud-agent-sg2-test-20.shopeemobile.com","projectcalico.org/orchestrator":"yavirt"}},"spec":{"orchestrator":"yavirt","workload":"yavirt","node":"cachecloud-agent-sg2-test-20.shopeemobile.com","endpoint":"0584726fb33a11eaaa3bfe5400dac964","ipNetworks":["10.143.219.211/32"],"profiles":["calico-pool-1"],"interfaceName":"yap05847314b33a","mac":"52:54:00:fb:e4:42"}}
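When reading the lines, xargs stupidly ate every double quote (and, since no command was given, it ran echo once with both lines as arguments, so they come out joined on a single line):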
# cat file | xargs
{kind:WorkloadEndpoint,apiVersion:projectcalico.org/v3,metadata:{name:cachecloud--agent--sg2--test--20.shopeemobile.com-yavirt-yavirt-04fbf89eb0d611eaaa3bfe5400dac964,namespace:cachecloud-agent-sg2-test-20.shopeemobile.com,uid:a00da22d-14d1-4c17-932d-e244eaef03ac,creationTimestamp:2020-06-17T20:06:24Z,labels:{projectcalico.org/namespace:cachecloud-agent-sg2-test-20.shopeemobile.com,projectcalico.org/orchestrator:yavirt}},spec:{orchestrator:yavirt,workload:yavirt,node:cachecloud-agent-sg2-test-20.shopeemobile.com,endpoint:04fbf89eb0d611eaaa3bfe5400dac964,ipNetworks:[10.143.220.38/32],profiles:[calico-pool-1],interfaceName:yap04fbf94bb0d6,mac:52:54:00:37:8a:9b}} {kind:WorkloadEndpoint,apiVersion:projectcalico.org/v3,metadata:{name:cachecloud--agent--sg2--test--20.shopeemobile.com-yavirt-yavirt-0584726fb33a11eaaa3bfe5400dac964,namespace:cachecloud-agent-sg2-test-20.shopeemobile.com,uid:dea810ac-573d-405f-b50e-57d19c3165ab,creationTimestamp:2020-06-20T21:07:17Z,labels:{projectcalico.org/namespace:cachecloud-agent-sg2-test-20.shopeemobile.com,projectcalico.org/orchestrator:yavirt}},spec:{orchestrator:yavirt,workload:yavirt,node:cachecloud-agent-sg2-test-20.shopeemobile.com,endpoint:0584726fb33a11eaaa3bfe5400dac964,ipNetworks:[10.143.219.211/32],profiles:[calico-pool-1],interfaceName:yap05847314b33a,mac:52:54:00:fb:e4:42}}
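For the record, xargs can be told to leave quotes alone (not what this post ends up using): GNU xargs accepts -d '\n' to make newline the only delimiter, and the portable route is to convert newlines to NUL bytes and use -0. Both disable the quote and backslash handling. The two-line sample below is a hypothetical, shortened stand-in for the real file.

```shell
#!/usr/bin/env bash
# Hypothetical, shortened sample input: one JSON object per line.
sample='{"kind":"WorkloadEndpoint","spec":{"ipNetworks":["10.143.220.38/32"]}}
{"kind":"WorkloadEndpoint","spec":{"ipNetworks":["10.143.219.211/32"]}}'

# Default mode: xargs interprets the double quotes and strips them.
printf '%s\n' "$sample" | xargs -n1

# GNU xargs only: newline is the sole delimiter, quotes pass through untouched.
printf '%s\n' "$sample" | xargs -d '\n' -n1 echo

# Portable: NUL-delimited input with -0 also turns off quote processing.
printf '%s\n' "$sample" | tr '\n' '\0' | xargs -0 -n1 echo
```

Either form can be combined with -P (and -I{} / bash -c) to get the parallel run without mangling the JSON.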
So I found GNU parallel. The simplest usage looks like this:
cat file | parallel -j10 -q bash -c 'echo -- $0' {}
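-j sets how many jobs run in parallel, and -q quotes the command so that bash -c '...' receives its script string intact, instead of the shell that parallel runs it through re-splitting it and expanding $0;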
Besides the command line, it can also be used through a shebang:
#!/usr/bin/parallel --shebang-wrap /bin/bash
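# p.bash
# Each input line (one JSON object) is passed to the script as its argument.
ip=$(echo "$@" | jq -r '.spec.ipNetworks[0]')   # first CIDR, e.g. 10.143.220.38/32
calicoctl ipam show --ip="${ip%%/*}"           # drop the /32 and query Calico IPAM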
Then:
chmod +x p.bash
cat file | ./p.bash
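The ${ip%%/*} in the script is plain bash parameter expansion: %%/* removes the longest suffix that starts with /, which strips the prefix length off the CIDR before it is handed to calicoctl. A standalone check, with the address hardcoded from the first line of the sample file instead of coming from jq:

```shell
#!/usr/bin/env bash
# CIDR as jq would return it for the first line of the file.
ip="10.143.220.38/32"
# %%/* deletes the longest suffix matching "/*", i.e. "/32".
echo "${ip%%/*}"    # prints 10.143.220.38
```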
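Note that on the command line the input line arrives as $0 (with bash -c, the first argument after the script string becomes $0), whereas in the script it arrives as $@.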
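A quick way to see the difference, using only bash and a throwaway temp file (the argument name line1 is made up): with bash -c the argument is bound to $0 and $# stays 0, while a script file given the same argument sees it as $1 / "$@".

```shell
#!/usr/bin/env bash
# bash -c: the first argument after the script string becomes $0.
bash -c 'echo "0=$0 #=$# all=$*"' line1
# prints: 0=line1 #=0 all=

# A script file receiving the same argument sees it in $1 / "$@" instead.
script=$(mktemp)
printf '%s\n' 'echo "1=$1 #=$# all=$*"' > "$script"
bash "$script" line1
# prints: 1=line1 #=1 all=line1
rm -f "$script"
```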
Works great.