线程池导致内存泄漏

问题排查

  1. 查询arms的应用详情发现老年代一直在缓慢增长,且线程数Waiting高达3.6k

image-20240319183841921

  1. 使用阿里云在线创建内存快照,或者使用jamp -dump命令
1
jmap -dump:live,format=b,file=heap.bin pid

PS: 重要的服务需要先下线,再去dump,否则此服务期间可能接口异常。比如service服务我们可以把需要dump的容器在nacos上下线,再去dump。

  1. 将下载的dump文件使用idea的Profiler/Eclipse工具分析

查看Leak Suspects,发现存在线程导致的内存泄漏

image-20240319183900143

然后再看大对象,发现Thread对象数高达1801个

image-20240319183912793

查看其GC引用,可看出线程池存在问题

image-20240319183927534

进入容器使用jstack 1查看线程快照信息

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
"pool-663-thread-1" #113206 prio=5 os_prio=0 tid=0x00007f9b902df800 nid=0x1ba27 waiting on condition [0x00007f9ace5a4000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000fa27fba0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Locked ownable synchronizers:
- None

"pool-662-thread-1" #113073 prio=5 os_prio=0 tid=0x00007f9ba8198000 nid=0x1b9a2 waiting on condition [0x00007f9acf3b2000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000fa282298> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

可发现是线程池在被重复创建的问题,而线程数都是1(pool-线程池号-thread-线程数)

复现

新增一个测试接口

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
@Autowired
private LauyoThreadPoolHelper lauyoThreadPoolHelper;

@GetMapping("testProblem")
public Response testProblem() {
CompletableFuture.runAsync(() -> {
System.err.println(Thread.currentThread().getName());
}, lauyoThreadPoolHelper.lauyoExpressExecutor());
return Response.ok();
}

@Component
public class LauyoThreadPoolHelper {

final static Integer corePoolSize = 16;
final static Integer maximumPoolSize = 40;
final static Long keepAliveTime = 60L;

@Bean("lauyoExpressExecutor")
public Executor lauyoExpressExecutor() {
ExecutorService lauyoExecutorService = new ThreadPoolExecutor(
corePoolSize,
maximumPoolSize,
keepAliveTime,
TimeUnit.SECONDS,
new LinkedBlockingQueue<>(200),
Executors.defaultThreadFactory(),
new ThreadPoolExecutor.CallerRunsPolicy());
return lauyoExecutorService;
}

}

使用jmeter压测1000个线程,如下图所见及服务日志控制台打印,线程池在突然飙增

image-20240319184044809

分析Executors创建线程池方式

原因:程序每次调用lauyoThreadPoolHelper.lauyoExpressExecutor()方法时,均会创建一个新的线程池来执行任务。但在执行任务后并未关闭该线程池,造成线程无法被回收,线程一直处于等待状态。因而线程数会随时间线性上升。

为什么原来使用 Executors.newSingleThreadExecutor() 时未出现这个问题呢?仍然是查看源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
public static ExecutorService newSingleThreadExecutor() {
return new FinalizableDelegatedExecutorService
(new ThreadPoolExecutor(1, 1,
0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue<Runnable>()));
}

// 原来该方法并不是直接new一个ThreadPoolExecutor对象返回,而是使用了一个代理类进行代理。
// 进一步查看 FinalizableDelegatedExecutorService 源码:
static class FinalizableDelegatedExecutorService
extends DelegatedExecutorService {
FinalizableDelegatedExecutorService(ExecutorService executor) {
super(executor);
}
protected void finalize() {
super.shutdown();
}
}

在这个代理类中,实现了finalize方法,并在finalize方法中关闭线程池。根据finalize的特性,在GC时会调用finalize方法。因此 Executors.newSingleThreadExecutor()在每次垃圾回收时触发未被使用的线程池关闭,所以没有出现线程数持续上升的问题。

参考:https://blog.csdn.net/weixin_30399055/article/details/97057005

解决问题

  1. 使用注入的线程池Bean调用
1
2
3
4
5
6
7
8
9
10
@Autowired
private ExecutorService lauyoExpressExecutor;

@GetMapping("testProblem")
public Response testProblem() {
CompletableFuture.runAsync(() -> {
System.err.println(Thread.currentThread().getName());
}, lauyoExpressExecutor);
return Response.ok();
}
  1. 仅修改为@Configuration
1
2
3
4
@Configuration
public class LauyoThreadPoolHelper {

}

加了@Configuration的类会被CGLIB进行动态代理,不加或者加@Component注解则不会被代理

具体原因可参考:https://www.jianshu.com/p/7d15118c290d

  1. 将线程池改成单例
1
2
3
4
5
6
7
8
9
10
11
12
13
private static final ExecutorService LAUYO_THREAD_POOL_SERVICE = new ThreadPoolExecutor(
corePoolSize,
maximumPoolSize,
keepAliveTime,
TimeUnit.SECONDS,
new LinkedBlockingQueue<>(100),
new LauyoThreadFactory("lauyo-express-pool-"),
new ThreadPoolExecutor.CallerRunsPolicy());

@Bean("lauyoExpressExecutor")
public Executor lauyoExpressExecutor() {
return LAUYO_THREAD_POOL_SERVICE;
}

修改后可见服务线程数会有升高,但都使用的同一个线程池

image-20240319184100889

1
2
3
4
5
6
7
8
9
10
11
12
13
14
2024-03-12 18:57:40.319 [] DEBUG [http-nio-8082-exec-160] c.i.compass.core.web.log.RRLogFilter     : <<<<<<<<<--------
lauyo-executor-pool-1-thread-8
lauyo-executor-pool-1-thread-5
lauyo-executor-pool-1-thread-2
lauyo-executor-pool-1-thread-6
lauyo-executor-pool-1-thread-14
lauyo-executor-pool-1-thread-7
lauyo-executor-pool-1-thread-16
lauyo-executor-pool-1-thread-3
lauyo-executor-pool-1-thread-15
lauyo-executor-pool-1-thread-1
lauyo-executor-pool-1-thread-4
lauyo-executor-pool-1-thread-13
lauyo-executor-pool-1-thread-9