Memory Leak Caused by Thread Pools

Troubleshooting

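Start with the monitoring dashboard, where the symptoms are obvious: old-generation usage keeps climbing, and the number of threads in the WAITING state has reached 3.6k.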

image-20240319183841921

Next, dump the heap.

jmap -dump:live,format=b,file=heap.bin pid

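For an important service, take the instance offline before dumping: the JVM pauses while the dump is written, so API latency may spike. For a containerized service, deregister the instance in Nacos first and then run the dump.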

Once you have the dump file, analyze it with IDEA Profiler or Eclipse MAT.

Start with Leak Suspects, which immediately shows that thread objects take up an abnormal share of the heap.

image-20240319183900143

Then look at the biggest objects: there are already 1801 Thread instances.

image-20240319183912793

Following the GC references from there, the trail ends at thread pools.

image-20240319183927534

Run jstack inside the container to take a thread snapshot:

"pool-663-thread-1" #113206 prio=5 os_prio=0 tid=0x00007f9b902df800 nid=0x1ba27 waiting on condition [0x00007f9ace5a4000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000fa27fba0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Locked ownable synchronizers:
- None

"pool-662-thread-1" #113073 prio=5 os_prio=0 tid=0x00007f9ba8198000 nid=0x1b9a2 waiting on condition [0x00007f9acf3b2000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000fa282298> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

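This already tells the story: thread pools are being created over and over. Each pool holds a single thread (every name ends in thread-1), and the pool number in the name keeps growing.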
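With thousands of threads in a dump, counting pools by eye is tedious. The following is a minimal sketch, assuming the dump has been saved as text and that the pools use the default naming scheme `pool-N-thread-M` seen above. The class name `PoolNameCounter` and its method are hypothetical helpers for illustration, not part of any tool.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PoolNameCounter {
    // Default ThreadPoolExecutor thread names look like "pool-663-thread-1".
    private static final Pattern DEFAULT_NAME =
            Pattern.compile("^\"(pool-\\d+)-thread-\\d+\"");

    /** Maps each pool prefix (e.g. "pool-663") to the number of its threads in the dump. */
    static Map<String, Integer> threadsPerPool(List<String> dumpLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dumpLines) {
            Matcher m = DEFAULT_NAME.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two header lines taken from the snapshot above, plus one non-header line.
        List<String> lines = Arrays.asList(
                "\"pool-663-thread-1\" #113206 prio=5 os_prio=0 tid=0x00007f9b902df800 nid=0x1ba27 waiting on condition [0x00007f9ace5a4000]",
                "java.lang.Thread.State: WAITING (parking)",
                "\"pool-662-thread-1\" #113073 prio=5 os_prio=0 tid=0x00007f9ba8198000 nid=0x1b9a2 waiting on condition [0x00007f9acf3b2000]");
        Map<String, Integer> counts = threadsPerPool(lines);
        System.out.println(counts.size() + " pools: " + counts);
    }
}
```

Many distinct prefixes with one thread each is the pattern that points to pools being created per request; a healthy shared pool shows one prefix with many threads.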

Reproduction

Reproducing it is just as straightforward. Write a test endpoint:

@Autowired
private LauyoThreadPoolHelper lauyoThreadPoolHelper;

@GetMapping("testProblem")
public Response testProblem() {
    CompletableFuture.runAsync(() -> {
        System.err.println(Thread.currentThread().getName());
    }, lauyoThreadPoolHelper.lauyoExpressExecutor()); // direct call to the @Bean method on every request
    return Response.ok();
}

@Component
public class LauyoThreadPoolHelper {

    final static Integer corePoolSize = 16;
    final static Integer maximumPoolSize = 40;
    final static Long keepAliveTime = 60L;

    @Bean("lauyoExpressExecutor")
    public Executor lauyoExpressExecutor() {
        ExecutorService lauyoExecutorService = new ThreadPoolExecutor(
                corePoolSize,
                maximumPoolSize,
                keepAliveTime,
                TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(200),
                Executors.defaultThreadFactory(),
                new ThreadPoolExecutor.CallerRunsPolicy());
        return lauyoExecutorService;
    }

}

After hitting the endpoint with 1000 JMeter threads, the number of thread pools shoots up quickly.

image-20240319184044809

How Executors Creates Thread Pools

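The cause is simple: every call to lauyoThreadPoolHelper.lauyoExpressExecutor() creates a new thread pool, and nothing shuts it down after the task finishes. Its thread stays parked waiting for more work, so the thread count grows linearly with the number of calls.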
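The effect can be reproduced without Spring. The sketch below assumes the same pool parameters as the article's factory method; the class `PoolPerCallDemo` and its method names are made up for illustration. It runs one task on a fresh pool per simulated request and then counts the worker threads left behind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolPerCallDemo {
    /** Same shape as the article's factory method: a brand-new pool on every call. */
    static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(
                16, 40, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(200),
                Executors.defaultThreadFactory(),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Runs one task on a fresh pool per simulated request and returns the pools left behind. */
    static List<ThreadPoolExecutor> simulateRequests(int requests) {
        List<ThreadPoolExecutor> pools = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            ThreadPoolExecutor pool = newPool();
            CompletableFuture.runAsync(() -> { }, pool).join();
            pools.add(pool); // kept only so the demo can inspect and close them afterwards
        }
        return pools;
    }

    /** Sums the worker threads that are still alive across all pools. */
    static int liveWorkers(List<ThreadPoolExecutor> pools) {
        int total = 0;
        for (ThreadPoolExecutor pool : pools) {
            total += pool.getPoolSize();
        }
        return total;
    }

    public static void main(String[] args) {
        List<ThreadPoolExecutor> pools = simulateRequests(5);
        System.out.println("pools: " + pools.size() + ", live workers: " + liveWorkers(pools));
        // Without this, the non-daemon worker threads would keep the JVM alive.
        pools.forEach(ThreadPoolExecutor::shutdown);
    }
}
```

Each pool ends up with exactly one worker because a single task only starts one core thread, which matches the thread-1 suffix in the jstack output. Core threads do not time out by default, so the 60-second keepAliveTime never applies to them.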

Why was this less visible earlier, when the code used Executors.newSingleThreadExecutor()? The source code answers that:

public static ExecutorService newSingleThreadExecutor() {
    return new FinalizableDelegatedExecutorService
        (new ThreadPoolExecutor(1, 1,
                                0L, TimeUnit.MILLISECONDS,
                                new LinkedBlockingQueue<Runnable>()));
}

// The method does not return a bare ThreadPoolExecutor; it wraps it in a delegating class.
// The source of FinalizableDelegatedExecutorService:
static class FinalizableDelegatedExecutorService
    extends DelegatedExecutorService {
    FinalizableDelegatedExecutorService(ExecutorService executor) {
        super(executor);
    }
    protected void finalize() {
        super.shutdown();
    }
}

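The wrapper class implements finalize(), so once the wrapper becomes unreachable, garbage collection triggers a shutdown of the pool. That safety net masked the problem before, but it is not something to rely on.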

Reference: https://blog.csdn.net/weixin_30399055/article/details/97057005
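If a short-lived executor is genuinely needed, closing it explicitly avoids depending on when (or whether) finalization runs. The following is a minimal sketch under that assumption; `ExplicitShutdownDemo` and `runAndClose` are hypothetical names, and the 5-second wait is an arbitrary choice.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExplicitShutdownDemo {
    /** Runs one task on a short-lived executor and reports whether the executor terminated. */
    static boolean runAndClose(Runnable task) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            executor.execute(task);
        } finally {
            // Stop accepting new work; the already submitted task still runs to completion.
            executor.shutdown();
        }
        try {
            // Wait for the worker thread to exit.
            return executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        boolean terminated = runAndClose(
                () -> System.out.println(Thread.currentThread().getName()));
        System.out.println("terminated: " + terminated);
    }
}
```

For request-handling code, a shared pool as described in the next section is usually the better fit; this pattern only suits executors whose lifetime is clearly scoped.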

Fix

The fix is to make the thread pool a singleton instead of building a new one on every call.

  1. Use the injected thread pool bean
@Autowired
private ExecutorService lauyoExpressExecutor;

@GetMapping("testProblem")
public Response testProblem() {
    CompletableFuture.runAsync(() -> {
        System.err.println(Thread.currentThread().getName());
    }, lauyoExpressExecutor);
    return Response.ok();
}
  2. Change the class to @Configuration
@Configuration
public class LauyoThreadPoolHelper {

}

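A @Configuration class is proxied with CGLIB, so calls to its @Bean methods are intercepted and return the singleton held by the container. With only @Component there is no such interception: calling the @Bean method directly is an ordinary Java call that builds a new object each time.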

For the details, see: https://www.jianshu.com/p/7d15118c290d
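The difference can be pictured in plain Java. This is a simplified analogy, not Spring's actual implementation: `PlainHelper` stands in for the @Component class, and `InterceptingHelper` stands in for the generated subclass, using a local cache where Spring would look the bean up in its container. All names here are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BeanMethodProxyAnalogy {
    /** Stands in for the @Component class: every direct call builds a new pool. */
    static class PlainHelper {
        ExecutorService executor() {
            return Executors.newFixedThreadPool(2);
        }
    }

    /** Stands in for the generated subclass: the call is intercepted and the result reused. */
    static class InterceptingHelper extends PlainHelper {
        private ExecutorService cached;

        @Override
        synchronized ExecutorService executor() {
            if (cached == null) {
                cached = super.executor(); // create once, then hand out the same instance
            }
            return cached;
        }
    }

    /** Calls the factory method twice and reports whether both calls returned the same pool. */
    static boolean returnsSameInstance(PlainHelper helper) {
        ExecutorService first = helper.executor();
        ExecutorService second = helper.executor();
        boolean same = first == second;
        first.shutdown();
        second.shutdown();
        return same;
    }

    public static void main(String[] args) {
        System.out.println("plain:        " + returnsSameInstance(new PlainHelper()));        // false
        System.out.println("intercepting: " + returnsSameInstance(new InterceptingHelper())); // true
    }
}
```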

  3. Make the thread pool itself a singleton
private static final ExecutorService LAUYO_THREAD_POOL_SERVICE = new ThreadPoolExecutor(
        corePoolSize,
        maximumPoolSize,
        keepAliveTime,
        TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(100),
        new LauyoThreadFactory("lauyo-express-pool-"),
        new ThreadPoolExecutor.CallerRunsPolicy());

@Bean("lauyoExpressExecutor")
public Executor lauyoExpressExecutor() {
    return LAUYO_THREAD_POOL_SERVICE;
}

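After the change the thread count still rises under load, but every thread name belongs to the same pool, and pools are no longer created without bound.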

image-20240319184100889

2024-03-12 18:57:40.319 [] DEBUG [http-nio-8082-exec-160] c.i.compass.core.web.log.RRLogFilter     : <<<<<<<<<--------
lauyo-executor-pool-1-thread-8
lauyo-executor-pool-1-thread-5
lauyo-executor-pool-1-thread-2
lauyo-executor-pool-1-thread-6
lauyo-executor-pool-1-thread-14
lauyo-executor-pool-1-thread-7
lauyo-executor-pool-1-thread-16
lauyo-executor-pool-1-thread-3
lauyo-executor-pool-1-thread-15
lauyo-executor-pool-1-thread-1
lauyo-executor-pool-1-thread-4
lauyo-executor-pool-1-thread-13
lauyo-executor-pool-1-thread-9
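The same behavior can be checked in isolation. The sketch below assumes the pool parameters from the singleton fix (16 core threads, 40 maximum, queue of 100); the thread factory and the `demo-pool-thread-` prefix are hypothetical stand-ins for the article's LauyoThreadFactory, whose source is not shown. It pushes 1000 tasks through one shared pool and records which threads ran them.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedPoolDemo {
    static final String PREFIX = "demo-pool-thread-"; // hypothetical name, not the article's

    // One pool for the whole application, created once.
    static final ThreadPoolExecutor SHARED = new ThreadPoolExecutor(
            16, 40, 60L, TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(100),
            namedFactory(PREFIX),
            new ThreadPoolExecutor.CallerRunsPolicy());

    /** Names threads prefix + 1, prefix + 2, ... so they are easy to spot in a dump. */
    static ThreadFactory namedFactory(String prefix) {
        AtomicInteger seq = new AtomicInteger(1);
        return r -> new Thread(r, prefix + seq.getAndIncrement());
    }

    /** Submits tasks to the shared pool and returns the names of the threads that ran them. */
    static Set<String> run(int tasks) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        CompletableFuture<?>[] futures = new CompletableFuture<?>[tasks];
        for (int i = 0; i < tasks; i++) {
            futures[i] = CompletableFuture.runAsync(
                    () -> names.add(Thread.currentThread().getName()), SHARED);
        }
        CompletableFuture.allOf(futures).join();
        return names;
    }

    public static void main(String[] args) {
        Set<String> names = run(1000);
        System.out.println("distinct threads: " + names.size());
        System.out.println("largest pool size: " + SHARED.getLargestPoolSize());
        SHARED.shutdown(); // let the JVM exit
    }
}
```

The largest pool size never exceeds the configured maximum of 40. Because of CallerRunsPolicy, a task rejected while the queue is full runs on the submitting thread, so the caller's own thread name may also appear in the set.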