awesome

Welcome to awesome bug tracking system.
Tasklist

FS#818 - Memory leaks on x86_64

Attached to Project: awesome
Opened by Vasily Pupkin (vasily_pupkin) - Wednesday, 15 September 2010, 16:53 GMT
Last edited by Uli Schlachter (psychon) - Friday, 17 September 2010, 13:51 GMT
Task Type Bug Report
Category Core
Status Closed
Assigned To No-one
Operating System Linux
Severity Critical
Priority Normal
Reported Version 3.4.7
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

Just crazy memory leaks at x86_64. Looks like it presents in all 3.x tree, but checked in > 3.2. On x86_64 only. Even on default configuration. Couldn't track with valgrind ( this shit crashed :( )

avatar@AliSo ~ % ps axo rsz,command | grep awesome
132188 awesome

avatar@AliSo ~ % cat /proc/$(pidof awesome)/maps | pastebin
http://pastebin.com/FWQzptgN

AliSo ~ # ldd $(which awesome) | awk '{print $3}' | xargs equery b | awk '{print $1}' | sort |uniq | pastebin
http://pastebin.com/DEC59N23
This task depends upon

Closed by  Uli Schlachter (psychon)
Friday, 17 September 2010, 13:51 GMT
Reason for closing:  Fixed
Additional comments about closing:  If there are still some leaks, please open a new bug report with an approriate test case. This leak is gone for good. :)

commit 0a7bec1dbb5717debcced2ed6e9fef301071afaa
Author: Uli Schlachter <psychon@znc.in>
Date: Fri Sep 17 15:45:24 2010 +0200

Correctly unref widget_nodes

While drawing the wibox, the C core builds up a list of widgets and their
associated geometry. This list consists of widget_node_t objects and is
constructed like this (This is from luaA_table2widgets()):

widget_node_t w;
p_clear(&w, 1);
w.widget = luaA_object_ref_item(L, idx, -1);
widget_node_array_append(widgets, w);

After we are done with this list of widget nodes, it is freed via
wibox_widget_node_array_wipe(). However, wibox_widget_node_array_wipe() passed
"&w" to luaA_object_unref_item() while it should have used "w.widget" (which is
what was returned from luaA_object_ref_item()). This made the unref fail and the
wibox was carrying around a reference to this widget forever. This was
effectively a memory leak.

Signed-off-by: Uli Schlachter <psychon@znc.in>
Comment by Uli Schlachter (psychon) - Wednesday, 15 September 2010, 19:42 GMT
Why do you think there is a memory leak? Some hint would be helpful.
Comment by Vasily Pupkin (vasily_pupkin) - Wednesday, 15 September 2010, 19:44 GMT
Because after restart

avatar@AliSo ~ % ps axo rsz,command | grep awesome
14572 awesome

Comment by Uli Schlachter (psychon) - Wednesday, 15 September 2010, 19:53 GMT
I win:

$ ps axo rsz,command | grep awesome
68784 /home/psychon/projects/awesome/awesome
Comment by Vasily Pupkin (vasily_pupkin) - Wednesday, 15 September 2010, 19:54 GMT
facepalm.jpg
Comment by Nick Gladstone (Cosmonaut) - Wednesday, 15 September 2010, 23:31 GMT
I see this sort of thing on my x86_64 machine too:

796824 /usr/bin/awesome
15:32:50 up 27 days, 23:10, 2 users, load average: 0.40, 0.37, 0.33


after reboot:

11100 /usr/bin/awesome
17:26:30 up 1 min, 1 user, load average: 1.01, 0.45, 0.17


CPU:
model name : Intel(R) Xeon(R) CPU W3505 @ 2.53GHz
awesome v3.4.6 (Hooch)
xorg-server 1.8.1.902-1



Instead, on my x86 machine:

49724 /usr/bin/awesome
17:28:55 up 2 days, 18:34, 3 users, load average: 0.45, 0.37, 0.35
Comment by Uli Schlachter (psychon) - Thursday, 16 September 2010, 12:06 GMT
Since I had no clue where to start, I tried coming up with some test case that shows growing memory usage (feel free to do the same).
Here's the result: (Both top and the collectgarbage("collect") call show growing memory usage)

t1 = timer({timeout = 0.4})
t2 = timer({timeout = 30})
t1:connect_signal("timeout", function()
awful.util.spawn("urxvt -e bash -c 'sleep 1.$((${RANDOM}%10))'")
end)
t2:connect_signal("timeout", function()
collectgarbage("collect")
print(collectgarbage("count"))
end)
t1:start()
t2:start()

BTW: I'm seeing this on x86, too.
Comment by Uli Schlachter (psychon) - Thursday, 16 September 2010, 13:03 GMT
Don't ask me how I found (the details are ugly), but it's the tasklist's buttons that is causing these leaks. If you use a tasklist without any buttons, this problem disappears.

In awful.widget.common.common.list_update(), buttons are set up like this: ("data" is a weak-keyed table, "o" is the client in question)

local btn = capi.button { modifiers = b.modifiers, button = b.button }
btn:connect_signal("press", function () b:emit_signal("press", o) end)
btn:connect_signal("release", function () b:emit_signal("release", o) end)
data[o][#data[o] + 1] = btn -- Removing this line makes the leak go away

So data has a strong reference to "btn" and "btn" has a strong reference to the client "o". Due to this strong references, the client isn't garbage-collected despite the weak-keyed table.

Ideas how to fix this?
Comment by Uli Schlachter (psychon) - Thursday, 16 September 2010, 15:21 GMT
Way more minimal test case that shows how this leak happens:

The references look like this:
data -> "button" from callback -> callback function -> "item" table from callback
Because of this strong reference "item" isn't collected and the table is never cleared.

data = setmetatable({}, { __mode = 'k' })
local t1 = timer({timeout = .0001})
local t2 = timer({timeout = 3})
t1:connect_signal("timeout", function()
local item = {}
local btn = button({})
btn:connect_signal("press", function() print(item.foobar) end)
data[item] = btn
end)
t2:connect_signal("timeout", function()
collectgarbage("collect")
local i = 0
for k in pairs(data) do i = i + 1 end
print(i) -- This should be 0 if the leak is gone
end)
t1:start()
t2:start()
Comment by Uli Schlachter (psychon) - Thursday, 16 September 2010, 15:39 GMT
Fixed by using a weak-valued table. I can prove that this solves part of the problem, but my test case still causes growing memory usage. :(

commit 89f05c90ca73b3a67071d45c25e800d36367f82d
Author: Uli Schlachter <psychon@znc.in>
Date: Thu Sep 16 17:28:50 2010 +0200

{tag,task}list: Use a weak-valued table

The data table is used to map objects (clients/tags) to the buttons associated
with them. This is done so that we don't have to re-create the button objects
each time this lists are updated.

The problem was that this weak-keyed table was never cleared, because the value
had a strong reference to the key (via the button's signal):

btn:connect_signal("press", function () b:emit_signal("press", o) end)

"o" is the key in the table and btn is reachable from the value. This prevented
the garbage collection of the key. Using a weak-keyed and weak-valued table
fixes this.

Signed-off-by: Uli Schlachter <psychon@znc.in>
Comment by Nick Gladstone (Cosmonaut) - Thursday, 16 September 2010, 23:00 GMT
Sorry about that. Accidental refresh that I had left up. Nice work Uli.

It's beyond me at this point but I can help test.
Comment by Vasily Pupkin (vasily_pupkin) - Friday, 17 September 2010, 05:48 GMT
One day passed, and:

00d1e000-016a5000 rw-p 00000000 00:00 0 [heap] | 00d1e000-0127c000 rw-p 00000000 00:00 0 [heap]
Size: 9756 kB | Size: 5496 kB
Rss: 9624 kB | Rss: 5420 kB
Pss: 9624 kB | Pss: 5420 kB
Shared_Clean: 0 kB Shared_Clean: 0 kB
Shared_Dirty: 0 kB Shared_Dirty: 0 kB
Private_Clean: 0 kB Private_Clean: 0 kB
Private_Dirty: 9624 kB | Private_Dirty: 5420 kB
Referenced: 9620 kB | Referenced: 5420 kB
Swap: 0 kB <
KernelPageSize: 4 kB <
MMUPageSize: 4 kB <
7f69799d4000-7f6979c55000 rw-p 00000000 00:00 0 <
Size: 2564 kB <
Rss: 2564 kB <
Pss: 2564 kB <
Shared_Clean: 0 kB <
Shared_Dirty: 0 kB <
Private_Clean: 0 kB <
Private_Dirty: 2564 kB <
Referenced: 2564 kB <

I can try to do dumps if anybody knows how understand that lua shit :]
Comment by Uli Schlachter (psychon) - Friday, 17 September 2010, 06:36 GMT
@Vasily Pupkin: Sorry, no idea what this is trying to tell me.
@Cosmonaut: I'll delete that duplicate comment.

There are still some leaks hiding somewhere in there. It's four lines of code that are guilty and I can't figure out why. :(
Comment by Julien Danjou (jd) - Friday, 17 September 2010, 07:45 GMT
Can you tell us which lines?
Comment by Uli Schlachter (psychon) - Friday, 17 September 2010, 08:15 GMT
Oh, sure. It's in awful.widget.common, list_update().
Removing the "connect_signal()" calls makes the leak go away.

if buttons then
-- Use a local variable so that the garbage collector doesn't strike
-- between now and the :buttons() call.
local btns = data[o]
if not btns then
btns = {}
data[o] = btns
for kb, b in ipairs(buttons) do
-- Create a proxy button object: it will receive the real
-- press and release events, and will propagate them the the
-- button object the user provided, but with the object as
-- argument.
local btn = capi.button { modifiers = b.modifiers, button = b.button }
btn:connect_signal("press", function () b:emit_signal("press", o) end)
btn:connect_signal("release", function () b:emit_signal("release", o) end)
btns[#btns + 1] = btn
end
end
ib:buttons(btns)
tb:buttons(btns)
end
Comment by Julien Danjou (jd) - Friday, 17 September 2010, 08:31 GMT
Ok, I've been there years ago with the same issue. I try to recall what's the problem.
Comment by Julien Danjou (jd) - Friday, 17 September 2010, 10:20 GMT
I can't manage to see what's wrong. I think a good test would be to implement a global creation counter on classes.


static int how_many_client = 0;

create_client()
{
how_many_client++;
}

gc_client ()
{
how_many_client--;
}


That should help to know how much awesome object for each classes there is, and if this is the source of the problem.
Comment by Uli Schlachter (psychon) - Friday, 17 September 2010, 10:42 GMT
@jd: Patch attached.
This shows that we are leaking buttons and widgets and that we destroy more images than we create (wtf? There is an integer underflow in there!)
Comment by Uli Schlachter (psychon) - Friday, 17 September 2010, 10:45 GMT
Argh, sorry. New patch attached. (Why do we have two different ways for creating objects? :/ )

So we are leaking buttons, widgets and clients.
Comment by Julien Danjou (jd) - Friday, 17 September 2010, 10:56 GMT
(I think merging a clean version of this patch might be a good idea anyhow.)
Comment by Julien Danjou (jd) - Friday, 17 September 2010, 10:57 GMT
The relationship seems to be:

Widgets -> Buttons -> Clients

So i'd say we're leaking widgets. Why are we leaking widgets ?!
Comment by Uli Schlachter (psychon) - Friday, 17 September 2010, 11:21 GMT
This already causes widgets to be leaked:

t1 = timer({timeout = 0.5})
t2 = timer({timeout = 5})
local w = wibox({})
t1:connect_signal("timeout", function()
--awful.util.spawn("urxvt -e bash -c 'sleep 1.$((${RANDOM}%10))'")
local wi = widget({ type = "imagebox" })
w.widgets = { wi }
end)
local last = awesome.class_stats()
t2:connect_signal("timeout", function()
collectgarbage("collect")
print(collectgarbage("count"))
print("Dumping number of objects:")
local stats = awesome.class_stats()
for k, v in pairs(stats) do
print(k, v, "change:", v - last[k])
end
last = stats
end)
t1:start()
t2:start()

Using a sub-table leaks also:

local l = {}
w.widgets = l
<in the timer callback>
l.foo = { wi }
Comment by Uli Schlachter (psychon) - Friday, 17 September 2010, 11:30 GMT
I just noticed that my last post doesn't show growing memory usage in top and wondered "why isnt out __gc method called?".

@jd:
luaA_table2wtable() replaces the metatable of everything in the table so that our own __index method is called. Could it be that this replaces the metatable on the widget and thus we lose track of the widget?

Loading...