
I run a large dataset of company profiles, more than 100,000 rows. Each row has a handful of jsonb columns: services, categories, languages, and so on. They are all arrays. ["SEO", "PPC", "Content"] . Simple. For a few weeks, background scripts wrote to those columns. No errors. The data looked like it was there. Then I noticed values were not rendering on a chunk of records, and containment filters like services @> '["SEO"]' were quietly returning nothing for rows that obviously had services. When I finally dug in, about 150,000 values across ten jsonb columns were corrupted. Not deleted. Worse. They were stored in a shape that looked right but behaved wrong. Here is exactly what happened, why it is so easy to miss, and the one line that caused it. The one line that did the damage I was using postgres.js. The write looked innocent: await sql` update listings set services = ${JSON.stringify(record.services)} where id = ${record.id}`; record.services is ["SEO", "PPC"] . I JSON.stringify it because, well, it is going into a JSON column. That instinct is the bug. postgres.js already serializes parameters for you. When the target column is jsonb and you hand it a JavaScript array, it stores a jsonb array. But if you hand it a string (which is what JSON.stringify returns), it faithfully stores a jsonb string. So this: JSON.stringify(["SEO", "PPC"]) // => '["SEO","PPC"]' (a JS string) does not become a jsonb array. It becomes a jsonb string scalar whose contents happen to be JSON text: "[\"SEO\",\"PPC\"]" Double-encoded. The array got wrapped in quotes and escaped. Why it is so sneaky It throws no error. A jsonb column accepts a string scalar happily. "hello" is valid jsonb, so "[\"SEO\"]" is too. The insert succeeds. It looks correct at a glance. Dump the column and you see array-ish text. If you are not checking the type, it reads like your data. A cast does not save you. I assumed an explicit ::jsonb would fix it: set services = ${JSON.stringify(arr)}::jsonb // still wrong It does not. The driver still passes a string, and casting JSON-text-as-a-string to jsonb gives you a jsonb string scalar, not an array. The only place it surfaces is behavior. jsonb_typeof tells the truth: select services, jsonb_typeof(services)from listings where id = '...'; -- "[\"SEO\",\"PPC\"]" | string <- should be 'array' Because it was a string, .map() in the UI did nothing useful and @> matched nothing. The data was present and inert. Finding all of it The detection query keys on the type, one line per column: select count(*) from listingswhere jsonb_typeof(services) = 'string'; I ran that across every jsonb column. The damage was lopsided: the two columns my scripts wrote most often took the brunt, around 105k and 41k values, and the rest of the jsonb columns shared the remainder. About 150,000 values total, written by several different scripts that all shared the same JSON.stringify habit. The repair The fix is to pull the inner text back out and re-cast it as real jsonb. In Postgres, #>> '{}' extracts a jsonb value as plain text, which for a string scalar gives you the underlying JSON text. Cast that to jsonb and the array comes back: update listingsset services = (services #>> '{}')::jsonbwhere jsonb_typeof(services) = 'string'; Two wrinkles turned a one-liner into a small script. NUL bytes. Some scraped text carried \u0000 escapes. Postgres jsonb cannot store NUL, so the cast failed on those rows. I had to strip it from the inner text first: set services = replace(services #>> '{}', '\u0000', '')::jsonb Double-double-encoding. A few values had been stringified twice by overlapping scripts, so I looped the repair up to three passes, stopping when a pass changed nothing. The whole thing is idempotent and re-runnable, which matters a lot when you are repairing production data. The correct way to write jsonb with postgres.js The lesson is small and absolute: do not JSON.stringify a value destined for a jsonb column. Let the driver serialize it. With postgres.js the explicit, safe way is sql.json() : await sql` update listings set services = ${sql.json(record.services)} where id = ${record.id}`; For dynamic updates, wrap the values the same way: const updates = { services: sql.json(arr), categories: sql.json(cats) };await sql`update listings set ${sql(updates)} where id = ${id}`; sql.json tells the driver "this is JSON, serialize it once." That is the whole fix. What I took away from it A corruption that throws no error and looks correct is the most expensive kind. A crash gets fixed in an hour. This sat in production for a few weeks, quietly hollowing out the most-used columns in the database, because every signal said it was fine except the one I was not checking: the type. Two habits stuck after this: When a value goes into jsonb, serialization belongs to the driver, never to me. The moment I see JSON.stringify near a database write, I treat it as a bug until proven otherwise. Type invariants deserve a test. jsonb_typeof(col) = 'string' on an array column should always be zero. That check now runs in a daily integrity job, so a stray script reintroducing the pattern surfaces the next morning instead of weeks later. FAQ Why did JSON.stringify break a jsonb insert if jsonb is JSON? Because the driver also serializes. postgres.js encodes your parameter as JSON when the column is jsonb. Pass an array, you get a jsonb array. Pass a string (what JSON.stringify returns), you get a jsonb string whose content is JSON text. It gets encoded twice. Does casting with ::jsonb fix it? No. The driver still sends a string, and casting a JSON-text string to jsonb produces a jsonb string scalar, not an array. Do not stringify in the first place, or use your driver's JSON helper. How do I find double-encoded jsonb values? Check the type. On a column that should hold arrays or objects, select count(*) from t where jsonb_typeof(col) = 'string' should be zero. How do I repair them? Extract the inner text with col #>> '{}' and cast it back: set col = (col #>> '{}')::jsonb where jsonb_typeof(col) = 'string' . Strip \u0000 first if present, and loop a few passes in case of repeated encoding. \
View original source — Hacker Noon ↗


