Refine.ink experience report

Disclaimer: This was written in the hope of receiving free credits from Refine for posting it on social media. I'd like the fact that I'm bothering at all to read as a positive signal (why go through the trouble for something I didn't find useful?), but it's worth mentioning that there is almost certainly going to be some skew.

I was doomscrolling on LinkedIn (as one does in times of extreme distress) when I came across the service Refine.ink offering to generate reviews of technical papers in CS:

Refine.ink founder offers free reviews

Coming fresh off a paper rejection, I figured: what the hell. I dropped my paper's PDF into the free preview and went about my day, expecting, at best, a few paragraphs of AI slop with one or two genuinely useful bits of feedback. At the risk of sounding like a cliché, the reality was surprising.

Math mode

The first (and most alarming) step was watching the interface convert my paper into markdown. Judging by the re-rendered result, the conversion was not entirely successful:

Very poorly transliterated math diagrams. Yes, this is an actual screenshot of the re-rendered document.

I also noticed that my inference rules were being transformed into \fracs, and I was initially worried that the analyzer would have difficulty parsing the technical details. That particular fear turned out to be unfounded: the detailed feedback caught at least one actual typo in my rules, which means it was able to understand them.
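
For anyone who hasn't run into this before, here's a minimal sketch of the kind of transformation I mean, assuming rules written with mathpartir's \inferrule (this is a made-up rule for illustration, not one from the paper):

```latex
% A made-up typing rule, written the way I'd normally write it with mathpartir:
\inferrule
  {\Gamma \vdash e_1 : \tau_1 \\ \Gamma, x : \tau_1 \vdash e_2 : \tau_2}
  {\Gamma \vdash \mathsf{let}\; x = e_1\; \mathsf{in}\; e_2 : \tau_2}

% ...and the same rule after being flattened into a plain fraction, which renders
% similarly but throws away the fact that it's an inference rule at all:
\frac{\Gamma \vdash e_1 : \tau_1 \qquad \Gamma, x : \tau_1 \vdash e_2 : \tau_2}
     {\Gamma \vdash \mathsf{let}\; x = e_1\; \mathsf{in}\; e_2 : \tau_2}
```

The content survives well enough to be readable, which is presumably how the typo still got caught, but it's not something I'd want pasted back into a manuscript.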

Rendered LaTeX errors in the Refine feedback. The feedback itself also contained some LaTeX errors.

(Meta-)Feedback

The actual feedback was decent, if not something I'd rely on. It is worth noting that I am using this service on a paper draft that has already come back from actual peer reviewers, so it's possible that I'd find the comments more useful if I wasn't already working from that context.

In the interest of not revealing too much about the paper itself when there's still the possibility that we'll be re-submitting it somewhere else, I'm going to keep things fairly high-level, but it would be interesting to do a more detailed breakdown when that's no longer a concern.

The "Overall Feedback" was capital-F Fine, highlighting some deficiencies in the introduction and asking for more detail about the applications of the actual technical result. I was surprised to see the LLM make the connection between our work and some other related work unprompted, even raising the real criticism that we didn't do a good enough job explaining the placement of our work in the bigger-picture (bringing up some actual external context that we didn't mention ourselves!).

The summary did over-prioritize some issues that are relatively minor in the long run (such as asking for more details about a fresh variable store or writing some word salad about "external semantic notions").

The detailed technical feedback was more of a mixed bag. We've already seen the rendering errors that made some individual action items difficult to read, and only about half of them were things I'd consider meaningful. It also missed at least two complaints raised by our real reviewers, though it's possible those issues are caught beyond the limits of the preview version.

My overall read is that the LLM has just enough context to give useful feedback on the introduction and individual technical sections, but its ability to think about the story of the paper as a whole is somewhat hit-or-miss.

Interestingly, the LLM did not give feedback on the section I'm personally least satisfied with. I wouldn't hazard a guess as to why.

Conclusion

As it stands right now, I find it difficult to justify purchasing individual review credits at their current price (at the time of writing, they seem to value one review at give-or-take US\$40). The feedback is genuinely good, arguably better in-context technical advice than I'd get from my labmates, but its fixation on questionably important details makes me wary. If I didn't already have real reviews to work from, I suspect I'd spend a lot of time spinning my wheels chasing trivialities.

I am also still very scared of the math mode issues. I appreciate the difficulty of processing PDFs, but the persistent rendering errors make me wonder how much of the technical content was lost in translation, especially for a paper with more detailed graphical content.

It's likely that the focus issues would be less pronounced in a less obscure sub-field than multi-stage programming. I'd be interested in seeing how it performs on a less theoretical topic, like security.

Verdict: Probably worth checking out, but not quite there yet.