
BibTeX Tidy

110 points | 32 comments | 12 days ago | flamingtempura.github.io
by b215826 12 days ago

I'm surprised that no one has mentioned bibtool [1], which is part of a standard TeXLive install. bibtool can also reformat BibTeX entries and automatically generate keys. I also use it to turn URLs containing a DOI into pure DOIs using the following rule:

    % Turn DOI links into pure DOI.
    rename.field { url=doi if url = "10\.[0-9.]+/[-._;()/:a-zA-Z0-9]+" }
    rewrite.rule { doi # ".*/\(10\.[0-9.]+\/[-._;()/:a-zA-Z0-9]+\)" # "{\1}" }

[1]: https://github.com/ge-ne/bibtool
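
For anyone who would rather script this than use bibtool, here is a minimal Python sketch of the same idea. It reuses the DOI pattern from the rule above; the function names and the dict-per-entry representation are my own assumptions, not part of bibtool.

```python
import re

# Same DOI shape as the bibtool rule above: "10.", a registrant prefix, "/", a suffix.
DOI_PATTERN = re.compile(r"10\.[0-9.]+/[-._;()/:a-zA-Z0-9]+")


def url_to_doi(url):
    """Return the bare DOI found in url, or None if there is none."""
    match = DOI_PATTERN.search(url)
    return match.group(0) if match else None


def normalise_entry(fields):
    """Move a DOI-bearing 'url' field to 'doi'; leave other entries untouched.

    `fields` is a hypothetical dict of one entry's fields (name -> value).
    An existing 'doi' field is never overwritten.
    """
    result = dict(fields)
    doi = url_to_doi(result.get("url", ""))
    if doi is not None and "doi" not in result:
        del result["url"]
        result["doi"] = doi
    return result
```

Entries whose `url` holds no DOI pass through unchanged, which roughly mirrors the `if` condition on `rename.field`.
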
by vitorsr 12 days ago

I find formatting is usually a nonissue with BibTeX files as entries tend to already be automatically generated from some authoritative source (e.g., CrossRef).

What I do end up using a lot however is nschloe's betterbib [1].

[1] https://github.com/nschloe/betterbib

by beepbooptheory 12 days ago

Wow thank you, _this_ is what I've been looking for!

by dm319 12 days ago

My .bib files are a mess. This is partly because I tried to order the file by year then author (manually), which of course resulted in some errors. It is also because I write free text between entries that discusses the papers and contains keywords I can search on to find entries.

I'd really like some sort of citation manager that uses a .bib file as the data format.

by 1hackaday 12 days ago

Two programs that do just that are JabRef (https://www.jabref.org/) and Emacs’ BibTeX mode (http://www.jonathanleroux.org/bibtex-mode.html). Both are excellent. Which one to use depends on whether you prefer a GUI or the programmability and text interface of Emacs.

by albertzeyer 12 days ago

I do basically the same. And I wrote some scripts around it to auto-fix certain things.

E.g. I can change the bib entry name, put a comment `% alias: old-name` above it, and my auto-fix script will go through my tex files and update them accordingly. This is helpful when I found out later that I had a duplicate entry, or when I just want to rename the bib entry name.
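
To make the alias idea concrete, here is a rough Python sketch of how such an auto-fix could work. It is not the commenter's actual script: the regexes, the function names, and the assumption that the `% alias:` comment sits on the line directly above the entry are all mine.

```python
import re

# Assumed convention: "% alias: old-name" directly precedes the renamed entry.
ALIAS_RE = re.compile(
    r"^%\s*alias:\s*(\S+)\s*\n\s*@\w+\{\s*([^,\s]+)\s*,", re.MULTILINE
)
# Matches \cite, \citep, \citet*, ... with optional [..] arguments.
CITE_RE = re.compile(r"(\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{)([^}]*)(\})")


def collect_aliases(bib_text):
    """Map each old key to the key of the entry that follows its alias comment."""
    return {old: new for old, new in ALIAS_RE.findall(bib_text)}


def update_citations(tex_text, aliases):
    """Rewrite old keys inside cite commands; other keys are kept as they are.

    Keys are re-joined with a plain comma, so spacing inside the braces
    is normalised as a side effect.
    """
    def fix(match):
        keys = [key.strip() for key in match.group(2).split(",")]
        renamed = [aliases.get(key, key) for key in keys]
        return match.group(1) + ",".join(renamed) + match.group(3)

    return CITE_RE.sub(fix, tex_text)
```

A real script would also have to handle biblatex commands such as `\parencite` and entries with several aliases; this only covers the simple case.
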

It is sometimes a bit ambiguous when there is an Arxiv version from a different year than the proceedings publication. In that case, I use the proceedings publication, with a reference to Arxiv as well, and I set the year to the proceedings publication.

I have a consistent format for pure Arxiv publications, using misc, and not article as Google Scholar would output it. If I have this inconsistent anywhere, my script would auto-fix this to make it consistent. Or if some entries are missing, it would automatically complete them.

by mmc 12 days ago

https://bibdesk.sourceforge.io/ for Macs uses .bib natively

by ufo 12 days ago

I use KBibTeX.

by jonathanstrange 12 days ago

Does it fix incorrect Unicode chars? I've had a huge problem with that recently when for some reason Kbibtex insisted on converting {\"O} into Ö. I believe Jabref fixed it but I've had very bad experiences with Jabref in the past and would prefer not to use it.

by shoeffner 12 days ago

What do you mean? I tried placing Ö and {\"O} in the author's name of the example and tidied it:

    Click Tidy to clean up the entries below      
    @Book{sweig42,
      Author =  { Stef{\"O}{n} SwÖig },
      title =  { The impossible book },
      publisher =  { Dead Poet Society},
      year =  1942,
      month =        mar
    }
This is the output:

    Click Tidy to clean up the entries below
    @book{sweig42,
     title        = {The impossible book},
     author       = {Stef{\"O}{n} SwÖig},
     year         = 1942,
     month        = mar,
     publisher    = {Dead Poet Society}
    }
And \"O should be Ö, so I guess I do not really understand what is "incorrect" in your use case.

I know that the Zotero plugin BetterBibTeX converts Ö to {\"O} when exporting as BibTeX, but keeps it as Ö when exporting as BibLaTeX – maybe Kbibtex has similar options?

edit: It actually "fixes" Ö to {\"O} if you tick "Escape special characters" or supply the command line argument `--escape`, which should be the default according to GitHub.
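
For illustration, this kind of escaping can be sketched in a few lines of Python. This is not how BibTeX Tidy implements `--escape`; the accent table is a small hand-picked subset and the function name is made up.

```python
import unicodedata

# Combining marks mapped to LaTeX accent commands (illustrative subset only).
ACCENTS = {
    "\u0308": '"',  # diaeresis:  Ö -> {\"O}
    "\u0301": "'",  # acute:      é -> {\'e}
    "\u0300": "`",  # grave:      è -> {\`e}
    "\u0302": "^",  # circumflex: ê -> {\^e}
    "\u0303": "~",  # tilde:      ñ -> {\~n}
}


def escape_accents(text):
    """Replace single-accent letters with braced LaTeX accent commands."""
    out = []
    for char in text:
        # NFD splits e.g. "Ö" into the base letter "O" plus U+0308.
        decomposed = unicodedata.normalize("NFD", char)
        if len(decomposed) == 2 and decomposed[1] in ACCENTS:
            out.append("{\\" + ACCENTS[decomposed[1]] + decomposed[0] + "}")
        else:
            # Unknown or unaccented characters are passed through untouched.
            out.append(char)
    return "".join(out)
```

Characters outside the table (ß, ø, letters with two accents) are left alone here, so a real tool needs a much larger mapping.
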

by jonathanstrange 11 days ago

I meant leaving the Ö as is or even introducing it, which is always wrong with Bibtex. I'm not using Zotero but Jabref also fixes it. There was an Ö in an author name and when I manually changed it to {\"O} Kbibtex reverted it back to an Ö! It's easy to fix by switching to XeTeX but some editorial systems don't use it and will make your manuscript fail.

I was just hoping that the tool fixes this problem, too. Maybe in a future version.

by azalemeth 12 days ago

This doesn't answer the question, but I think that if you use XeLaTeX you get Unicode support universally "for free", including in bbl files (generated by bibtex, bibtex8 or biber -- which itself is great, even if it has more of a learning cliff than most!)

by skateboardCat 12 days ago

IME biber still nags you about Unicode in weird ways; I ended up always escaping those characters to avoid breaking the compilation, so YMMV

by thangalin 12 days ago

I used JabRef[1], Zotero[2], and ConTeXt[3] to create a reasonably consistent bibliography[4]. See the TeX SE post for details[5].

[1]: https://www.jabref.org

[2]: https://www.zotero.org

[3]: https://www.contextgarden.net

[4]: https://impacts.to/bibliography.pdf

[5]: https://tex.stackexchange.com/a/490043/2148

by michaelhoffman 12 days ago

I've started using biblint, which is nice, although it can be pretty aggressive in auto-cleaning:

https://github.com/Kingsford-Group/biblint

by kthxb 12 days ago

Is there some linter like this for LaTeX, too?

by jraph 12 days ago

Yes, ChkTeX for instance: https://www.nongnu.org/chktex/

It also detects repetitions and some basic English language stuff if I recall correctly.

I've used it for my PhD manuscript, it's quite useful.

by rsfern 12 days ago

ChkTeX from sibling comment looks really cool!

There’s also a set of style linting scripts by Matt Might [0] that I’ve found really helpful in my writing. There’s an emacs minor mode built around them too, writegood-mode [1]

0: https://matt.might.net/articles/shell-scripts-for-passive-vo...

1: http://bnbeckwith.com/code/writegood-mode.html

by mabub24 12 days ago

You have to watch out about some of these linting tools, though, in case you get slavishly attached to their prescriptions. Some of these tools insist that you can never use passive voice, for instance, and that can lead to some truly barbaric phrasing in some docs I've read where it reads like the writer bent themselves into a pretzel trying to get to the active voice.

In Matt Might's recommended reading section for that tool he points out Style: The Basics of Clarity and Grace by Joseph Williams. Reading that book and really absorbing what it shows will be much more helpful overall for your writing. Its use of contrasting examples is very effective. For instance, it shows how the passive voice can be a very powerful tool for controlling the flow of topics, ideas, and actions in a sentence or paragraph. Through the passive voice you can improve the clarity and coherence of a piece of prose. As Matt Might says, it's about making it a conscious decision rather than enforcing all-controlling rules.

by rsfern 12 days ago

Yeah, I agree with that. For me it’s still a nice syntax highlighting tool to help me identify and think through those decisions.

I use a lot of hedging/weasel words in my first drafts so I find it helpful to have those highlighted for reconsideration

by michaelhoffman 12 days ago

My allmytexs script package has a texlint script which runs fotlatexmk, chktex, hunspell, and linkchecker. The next updates will also run biblint.

https://github.com/hoffmangroup/allmytexs/

by aglionby 12 days ago

This is great! Especially nice to be able to remove entire fields.

Relatedly, here are a couple of tools to ensure that references are complete (e.g. updating arXiv papers to their published versions, mostly for computer science papers):

- https://github.com/yuchenlin/rebiber (CLI, web interface)

- https://www.cl.cam.ac.uk/~ga384/bibfix.html (only *ACL papers, web interface with diff, disclaimer: mine)

by tpoacher 12 days ago

Only tangentially related, but I've done this and it might interest someone else on this thread :)

https://github.com/tpapastylianou/A-tidy-LaTeX-project-templ...

by countmora 11 days ago

First I used a VS Code extension for formatting. After getting into my thesis I switched to Zotero[1] to manage my resources. Its extension BetterBibtex[2] can export into a pretty-printed *.bib which refreshes automatically on change. Works on Zotero's subcollections too.

[1]: https://www.zotero.org

[2]: https://retorque.re/zotero-better-bibtex/

by Gimpei 12 days ago

On a tangential note, this got me really excited that there was a Stefan Sweig book that I hadn't heard of. Alas, it seems as if "the impossible book" is a fiction.

by lwhsiao 12 days ago

On a related note, checkcites is great for cleaning up unused references as well.

by nxpnsv 12 days ago

I love it already!

by joeberon 12 days ago

I made a really rough script that does a bunch of transformations on bibtex files to have them be ready for use in academic papers:

1. Shorten author lists to 10 authors
2. Abbreviate journal names
3. Remove unneeded IDs and replace with doi links
4. Remove unneeded tags (file, abstract)
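
As a rough illustration of steps 1 and 4 from the list above (not the code in the gist below), something like this Python sketch would do. The function names and the dict-of-fields representation are assumptions; the one BibTeX fact it relies on is that standard styles render a trailing `and others` as "et al.".

```python
def shorten_authors(author_field, limit=10):
    """Keep the first `limit` names and replace the rest with 'others'.

    Naive: splits on the literal " and ", so a braced name that itself
    contains " and " would be split incorrectly.
    """
    names = [name.strip() for name in author_field.split(" and ")]
    if len(names) <= limit:
        return " and ".join(names)
    return " and ".join(names[:limit] + ["others"])


def drop_fields(fields, unwanted=("file", "abstract")):
    """Return a copy of an entry's fields without the unwanted ones.

    Field names are compared case-insensitively, as BibTeX does.
    """
    return {
        name: value
        for name, value in fields.items()
        if name.lower() not in unwanted
    }
```

Abbreviating journal names (step 2) needs a lookup table of abbreviations, so it is left out here.
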

https://gist.github.com/joebentley/6e3e4198ea427545738a82021...

by JorgeGT 12 days ago

> 2. Abbreviate journal names

This was exactly what I was going to suggest, it would be a great addition! And if someone from Overleaf is listening... ;)

by albertzeyer 12 days ago

> Shorten author lists to 10 authors

Normally the Bibtex style would automatically do that for you already, and it is configurable. And it would automatically append "et al" then. The templates of many conferences usually do that for you.

by joeberon 11 days ago

The APS (American Physical Society) Bibtex style, which goes with the document class I always use (revtex), does not do this.

by zosoworld 12 days ago

It looks really good, might use it on top of JabRef[1] when I snapshot my library for inclusion in a paper. Even so, JabRef already does most of what is advertised here, but it is really useful just by being in the browser.

[1] https://www.jabref.org/