Cleaning and enriching data at scale. Previously: PhD @ UCLA, PreDoc @ Cal, RA @ MIT. datamule.xyz/ github.com/john-friedman/dat…

Los Angeles, CA
Joined May 2024
If you are building in the financial space, and would like to play around with:
- insider trading
- institutional holdings
- XBRL
- SEC filings
I have spent the last year working on an open source (MIT) project to make that easy. There's also an S3 layer for easy data ingest.
You can now track insider trading on Perplexity Finance. We will be adding politician trading shortly.
On one hand, I find this awesome. On the other hand, I find this somewhat disturbing.
What.
Complete american cultural victory.
Feeling mildly frustrated, because I did the consulting mostly to be nice, not to make money.
> Do a bit of consulting for a new university prof
> Charged token cost, because tasks were simple on my end
> University wants me to fill out a bunch of forms to become a supplier
> University requires many levels of auth, that my browser forgets
> University needs my personal address
> University payment doesn't arrive
> University sends ticket with unclear issue
> I email said ticket, it bounces
Probably a good idea to 5x costs when consulting for a university. Time to get paid / how many hoops they make you jump through.
It might be my innate pessimism--which I've been told to rebrand as "conservatism", but it did not cross my mind that this was a useful metric for traction until I interviewed with an accelerator yesterday.
I get a lot of emails asking me when/if a dataset is going to be added to datamule. Ten days ago, I thought: "huh, what happens if I ask them for an LOI." Turns out--Yes, that was useful.
Apparently >30k in letters of intent is good.
John Friedman retweeted
I'm going to make a series of bets on the little guy. To start, we are going to be granting out compute, up to $100k per project, to support new experiments on GCP. If you have an idea for an open source model that you want to explore, I'd like to hear from you.
China has overtaken the US in cumulative open-source AI model downloads:
This is an interesting proposal, so I asked a nuclear engineer about it.
1) Navy reactors use highly enriched uranium. HE vs HALEU. HE is no longer produced, and the navy does not share.
2) The features that make US navy reactors the best in the world are classified. They are not going to share this.
3) Navy reactors are less thermally efficient. Ballpark 18% vs 32%.
4) Shielding requirements for protecting sailors are lower than for civilians.
5) Navy reactors have much lower seismic standards.
Honestly, the US Navy should just become America's nuclear power provider. US Navy Cost: $2 billion for 2 x 400 MW reactors in Ford-class aircraft carrier NuScale: $10 bn for 500 MW reactor Westinghouse: ~$8 bn for 1000MW reactor
In April, I had a fun chat with a guy building micro nuclear reactors. Due to chain of custody requirements, they build the reactor, then send it to a national lab which puts in the uranium. They also have to use much less enriched uranium (HALEU) than the navy (HE). This slows things down.
2) Everyone on a Navy ship is at a certain level of competence, and subject to military discipline. I'm not sure this would replicate for civilian reactors. For example, during peak ISIS, terrorists were found spying on civilian nuclear reactor operators and their families in I think Belgium? Risk mitigation is a lot different for civilian reactors.
I'm also skeptical about the safety record. 1) I don't believe it is comparable to civilian. For example, if a civilian reactor is leaking pollution, we'd probably detect it. A Navy ship would probably be leaking it in the middle of the ocean or in a port for a brief amount of time. If this is detected, it seems there would be a strong incentive to hush it up.
I'm skeptical about how the costs are calculated. I've been led to believe that one of the major costs of civilian nuclear reactors is cement. I don't see that much cement on these ships. So that makes me think the reactor systems are different? Which brings up two questions: 1) does building for the ocean induce cost savings? 2) is the cost for a nuclear reactor on a Navy ship smaller, because you already have like a metal shell?
I find ideas like this appealing, but my intuition says it's probably not so simple. Building for the navy (secure environment, ocean, away from population centers) seems very different.
I think that I've done the math right. So, my guess is either: 1) This works, and there is very little demand for it 2) Something like this already exists, and I just haven't heard of it.
I'm not well versed in caching, having only used it via Cloudflare. I'm not sure how it works under the hood / within a network. But it should be possible to effectively hide the cache / scope it to you or your users.
Just realized that I neglected the EC2 + ephemeral storage costs. They're around $3-$15/month. So, marginal.
Other possible features: Compression (needs beefier instance). Assuming factor of 4:
- AWS S3: $1,500 PUT + $15 storage
- New tool: $0.03 PUT + $15 storage
Ballpark 100x cheaper.
Example costs for 10 million files per day with average size of 8kb:
- AWS S3: $1,500.00 PUT + $55 storage
- New tool: $0.15 PUT + $55 storage
So ballpark 30x cheaper.
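
The example costs above can be reproduced with a rough back-of-the-envelope sketch. The prices here ($0.005 per 1,000 PUT requests, $0.023 per GB-month) are assumed S3 Standard list prices and are not stated in the thread. The 30-day month and the batch size of 10,000 files per PUT are also assumptions, picked because they match the quoted $1,500 and $0.15 figures.

```python
# Assumed S3 Standard list prices (not given in the thread).
PUT_PRICE_PER_1000 = 0.005    # USD per 1,000 PUT requests
STORAGE_PRICE_PER_GB = 0.023  # USD per GB-month

FILES_PER_DAY = 10_000_000
AVG_FILE_KB = 8
DAYS = 30                     # assumed billing month


def monthly_cost(files_per_put=1, compression=1.0):
    """Return (put_cost, storage_cost) in USD for one month of ingest.

    files_per_put: hypothetical number of files batched into one PUT.
    compression: hypothetical compression factor applied before storage.
    """
    puts = FILES_PER_DAY * DAYS / files_per_put
    put_cost = puts / 1000 * PUT_PRICE_PER_1000
    # Storage is billed here on the full month's accumulated data (decimal GB).
    stored_gb = FILES_PER_DAY * DAYS * AVG_FILE_KB / 1_000_000 / compression
    storage_cost = stored_gb * STORAGE_PRICE_PER_GB
    return put_cost, storage_cost


direct = monthly_cost()                       # one PUT per file
batched = monthly_cost(files_per_put=10_000)  # assumed batch size

ratio = sum(direct) / sum(batched)
print(f"direct:  PUT ${direct[0]:,.2f} + storage ${direct[1]:,.2f}")
print(f"batched: PUT ${batched[0]:,.2f} + storage ${batched[1]:,.2f}")
print(f"ratio:   {ratio:.0f}x")
```

Under these assumptions the PUT charge dominates when every file is written separately, so almost all of the savings come from batching writes rather than from cheaper storage.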