CZ Backs a Chinese College Junior Building an Education Agent, with $11 Million Raised in the Seed Round

Core Viewpoint

K12 Track: Visualized Learning is the True Direction

Founder Park: So many institutions are optimistic about you. In your opinion, what core point impressed them?

Kai: I think the first reason is that the direction is right. AI education has great potential. We are targeting the American college entrance exams, the SAT and AP. Our target users are K12 high school students, and we are very close to them in age, with basically no generation gap. We have gone through the entire exam preparation cycle ourselves and know where the pain points lie, which lets us build a product that truly addresses them.

Second, the team is outstanding. James comes from Gemini and was a core engineer in AI engineering and algorithms at Google. I have three education startup experiences, starting with educational software in my freshman year; in my sophomore year I helped build MathGPTPro, which was selected for Qiji Chuangtan (MiraclePlus), among others. I have experience successfully building educational products.

The third point is that the core of the AI education field we work in is the animation engine. As the core developers of VideoTutor, we are the team that understands this technology best, and we can make the animation engine render very precisely.

The team also has strong marketing instincts and knows how to communicate.

VideoTutor fits very well with a consensus among mainstream American VCs known as the "Little Genius Team": the field suits young people, the team has very good engineering skills, and the founder has strong insight, relevant experience, and very fast execution. I think this is a reason all investors can agree on.

VideoTutor presenting at the YZi Labs EASY Residency Demo Day at the New York Stock Exchange

Founder Park: What core problem in the education industry does your product aim to solve?

Kai: Current learning products on the market can be categorized into two types: active learning products and passive learning products. Passive learning products, like Byte's Gauth, Chegg, AnswersAi, etc., cover what we call "homework help" scenarios, with a very short learning chain, mainly where students pay for homework answers.

VideoTutor, on the other hand, covers active learning scenarios. We do not need to worry about students' motivation, because they have to study for exams such as the SAT and AP. In this scenario there is a strong, unmet need for visualization: 80% of SAT content involves functions, calculus, and other topics that require complex graphical rendering. VideoTutor's animation engine addresses this need effectively.

Moreover, the average customer price in this field is very high. In the U.S., an average of 2.6 million students take the SAT each year, creating significant demand for paid services. Offline SAT courses are very expensive and charge by the hour rather than by package, starting at an average of $150 per hour, with most charging around $230. Many students and parents are willing to pay for learning. VideoTutor can effectively replace this teacher-led training because, at this stage, AI-generated videos are almost indistinguishable from a teacher's training content. This way, students can have their own personalized AI exam preparation teacher at the lowest cost.

Founder Park: What was the trigger for you to decide to create this product?

Kai: Actually, before us, there was already a team at Stanford called Gatekeep Ai. They also wanted to do visualized learning. I had already realized the impact of this direction. In my previous entrepreneurial experiences, most educational products were basically just connecting to the GPT API, similar to a ChatGPT wrapper product. But we found that merely relying on text Q&A has its limitations. We can see that businesses like Chegg and Gauth are declining, as a large part of their scenarios has been replaced by ChatGPT, because students can pay $20 to use ChatGPT to solve many homework problems.

Products based on API wrappers and optimization have reached a ceiling.

However, multimodal visual generation has great prospects because there are many visualized learning scenarios in the SAT field. Unfortunately, Gatekeep got off to a good start but did not continue because it was launched a bit early; the foundational model programming capabilities were not mature yet, and GPT-4 had not been released. Additionally, the mathematical animation engine involves rendering and algorithms, which they could not conquer. But our team has mastered all the core development of the animation engine, solving this problem and making video rendering very accurate.

PMF: Strong User Willingness to Pay

Founder Park: After your product was launched, you reached cooperation with several schools. When did you feel, "I did this product right, I found the pain point," and felt you found PMF?

Kai: This can be discussed from three dimensions.

First, from the revenue metrics perspective: so far VideoTutor has received API requests from 1,000 enterprises, including all the well-known large educational institutions in the U.S. and even institutions in China. Many schools also want to purchase our services. The willingness of consumer (C-end) users is even more direct. One student's parent is also an investor. After trying the product, he shared it with all his friends and family, and everyone was willing to pay. Then he somehow got my phone number and texted me, wanting to invest in us. C-end users have a very strong willingness to pay.

The second point is from the user demand perspective. Why is one-on-one tutoring such a rigid, must-have demand in the U.S.? Because parents believe one-on-one teaching is effective and are willing to pay for it. Multimodal AI can now deliver a one-on-one teaching effect in a humanlike way, answering questions as they are asked. Moreover, the recorded video lessons from online one-on-one teachers in the U.S. are really no different from AI-generated videos. This is what I call "demand migration": if the recorded courses students pay a lot for are no different from what I generate with AI, why not use AI? It’s cheaper and more effective.

We have received very positive feedback from many students, and many teachers are also willing to promote the product. Early completion rates and usage duration are particularly good. The 200 seed users we have screened all come from this early user base.

The third point is product sense and taste. As you keep building, you look at the overall progress of the education industry, the core reasons students and parents pay, and the evolution of the product itself, and in hindsight the whole logic forms a closed loop. So across these three dimensions, we feel PMF is already sufficient. The most critical point is that the willingness to pay is very, very strong.

Cooperation with FIZZ

Founder Park: Many users are proactively wanting to pay, and some have contacted you wanting to invest.

Kai: Yes. In the SAT and AP field, the willingness to pay is inherently strong. The average customer price in this field starts at $100 to $200, and offline classes can be even more expensive, possibly reaching $800. In the U.S., 2.6 million students take the SAT, and 37% of them are willing to pay, making it a market with very strong willingness and demand to pay. Our product can achieve very good demand migration.
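As a back-of-envelope check on the figures Kai cites here (2.6 million test takers, 37% willing to pay, a starting price of $100 to $200), the arithmetic can be sketched as follows. Treating the starting price as a single payment per student per year is an assumption made only for illustration; the interview gives no such figure.

```python
# Back-of-envelope arithmetic using only the figures quoted in the interview.
SAT_TAKERS_PER_YEAR = 2_600_000   # "2.6 million students take the SAT"
SHARE_WILLING_TO_PAY = 0.37       # "37% of them are willing to pay"
PRICE_LOW, PRICE_HIGH = 100, 200  # "starts at $100 to $200"

paying_students = round(SAT_TAKERS_PER_YEAR * SHARE_WILLING_TO_PAY)

# Assumption (not from the interview): each paying student pays the
# starting price exactly once per year.
spend_low = paying_students * PRICE_LOW
spend_high = paying_students * PRICE_HIGH

print(f"{paying_students:,} paying students")
print(f"${spend_low:,} to ${spend_high:,} per year")
```

Under that assumption the quoted numbers imply roughly 962,000 paying students; the dollar range is only as good as the one-payment assumption behind it.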

Founder Park: For SAT candidates, will they trust an AI over a real teacher?

Kai: At the SAT and AP level, current AI is unlikely to make factual errors when answering questions. So why is it better than an offline tutor? First, it’s cheaper. Second, students can keep asking questions without worrying that the teacher will think the questions are silly or become impatient, and they can learn anytime, anywhere, 24/7.

Moreover, this market is transferable; after the U.S. market, we can move on to Canada, the UK’s A-Level exams, and so on, where the willingness to pay is also very high.

Founder Park: How are you considering the payment aspect now?

Kai: We offer monthly subscriptions, and there’s also a pay-per-learning-results model. I believe AI can now achieve payment based on results. We might launch a package, for example, if you pay $799, we guarantee your child will score full marks in SAT math.

Founder Park: But paying based on exam results still depends on the student's initiative, right?

Kai: This may not be feasible for China's college entrance exam, which has thousands of assessment points. However, the SAT has only 62 assessment points, of which 50 are regular points that most students have no problem with, and the remaining 12 can also be mastered. Unless the student has significant difficulty with logic, there is basically no situation where they cannot learn. Moreover, the efficiency gain from AI is very evident.

In fact, many online tutors in the U.S. also offer this service: you pay a teacher $1,800, the teacher tutors the child, and the success rate is nearly 100% because the SAT assessment points are fixed. As long as the student has a normal IQ, there is generally no problem. China's college entrance exam is different; scores there cannot be improved in the short term. It also needs to spread students' scores apart, so it includes difficult questions, whereas the SAT has no truly difficult questions because it mainly assesses whether you have mastered the knowledge points.

Pay-per-results is also a model that tutors have already used, and it rests on this prerequisite.

Founder Park: In your pricing, will model costs be a concern? Is it a high proportion?

Kai: The customer price in our field is set very high, starting at $69 per month, and the model costs are currently very low, so it’s not an issue. The education industry is not like the coding field, where everyone is competing on price because coding requires supporting long contexts.

Products for High School Students: Web Version is Most Important

Founder Park: I remember you said last time that your first version prototype took about two months to develop. How did you consider the entire development cycle, such as division of labor, deciding which features to include or exclude?

Kai: The consensus among all team members is that iteration must be fast because speed allows us to quickly obtain early user feedback.

After the first version was released on Twitter, it caused a huge stir and brought in a large number of users. However, many of these users were programmers, investors, or tech enthusiasts, whom we can collectively call "tech early adopters." At that stage, the feedback we received from them was relatively scattered and not very valuable. We still needed to filter out the truly core seed users from such a wide range of users, specifically high-quality high school students, and then obtain useful feedback through consultations.

The core feedback we received was that the precision of video rendering must reach 100%, which is the top priority for optimization. Whether the UI looks good or whether it supports different TTS voice options were features we cut. Returning to the core of the product: we are doing knowledge learning in science and math scenarios, so the precision of graphic rendering is the core.

Founder Park: How did you balance the generation time at that time?

Kai: At that time, the maximum generated length was about 6 minutes. The main consideration was that explaining an ordinary question or knowledge point should not take more than 6 minutes. However, in subsequent feedback, we found that some students who learn more slowly wanted the content explained more slowly and in more depth. We realized that duration should not be capped; it should depend on the user's learning ability.

Founder Park: What is the longest duration now?

Kai: The longest should be within an hour, allowing for continuous questioning. Interaction and real-time generation occur simultaneously, but this feature was added recently; the initial version did not have it.

Founder Park: Were there any features you initially wanted to implement but later found less important and decided not to do?

Kai: For example, an app. We initially considered whether to quickly develop an app, but later found that most students in the U.S. primarily study using laptops or iPads. Most K12 schools in the U.S. provide students with a Chromebook, and computers are highly prevalent; their homework is also completed on computers. High school students generally have a computer, and the proportion of mobile phones in learning scenarios is less than 5%, which is very low.

Founder Park: So if it’s an education or student-oriented product, the web version is the first priority, while the app is less important.

Kai: Yes, we actually already knew this data, having studied in the U.S. for many years. Later, we surveyed 100 students from the early tens of thousands of users, and over 90 of these 100 students had computers, which further confirmed this point.

Founder Park: When you launched the first version, were you also targeting the K12 group?

Kai: Yes, we have been targeting this group since then. We do not consider Gauth as a competitor; we are more focused on exam training scenarios. A large number of high school students in the U.S. choose offline training or online learning platforms, and VideoTutor effectively migrates this demand.

Founder Park: Will K12 be your core user group for at least the next year?

Kai: It should remain our core group for the next two years.

Using Large Models, But Not Solely Relying on Them

Founder Park: Briefly introduce your current technical implementation. VideoTutor performs much better than other video generation models at generating lessons and charts, which is surprising given that many models cannot even render text accurately.

James: The videos we generate contain both text and graphics. The general production process is: we let the large language model generate text and corresponding animation instructions, and then the animation instructions are rendered by our animation engine, ultimately presented in the video.

The text part is relatively simple; we let the large language model generate text and then render it directly. However, the animation part is generated by our own mathematical animation rendering engine. Its advantage lies in the high precision of rendering axes, geometric figures, and other content, which is precisely where our core technology lies.

Currently, the output of large language models is just text; our agent essentially gives the large language model a piece of paper and a pen, allowing it to draw the appropriate teaching animation it imagines. The part that is drawn is entirely our technology.
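The division of labor James describes, where the language model emits animation instructions and a separate engine draws them, might look roughly like the sketch below. The instruction format (`op`, `points`, `coeffs`) and the function `render_instruction` are hypothetical; the interview does not describe VideoTutor's actual instruction set or engine.

```python
import json

def render_instruction(instr: dict) -> list[tuple[float, float]]:
    """Turn one hypothetical animation instruction into exact coordinates.

    The model only names what to draw; the geometry is computed here,
    so the model never has to place pixels itself.
    """
    if instr["op"] == "polygon":
        # Vertices are given explicitly; repeat the first to close the outline.
        pts = [tuple(p) for p in instr["points"]]
        return pts + [pts[0]]
    if instr["op"] == "parabola":
        # y = a*x^2 + b*x + c sampled at integer x over [x_min, x_max].
        a, b, c = instr["coeffs"]
        return [(float(x), float(a * x * x + b * x + c))
                for x in range(instr["x_min"], instr["x_max"] + 1)]
    raise ValueError(f"unknown op: {instr['op']}")

# Text as a language model might emit it (made-up example output).
llm_output = '{"op": "parabola", "coeffs": [1, 0, -4], "x_min": -2, "x_max": 2}'
path = render_instruction(json.loads(llm_output))
print(path)
```

Because the coordinates come from the formula rather than from the model's guess at an image, the curve passes exactly through its roots at x = -2 and x = 2, which is the kind of precision the text attributes to a dedicated engine.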

Founder Park: How is the final composition of the video, including audio and video, handled?

James: Initially, the user will input a prompt, such as "What is the Pythagorean theorem?" In the first step, we let the large language model reason through all the scenes, generally specifying 3 to 5 scenes, depending on the difficulty of the question. Then, the model generates a rough script for each scene. Next, based on the script for each scene, we perform a second reasoning to generate the text, corresponding graphics, and the text for the voiceover. The voiceover text is then synthesized using TTS.

Finally, we stitch all the scenes together to form a complete video.
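The steps James lists (plan scenes, script each scene, generate text, graphics, and voiceover, synthesize speech, stitch) can be sketched as a small pipeline. Every function below is a stub standing in for a model, TTS, or rendering call, and all names are hypothetical rather than VideoTutor's own code.

```python
from dataclasses import dataclass

@dataclass
class Scene:
    script: str           # rough script from the first reasoning pass
    text: str = ""        # on-screen text
    graphics: str = ""    # animation instructions for the engine
    voiceover: str = ""   # narration text, later sent to TTS
    audio: bytes = b""    # synthesized narration

def plan_scenes(prompt: str) -> list[Scene]:
    """Pass 1 (stub for an LLM call): split the question into scenes."""
    steps = ["state the idea", "show a worked example", "summarize"]
    return [Scene(script=f"{s} for: {prompt}") for s in steps]

def detail_scene(scene: Scene) -> Scene:
    """Pass 2 (stub for an LLM call): text, graphics, and voiceover."""
    scene.text = scene.script.capitalize()
    scene.graphics = '{"op": "placeholder"}'
    scene.voiceover = f"Let's {scene.script}."
    return scene

def synthesize(scene: Scene) -> Scene:
    """Stub for a TTS call: fake audio bytes from the voiceover text."""
    scene.audio = scene.voiceover.encode("utf-8")
    return scene

def make_video(prompt: str) -> list[Scene]:
    """Plan, detail, and voice each scene, keeping scenes in order."""
    return [synthesize(detail_scene(s)) for s in plan_scenes(prompt)]

video = make_video("What is the Pythagorean theorem?")
print(len(video), "scenes")
```

The stub always plans three scenes; in the process described above the count would vary between 3 and 5 with the difficulty of the question.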

Founder Park: I understand the first version was like this. Now that the interactive process has been added, has the generation process changed?

James: It has indeed changed. Now, to allow users to see content as quickly as possible, we first generate the first scene for the user to view, while the subsequent scenes continue to render in the background. When users ask questions, we convert their voice into text and then provide this text along with the content of all previous scenes to the large language model for reasoning, allowing it to plan the next teaching scene. The rendering process for subsequent scenes is the same as before.
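A minimal sketch of this revised flow, assuming a thread pool for the background work: the first scene is produced right away, the rest render concurrently, and a follow-up question is planned with all earlier scenes as context. `render` and `plan_followup` are hypothetical stand-ins, and the voice-to-text step is represented by a plain string.

```python
from concurrent.futures import ThreadPoolExecutor

def render(script: str) -> str:
    """Stub for the real render step; returns a label instead of video."""
    return f"<rendered: {script}>"

def plan_followup(question: str, history: list[str]) -> str:
    """Stub for an LLM call that sees the question plus all prior scenes."""
    return f"answer '{question}' after {len(history)} earlier scenes"

scripts = ["intro", "worked example", "summary"]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Produce scene 1 first so the user has something to watch...
    first = render(scripts[0])
    # ...then hand the remaining scenes to background workers.
    pending = [pool.submit(render, s) for s in scripts[1:]]

    # A transcribed voice question arrives; plan the next scene using
    # the question together with every scene planned so far.
    question = "Why does the sign flip?"
    followup = render(plan_followup(question, scripts))

    # Collect the background scenes in their original order.
    rest = [f.result() for f in pending]

print(first, rest, followup, sep="\n")
```

Where the follow-up scene is spliced into playback relative to the scenes still rendering is a product decision the interview does not spell out, so the sketch leaves the three results separate.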

Founder Park: If a user has a question after one minute, will they directly ask? After you receive the question, do you return the user's question along with the previously discussed content for the model to process? In this process, after the user finishes asking, does the animation continue playing or stop?

James: Our current delay has been reduced from the initial 20-30 seconds to under 5 seconds. In terms of interaction, we will make some transitions so that users do not focus too much on these 5 seconds; the entire process will be relatively smooth. Within 4-5 seconds, they will see content presented based on their question.

The current design is that the AI teacher will say, "Hmm, let me think," and then wipe the board, simulating a real teacher. If you feel something is wrong, I will wipe it and rewrite it for you; this process feels more natural.

Moreover, we are not just passively waiting for users to ask questions; we also conduct quizzes. We will reason based on quiz feedback and user questions. Additionally, we do not have a completely open mic; users need to actively turn on their microphones, which involves an action to open and close.

Founder Park: So based on this mechanism, the longest explanation can last about an hour.

James: To be precise, there are no limits; if they keep asking questions, they can keep going.

Kai: Yes, there are no preset limits. In fact, VideoTutor is pursuing this direction as multimodal AI progresses; we are not creating demand but better meeting existing demand. Look at offline real education; why are American parents willing to pay a lot? Because the American education and training industry is more about one-on-one teaching, starting at $100 per hour. It’s because offline teachers can do guided questioning; I can observe where you struggle and then ask you. VideoTutor also strives to achieve this real teacher teaching effect, allowing every child to have real-time interaction and teaching.

Founder Park: During class, do you require students to turn on their cameras?

Kai: Not really. Whether students turn on their cameras is mainly a matter of U.S. privacy law. The product has no feature that makes it mandatory; it is up to the student. The main interaction is still through questions and voice feedback.

Founder Park: Technically, do you adopt a strategy of combining small models with cloud-based large models, or how does it work?

Kai: It’s a combination. Internally, we have a dataset of over 100,000 videos. The better samples are manually re-annotated and then used to fine-tune models. For example, we currently have over 8,000 SAT training samples. These fine-tuned small models work alongside general-purpose commercial cloud models like Claude and Gemini.

Founder Park: Will using Claude, Gemini, or GPT affect the core performance of the product?

Kai: We mainly focus on the K12 field, and the level of the foundational model is already sufficient. However, to ensure 100% accuracy, we will call two models for cross-checking; if both models agree on an answer, it is unlikely to be wrong. In terms of code generation, we primarily rely on Claude, as it has better coding capabilities.
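The cross-checking idea can be sketched as below: ask two models the same question and accept the answer only when they agree. The model functions are fake stand-ins, and both the normalization rule and the choice to return `None` on disagreement are assumptions; the interview does not say how answers are compared or what happens when the models disagree.

```python
from typing import Callable, Optional

def normalize(answer: str) -> str:
    """Compare answers loosely: ignore case and surrounding whitespace."""
    return answer.strip().lower()

def cross_check(question: str,
                model_a: Callable[[str], str],
                model_b: Callable[[str], str]) -> Optional[str]:
    """Return the answer only if both models agree; otherwise None."""
    a, b = model_a(question), model_b(question)
    return a.strip() if normalize(a) == normalize(b) else None

# Stand-ins for two independent model calls (hypothetical).
def fake_model_a(q: str) -> str:
    return "x = 5"

def fake_model_b(q: str) -> str:
    return " X = 5 "

def fake_model_c(q: str) -> str:
    return "x = 6"

print(cross_check("Solve 2x = 10", fake_model_a, fake_model_b))  # agreement
print(cross_check("Solve 2x = 10", fake_model_a, fake_model_c))  # disagreement
```

Agreement between two models lowers the chance of an error but does not rule it out, since both can make the same mistake; that matches Kai's wording that a shared answer is "unlikely to be wrong".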

Founder Park: Where are the current technical bottlenecks in the product? Is it model capability or code generation?

Kai: Model capability is one aspect. Rendering is another; we have already reduced it to under 5 seconds, and with more GPU deployments, it can be faster. Another aspect is long-term memory capability. We need to accumulate long-term learning behavior data for students, knowing which knowledge points they do not understand. For example, if they forget knowledge points learned a month ago, we can remind them again.
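The reminder behavior Kai mentions, resurfacing knowledge points a student last studied about a month ago, could be sketched as a simple staleness check. The 30-day threshold and the data layout are assumptions for illustration; the interview does not describe how VideoTutor stores or schedules this.

```python
from datetime import date, timedelta

REVIEW_AFTER = timedelta(days=30)  # assumed threshold, echoing "a month ago"

def due_for_review(last_studied: dict[str, date], today: date) -> list[str]:
    """Return knowledge points not studied within REVIEW_AFTER, oldest first."""
    stale = [(when, point) for point, when in last_studied.items()
             if today - when >= REVIEW_AFTER]
    return [point for when, point in sorted(stale)]

# Made-up study history for one student.
history = {
    "quadratic functions": date(2025, 1, 2),
    "circle theorems": date(2025, 2, 20),
    "linear inequalities": date(2025, 1, 20),
}
print(due_for_review(history, today=date(2025, 3, 1)))
```

A production system would more likely weight each point by quiz performance than by elapsed time alone; the fixed threshold keeps the example short.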

James: We have put a lot of effort into rendering time, continuously making technical breakthroughs, from the initial 2 minutes to 1 minute, and now to under 10 seconds. Our ultimate goal is to achieve rendering with basically no delay, so that when a user asks a question, the result comes out immediately after reasoning is completed. This is a challenge our team is currently tackling, but we have found a new direction.

Not Just Looking at Completion Rates, But Final Exam Scores

Founder Park: How do you measure the core metrics of the product at this stage? How do you determine if a video is useful to users?

Kai: The most critical metric is the exam. In the new version, after watching the video, there will be a quiz at the end; if they get it right, it proves they understood it; if they get it wrong, it proves it was not explained clearly.

Learning effectiveness cannot be measured solely by completion rates; some students may understand after watching half. If they can pass a test halfway through, they don’t need to watch the rest. The core metric of our product is how many students improve their scores here.
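To illustrate why completion rate and quiz results can diverge, here is a small sketch over made-up session records. The `Session` fields and the sample data are hypothetical, and the quiz pass rate is only a proxy for the score improvement Kai names as the core metric.

```python
from dataclasses import dataclass

@dataclass
class Session:
    watched_fraction: float  # share of the video the student watched
    quiz_correct: bool       # whether the end-of-video quiz was answered correctly

def completion_rate(sessions: list[Session]) -> float:
    """Share of sessions in which the whole video was watched."""
    return sum(s.watched_fraction >= 1.0 for s in sessions) / len(sessions)

def quiz_pass_rate(sessions: list[Session]) -> float:
    """Share of sessions in which the quiz was answered correctly."""
    return sum(s.quiz_correct for s in sessions) / len(sessions)

sessions = [
    Session(1.0, True),
    Session(0.5, True),   # understood halfway and stopped watching
    Session(1.0, False),  # watched everything but still got it wrong
    Session(0.4, True),
]
print(completion_rate(sessions), quiz_pass_rate(sessions))
```

In this sample only half the sessions were completed, yet three out of four quizzes were passed, which is the gap between watching and understanding that the text points to.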

Founder Park: But their final exam is completed in a different scenario; how do you get the result of whether they passed?

Kai: This brings us to American product culture, where users tend to share spontaneously after achieving good results from using a product. Many students who use VideoTutor and take the SAT afterward come back to share their experiences and scores. We also have them become campus ambassadors for secondary dissemination.

We have a campus ambassador program consisting of 20 high school students. You can see that Mercor was very successful early on, using the typical "user success story" model. Mercor helped many Indian programmers find jobs in the U.S., and then they would contact these users to film a user story about how they found jobs using Mercor. This created excellent word-of-mouth marketing. VideoTutor operates on the same principle; we want more students to achieve excellent results after using the product and then share their experiences as user stories.

Founder Park: What are the main channels for students to share?

Kai: Students mainly share on TikTok, while parents share in Facebook groups.

Founder Park: If we look at a time frame of six months to a year, what is your planned growth strategy for the product?

Kai: I believe fundamentally, VideoTutor is still a C-end user product, and word-of-mouth is very important. Many successful AI applications early on relied on the word-of-mouth of seed users; for example, if designers found it good, it spread. For us, the core metric is how many SAT candidates achieve high scores after using this product and then share it with other children and parents. Parents mainly use Facebook and Instagram, while students use TikTok, and we will disseminate on these platforms. When this consensus of word-of-mouth forms, teachers in schools will naturally become aware. The reason we were known by so many schools early on is that many teachers found it good and recommended it to the school’s procurement heads. Therefore, the most critical aspect is still the word-of-mouth of C-end users, and how many children improve their scores is the key metric.

Founder Park: What is the general status and timeline for the launch of the new version?

Kai: We hope to release it publicly within two months at the earliest. By then, students will get real-time answers to their questions with very low latency, and graphic rendering in science scenarios will reach 100% accuracy. Of course, we will not cover competition scenarios or complex college-level material like linear algebra for now; we will focus on the K12 field.

Founder Park: What are the barriers or moats for VideoTutor now?

Kai: I think there are several points. The first is the data flywheel. Behind the videos is code; good video data generated by users, after secondary annotation, can be used to retrain fine-tuned models. The more data there is, the better the video effect. Another aspect is learning behavior data; we know which knowledge points different students are weak in, allowing us to establish a data flywheel. The more people use it, the better the product understands students. The second is the leading technological advantage, such as the algorithms of the animation engine. Although the algorithm itself is not the core advantage, as we iterate quickly and accumulate more data, the advantage will become more apparent.

The third is branding; VideoTutor has already become a leading brand in the AI education field among parents in North America, and the trust of parents is also an intangible barrier.

Founder Park: What do you expect VideoTutor to grow into in three to five years?

Kai: We hope that in the future, VideoTutor can become an AI teacher for everyone learning science knowledge. We only focus on science. I believe it will surpass Duolingo in the future. Duolingo is a world-class language learning product, but in the STEM science scenario, there has not been a world-class product because science requires a lot of graphic rendering. Now that foundational model technology is ready, I believe the science scenario will give birth to the next "Duolingo."

Hiring: Especially Looking for People from Major Chinese Tech Companies

Founder Park: You have had several entrepreneurial experiences in the past. What were they about?

Kai: I am currently a junior. I started my first entrepreneurial venture with James in my freshman year, raising $200,000 in angel investment. Although that venture failed, I learned valuable lessons: you cannot fall into homogeneous competition. At that time, the app we developed had many similar products on the market, and we had to engage in traffic competition early on, making it difficult to charge.

In my second entrepreneurial experience, I joined another team, MathGPTPro, as a co-founder and stayed for a few months. During that phase, I learned how to read product metrics, how to build products, and how to grow a user base. It was also then that I concluded that text-based, answer-type educational products had reached their limit: they are not much different from ChatGPT, and the structured question banks that tutoring companies like Zuoyebang spent a lot of money building have also been replaced by the editing capabilities of large models. So in my third entrepreneurial venture, I knew that visualization was an inevitable trend.

Kai Zhao pitching with Sam Altman at Harvard University

Founder Park: In addition to recognizing the limitations of text-based products, how have your past experiences helped you in building VideoTutor in terms of team or other aspects?

Kai: They have helped a lot.

The first point is better judgment of direction and whether the product has a future. I will look at the website traffic and revenue of competing products to judge the evolution direction of the entire product.

The second point is product building; I can better judge the development rhythm of the product, including product design, front-end and back-end integration, and which metrics to monitor.

The third point is team management and organizational culture capability. I have established a more complete management system, including the division of labor for each team member, rewards, and options distribution. Additionally, I have also learned how to raise funds. We completed this round of $10 million financing within 20 days.

Founder Park: How many people are on your team now?

Kai: Six people, and we all live together.

Founder Park: How was the team initially built?

Kai: James and I have already started two businesses together. We both graduated from the same school and developed an app together in our freshman year. In our sophomore year, I started a business with two other people, and we all knew each other. When we realized that this technology could bring a significant product vision, we contacted each other to form a team to work on this product. Everyone was already alumni, including another partner in the team, Nick, who is also my college roommate.

Founder Park: You are also preparing to expand the team. What kind of people are you looking to hire?

Kai: We are mainly looking for back-end, front-end, large language model, and UI/UX talent, preferably with experience. Because we have already passed the trial-and-error stage and entered the rapid building phase of the product, we need experienced people to help us grow.

Founder Park: You need experienced engineers, product managers, and growth leaders to take the product from 1 to 10, and even from 10 to 100.

Kai: Yes, that’s the stage we are in. We expect to expand the team to 9 to 10 people, prioritizing hiring engineers.

This recruitment may take place in China, so it will be a mixed approach of in-person and remote work.

Founder Park: What kind of profile do you hope this person will have?

Kai: We prefer someone with experience at a major company, such as ByteDance or Meituan. ByteDance has a fast-paced, competitive organizational culture that values young people. People trained there have good methodologies and capabilities, and after joining us they can bring that successful experience into our team so we can all learn from it.

We want someone who has fought hard at a major Chinese company and has experience with rapid iteration. We are past the student entrepreneurship stage and do not need novices; we need experienced people who are not yet complete "industry veterans," because veterans may have family considerations and cannot be as competitive. So we are looking for mid-level people who are young and can be competitive.

We are willing to offer rich options to outstanding talents. Although we have raised $11 million, the reason we haven’t hired engineers in the U.S. is that we believe the product and engineering capabilities in China are truly excellent. This wave will definitely see a team led by Chinese people create great products that can compete internationally. Many AI applications are currently built by Chinese people, and the engineering capabilities in China are indeed impressive. This is also our advantage, leveraging the strengths of both China and the U.S.

College Students in Silicon Valley Are All Starting AI Businesses

Founder Park: Especially in Silicon Valley, the trend of college students starting businesses is particularly obvious. What kind of state do you see?

Kai: One fact to note is the companies in this wave valued at over $10 billion. Mercor, which focuses on AI recruitment, has completed over $300 million in new financing at a valuation of over $10 billion, and Cursor's $10 billion valuation is already locked in. There are also projects like GPTZero and Pika. These are all ventures started by college students; the founders of Cursor and Mercor in particular both dropped out in their junior year.

This wave of young entrepreneurs has a common characteristic: highly differentiated competition. They focus on extremely narrow fields and do not create generic products. For example, Mercor started by only recruiting Indian programmers.

The second point is the environment. The entire capital environment and underlying innovation in Silicon Valley, like Stanford, YC, and Peter Thiel's fund, support college student entrepreneurship from the earliest stages, regardless of whether you have mature ideas, and they are willing to support you and provide a strong network.

The third point is the quality of these college students. Whether it’s us or the other college students coming out of Silicon Valley, we all share a bold spirit of adventure and a strong ability to learn. This willingness to take risks is something many students in China may lack. In Silicon Valley, you are surrounded by successful peers who inspire you, and the capital environment is willing to believe in young people.

I also weighed the costs and benefits at the time. If I had chosen to finish college and then look for a job, I might not have been able to pay back what my family spent on my studies abroad, and the return might not have been significant. But by starting a business, I can learn intensely while I am young, and my life has infinite possibilities. I have wanted to start a great company since I was a child.

Founder Park: Why can this generation of college students create companies worth billions of dollars today, while in the past, selling for one or two million dollars was considered impressive? Is there an element of the AI boom and bubble in this?

Kai: I don’t think it’s entirely a bubble. Cursor has $450 million in real revenue, which is solid. Behind this are the methodology and insight of this generation of young teams, which are critical. Look at these teams: their backgrounds are impressive, and they learn very quickly.

Early on, Cursor relied on the college student programmers around the team, who were very open to AI and gave strong feedback. The founder is himself a "little genius" engineer who understands users deeply and iterates quickly; they got the product up and running with just four people. Once the product had been iterated into good shape, word of mouth spread, revenue followed, and investors, afraid of missing the next Mark Zuckerberg, came in with capital.

The underlying condition is that many technologies in this wave of AI are new. Young people learn quickly, are practical and reliable, and dare to act, so they can beat traditional products with a deep understanding of users and very fast iteration. For example, GitHub Copilot was doing well before Cursor, so why didn’t Copilot come out ahead? The answer is user experience and execution speed.

Founder Park: Can we say that because AI is a new technology, many product perceptions also need to be viewed from a new perspective?

Kai: Yes, this younger generation has deeper insight than the previous generation of entrepreneurs and can get closer to users. Most mainstream AI users today were born after 2000; they learn and iterate on feedback faster, and are more tolerant, than the previous generation of entrepreneurs.

Therefore, the speed of cognitive iteration is key. In the mobile internet era, technological iteration was measured in years or quarters, but in the AI era, technological iteration may be measured in days. As a founder, you must learn quickly, and young people can stay up late and are more driven.

Founder Park: Some media have reported that many founders in Silicon Valley have also started working 996 (9 a.m. to 9 p.m., six days a week). What do you think?

Kai: Some of my white entrepreneur friends who have raised a lot of money also work 996. Like us, they rent a large house and live and work together. I think 996 is mostly a product of the environment. Silicon Valley right now is a bit like a gold rush: no one wants to fall behind, so everyone competes on product iteration speed, and that means staying up late. The environment pushes people into it.

Founder Park: Are there any trends in the choice of tracks for these college students starting businesses in Silicon Valley?

Kai: I think that whether it is us in education or others elsewhere, everyone tends to start businesses within their comfort zone. By comfort zone I mean a field and a user group you know well enough. The founder of Cursor knows coding very well, and we focus on education because we understand this group well enough. Young people today are more likely to start businesses within what they already understand than to jump rashly into unfamiliar fields, because that way the feedback you get from users is fast and accurate enough.

There is also accumulated understanding. We have built education products three times, and my understanding has deepened each time. These college students are less likely to rashly attempt things they have never done; they focus on how to do what they know better. They bring a new generation's way of thinking, iterate continuously within what they understand, and are brave enough to create opportunities.

Another point is the spirit of adventure. They are less likely to doubt themselves because of other people's negativity; they have a very confident "I don't care what you think about me" attitude. Behind this is a culture of "high-speed experimentation": I know my product is not ready yet, but I don’t care; I will launch quickly, iterate quickly, and get feedback quickly.

Founder Park: When did this trend start?

Kai: I think it’s a consensus of success. When everyone sees projects like GPTZero growing out of dorms, iterating continuously, and then gaining capital support and user recognition, this rapid trial-and-error and explosive success creates a consensus.

In a nutshell: "Done is better than perfect." Moreover, people are not too worried about competition. Many founders in Silicon Valley are willing to share their product ideas and are not afraid of being copied; they just need to iterate quickly. I think this wave of young people is also very good at storytelling, and the stories are not empty: they are grounded in what the founders have actually built, combined with their vision for the future.

Founder Park: First, market yourself.

Kai: Yes. I think what underlies it is the spirit of adventure and extreme confidence. Driven by this, they keep trying boldly and are not afraid of saying the wrong thing. They openly express their product ideas and execute on them; if they make mistakes, they simply correct them. This culture of not fearing trial and error has contributed to this wave of college student entrepreneurship and its success.

VCs in the U.S. also look at college student projects; YC invests in college student projects in every batch.

Financing is the Least of VideoTutor's Worries Now

Founder Park: If you could go back to when you first started VideoTutor, what advice would you give yourself? What could have been done better?

Kai: I would tell myself to move faster. The other thing is team composition. The VideoTutor team has gone through multiple rounds of adjustments. If I had known earlier, I would have assembled the team around the skills the product needs. I believe organizational capability is crucial in entrepreneurship, and I would spend more time on it: selecting people, judging people, and putting good people to good use.

The current team is suited to growth from 0 to 1, but to make VideoTutor bigger, we still need to bring in more experienced people who can contribute their experience and capabilities and help the entire team grow together.

Founder Park: In the next six months, what kind of product or technical challenges do you think VideoTutor might encounter?

Kai: I think one challenge is rendering; achieving true zero latency still requires engineering breakthroughs. The second is growth, which I think comes down to product taste. That covers many things: whether the UI and interaction design are smooth and polished, whether the features are bug-free, whether the visual layout is attractive, and so on. These are all tests for us.

James: Initially we positioned VideoTutor as a visual teaching assistant for all subjects, but later we went very vertical and focused only on math, because that is our strongest area. The next key breakthrough may be horizontal expansion. For example, how do we bring the advantages of visualization to humanities scenarios, such as explaining the line "The sun shines on the wheat at noon, and sweat drips onto the soil"? This is something we need to work out on the technology side going forward.

Founder Park: Will the founder's background cause difficulties in subsequent expansion?

Kai: Not really. In fact, many large VCs have approached us. Firms like a16z do not invest too early; they wait until a team shows signs of success before providing support, so they know the investment will not fail. We maintain good relationships with many large VCs.

Financing is the least of VideoTutor's worries; what concerns us most is the user ecosystem and the product.